The Fred Fish Collection 1.5

Text File | 1988-08-25 | 4KB | 59 lines

The data in the file Australia.data are taken from Aitkin (1978) "The analysis of unbalanced cross-classified designs." J. Roy. Stat. Soc. A141: 206. The columns in the table are: days Absent, Culture (aboriginal/white), Sex (female/male), Year (primary/first form/second form/third form), and Learner (slow/average). The last four are coded nominal variables.

We first read the data into Statpack, defining the variable names. None of the variables has a missing value code. Next, we create a dump with the same name for future use.

The first step in the statistical analysis is to inspect the simple frequency tables for the nominal variables using Tabulate1. For days Absent, we use Desc Stat and see that this variable is highly skewed.

For the four nominal variables, we next study the inter-relations through two-way (and, if necessary, three-way) contingency tables. After constructing each table with Tabulate2, we fit the log-linear model with Log Lin2. The Chi-square tests for independence indicate that only the Sex-Year and Year-Learner relationships are significant at the 5% level, the latter especially so. For some reason, girls are very much over-represented in primary school and boys in first form. Slow learners seem to be much less frequent in third form.

We may now construct a cross-tabulation for Sex, Year and Learner, using Tabulate3, followed by Log Lin3. We see that the three-way interaction is significant, as well as the two two-way interactions with Year (as before). We examine the interaction model. In second form, there appear to be more male slow learners than elsewhere.

To study the relationship of days absent to the other variables, we shall use analysis of variance. The one-way tables show that absence depends on Culture and Year, but not on Sex and Learner. (The sums of squares differ slightly from those in the original article, but are identical to those obtained with GLIM.)
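
For readers without Statpack, the arithmetic behind a one-way analysis of variance table can be sketched in a few lines of Python. This is only an illustration of the sums of squares and the F ratio; it is not Statpack's own code. The function name one_way_anova and the two small groups of values are invented for the example and are not taken from Australia.data.

```python
# Minimal sketch of a one-way analysis of variance in plain Python.
# The values below are hypothetical; they are NOT the Australia.data values.

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    all_values = [y for group in groups for y in group]
    n = len(all_values)
    grand_mean = sum(all_values) / n

    # Between-groups sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )

    # Within-groups sum of squares: squared deviations of each value
    # from the mean of its own group.
    ss_within = sum(
        (y - sum(group) / len(group)) ** 2
        for group in groups
        for y in group
    )

    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical days-absent values for the two levels of one factor.
group_a = [2, 5, 11, 14, 23]
group_b = [0, 3, 5, 6, 11]

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova([group_a, group_b])
print("SS between:", ss_b, "on", df_b, "df")
print("SS within: ", ss_w, "on", df_w, "df")
print("F ratio:   ", round(f_ratio, 3))
```

The F ratio would then be compared with the F distribution on the two degrees of freedom shown, which is the comparison a one-way table summarises.
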
There are more days absent among aboriginals than whites, and more in second and third form than in the other years. Two-way ANOVA permits us to study the two independent variables, Culture and Year, at the same time. The interaction between them is very significant. Display of the interaction model shows that aboriginals are especially absent in first and second forms.

Since our preliminary analysis showed the days absent to be very skewed, we may now test to see what distribution might be fitted. We first regroup our data into 10 categories with Tabulate1 (the second-to-last category has no observations, so does not appear in the table). We then choose Dist. The tail frequencies have not been regrouped but, since they are small, we ask that they now be collected together; we are left with 7 categories. We select continuous distributions and try each in turn. Aitkin suggests a log normal, but this does not fit very well. The simplest acceptable distribution is the exponential. We plot the histogram and theoretical distribution and save them to a file. Unfortunately, an analysis based on the exponential distribution cannot be carried out with Statpack (try GLIM), so we choose the power-transformed normal instead.

We now select Var Mod and, in the new menu, Transform. We give the power a value of 0.08, to be applied to days lost. If we had chosen to apply a log transform as suggested by Aitkin, we would first have chosen Constant and added one to the values of days lost before applying the logarithm. Returning to the Main Menu, we redo the analyses of variance on the new dependent variable. Absence no longer depends on Year, but only on Culture. In the two-way analysis, the interaction is no longer significant either.
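
The two transformations discussed above can be sketched in Python as follows. The power 0.08 and the added constant of one come from the text; the function names and the short list of values are invented for the illustration and are not the Australia.data values. The skewness function uses the simple moment coefficient, which may differ from what Desc Stat reports.

```python
# Minimal sketch of the power and log(y + 1) transformations.
# The data are hypothetical; they are NOT the Australia.data values.
import math


def power_transform(values, power):
    """Raise each value to the given power (0.08 in the text)."""
    return [y ** power for y in values]


def log_plus_constant(values, constant=1.0):
    """Add a constant (one in the text) before taking the natural log,
    so that a value of zero days does not break the logarithm."""
    return [math.log(y + constant) for y in values]


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n
    m3 = sum((y - mean) ** 3 for y in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed counts of days lost.
days = [1, 2, 3, 5, 8, 13, 21, 34, 55, 81]

raw_skew = skewness(days)
power_skew = skewness(power_transform(days, 0.08))
log_skew = skewness(log_plus_constant(days))

print("skewness, raw values:      ", round(raw_skew, 3))
print("skewness, power 0.08:      ", round(power_skew, 3))
print("skewness, log(days + 1):   ", round(log_skew, 3))
```

On these invented values, the power transform pulls the long right tail in and brings the skewness much closer to zero, which is the reason for redoing the analyses of variance on the transformed variable.
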