The data in the file Australia.data are taken from Aitkin (1978) "The
analysis of unbalanced cross-classified designs." J. Roy. Stat. Soc. A 141:
206. The columns in the table are: days Absent, Culture (aboriginal/white),
Sex (female/male), Year (primary/first form/second form/third form), and
Learner (slow/average). The last four are coded nominal variables.
We first read the data into Statpack, defining their names. None have
missing value codes. Next, we create a dump with the same name for future
use.
The first step in the statistical analysis is to inspect the simple
frequency tables for the nominal variables using Tabulate1. For Absence,
we use Desc Stat and see that this variable is highly skewed.
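
As an illustration of what "highly skewed" means numerically, here is a minimal Python sketch of the moment coefficient of skewness. It is not Statpack's Desc Stat routine, whose exact formula is not given here, and the values in days_absent are hypothetical, not taken from Australia.data.

```python
def sample_skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.

    m2 and m3 are the second and third central moments (divisor n).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical days-absent values, NOT the Aitkin data.
days_absent = [0, 1, 2, 2, 3, 5, 5, 6, 8, 11, 14, 20, 28, 40, 67]

skew = sample_skewness(days_absent)
print(f"skewness = {skew:.2f}")
```

A value near zero indicates a roughly symmetric variable; a clearly positive value, as for these made-up numbers, indicates a long right tail.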
For the four nominal variables, we next study the inter-relations
through two- (and, if necessary, three-) way contingency tables. After
constructing each table with Tabulate2, we fit the log-linear model with
Log Lin2. The Chi-square tests for independence indicate that only the
relationships Sex-Year and Year-Learner are significant at the 5% level,
especially the latter. For some reason, girls are very much over-represented
in primary school and boys in first form. Slow learners seem to be much less
frequent in third form.
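
The test of independence in a two-way table can be sketched in Python as follows. This is only an illustration under stated assumptions: it computes both the Pearson Chi-square and the likelihood-ratio statistic, since which one Log Lin2 reports is not stated here, and the counts in sex_by_year are hypothetical, not the real Sex by Year table.

```python
import math


def independence_statistics(table):
    """Return (pearson_x2, likelihood_ratio_g2, df) for a two-way count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells contribute nothing to G2
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df


# Hypothetical counts: rows are Sex, columns are the four Year levels.
sex_by_year = [
    [30, 15, 20, 15],
    [15, 30, 20, 15],
]

x2, g2, df = independence_statistics(sex_by_year)

# Upper 5% point of the Chi-square distribution with 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815
significant = x2 > CRITICAL_5PCT_DF3
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}, significant: {significant}")
```

The critical value is hard-coded for 3 degrees of freedom; a table of another size needs the corresponding value.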
We may now construct a cross-tabulation for Sex, Year and Learner, using
Tabulate3, followed by Log Lin3. We see that the three-way interaction is
significant, as well as the two two-way interactions with Year (as before).
We examine the interaction model. In second form, there appear to be more
male slow learners than elsewhere.
To study the relationship of days absent to the other variables, we
shall use analysis of variance. The one-way tables show that absence depends
on Culture and Year, but not on Sex or Learner. (The sums of squares differ
slightly from those in the original article, but are identical to those
obtained with GLIM.) There are more days absent among aboriginals than
whites, and more in second and third form than the other years. Two-way ANOVA
permits us to study the two independent variables at the same time. The
interaction between them is very significant. Display of the interaction
model shows that aboriginals are especially absent in first and second forms.
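
The one-way analysis of variance used above can be sketched in Python as follows. This covers only the one-way case, not the two-way model with interaction, and it is not Statpack's own code. The two groups are hypothetical values chosen so the arithmetic is easy to follow.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    all_values = [v for group in groups for v in group]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Variation of the group mean around the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Variation of the observations around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in group)

    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical days absent for two levels of one factor (NOT the Aitkin data).
groups = [[2, 4, 6], [8, 10, 12]]

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(groups)
print(f"SS between = {ss_b}, SS within = {ss_w}, F({df_b},{df_w}) = {f_ratio}")
```

The F ratio is then compared with the F distribution on (df_between, df_within) degrees of freedom to judge significance.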
Since our preliminary analysis showed the days absent to be very skewed,
we may now test to see what distribution might be fitted. We first regroup
our data into 10 categories with Tabulate1 (the second last category has no
observations, so does not appear in the table). We then choose Dist. The tail
frequencies have not been regrouped, but, since they are small, we ask that
they now be collected together; we are left with 7 categories. We select
continuous distributions and try each in turn. Aitkin suggests a log normal,
but this does not fit very well. The simplest acceptable distribution is
the exponential. We plot the histogram and theoretical distribution and save
them to a file. Unfortunately, a model based on the exponential cannot be
fitted with Statpack (try GLIM), so we choose the power-transformed normal.
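
Collecting small tail frequencies together, as requested in Dist above, can be sketched in Python as follows. The rule used here (merge the last category into its neighbour while its count is below a minimum) and the threshold of 5 are assumptions for illustration; the rule Statpack actually applies is not stated here. The counts are hypothetical.

```python
def pool_upper_tail(counts, minimum=5):
    """Merge categories from the upper end until the last count >= minimum."""
    pooled = list(counts)
    while len(pooled) > 1 and pooled[-1] < minimum:
        last = pooled.pop()
        pooled[-1] += last  # fold the small tail count into its neighbour
    return pooled


# Hypothetical grouped frequencies of days absent (NOT the Aitkin data).
frequencies = [40, 25, 12, 6, 3, 1, 0, 2]

pooled = pool_upper_tail(frequencies)
print(pooled)
```

Pooling keeps the total number of observations unchanged while reducing the number of categories entering the goodness-of-fit test.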
We now select Var Mod, and in the new menu, Transform. We give the power
a value of 0.08 to be applied to days lost. If we had chosen to apply a log
transform as suggested by Aitkin, we would have first chosen Constant and
added one to the values of days lost before applying the logarithm.
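
The two transformations just described can be sketched in Python as follows. The power 0.08 and the added constant of one come from the text; the use of the natural logarithm and the sample values are assumptions for illustration only.

```python
import math


def power_transform(values, power):
    """Raise each value to the given power (0 stays 0 for a positive power)."""
    return [float(v) ** power for v in values]


def log_plus_constant(values, constant=1.0):
    """Add a constant before taking logs, so that zero values are allowed."""
    return [math.log(v + constant) for v in values]


# Hypothetical days-lost values, including a zero (NOT the Aitkin data).
days_lost = [0, 1, 5, 20, 67]

powered = power_transform(days_lost, 0.08)
logged = log_plus_constant(days_lost)

for raw, p, l in zip(days_lost, powered, logged):
    print(f"{raw:3d}  power: {p:.3f}  log(x+1): {l:.3f}")
```

Both transformations compress the long upper tail; the constant is needed for the logarithm because the log of zero is undefined.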
Returning to the Main Menu, we redo the analyses of variance on the new
dependent variable. Absence no longer depends on Year, but only on Culture.
In the two-way analysis, the interaction is no longer significant either.