STATS 3 MENU
REGRESSION
For the tests that follow, all except LOGIT regression have
similar input and output structures. You will be asked for the
variables that are the independent variables and for the one
dependent variable. You will then be asked for the variable
(column) into which the calculated values should be placed. The
program does not place the residuals in a variable (column), as
this would restrict the number of variables that could actually
be used in the regression. To get the residuals, simply subtract
the calculated data from the actual data in the data editor. The
differences lie in additional parts of the regressions.
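Since the residuals are never stored, the subtraction is up to you. A minimal Python sketch of the same arithmetic (the numbers are invented; B/STAT itself is not scriptable):

```python
# Residuals are the actual values minus the calculated (fitted) values.
actual = [2.0, 4.1, 6.2, 7.9]   # observed dependent variable
fitted = [2.1, 4.0, 6.0, 8.0]   # values the regression placed in a column
residuals = [a - f for a, f in zip(actual, fitted)]
```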
-Multiple regression is a traditional regression.
-Ridge regression will require the entry of a ridge factor, which
should be small and between 0 and 1 (most often below .2).
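To see what the ridge factor does, here is a sketch with a single centred predictor, where the ridge estimate has a simple closed form (the formula and data are illustrative, not B/STAT's internals):

```python
# With one centred predictor, ordinary least squares gives
#   b = sum(x*y) / sum(x*x)
# and ridge regression shrinks this to
#   b = sum(x*y) / (sum(x*x) + k)
# where k is the small ridge factor (most often below .2).
x = [-1.5, -0.5, 0.5, 1.5]
y = [-3.1, -0.9, 1.1, 2.9]
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
b_ols = sxy / sxx            # k = 0: ordinary slope
b_ridge = sxy / (sxx + 0.2)  # k = 0.2: slightly shrunken slope
```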
-Stepwise regression is like multiple regression, except that you
specify all the independent variables to be considered. The program
decides which of these to actually use in the regression.
-Cochran refers to a regression done using the Cochrane-Orcutt
procedure. A "Cochran" factor of between 0 and 1 must be used.
This type of regression actually uses a part of the previous point
in the calculation. If the Cochran factor is 1, then the regression
is actually calculated upon the first differences of the
variables.
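The transform can be sketched directly from that description (data invented):

```python
# Each transformed point uses part of the previous point:
#   y'[t] = y[t] - rho * y[t-1]
# A Cochran factor rho of 1 gives the first differences.
def co_transform(series, rho):
    return [series[t] - rho * series[t - 1]
            for t in range(1, len(series))]

diffs = co_transform([10.0, 12.0, 15.0, 19.0], 1.0)    # first differences
partial = co_transform([10.0, 12.0, 15.0, 19.0], 0.5)  # partial adjustment
```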
-Huber regression is used to reduce the weight given to outliers
in the data. You will need to specify two additional pieces of
data. The first is the variable into which the program places the
weights, and the second is the value of the residual at which the
weights should start to be changed. This procedure can only be
used after first doing a traditional regression.
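One common Huber weighting rule works as follows; whether B/STAT uses exactly this rule is an assumption, but it shows how the residual cutoff controls the weights:

```python
def huber_weight(residual, cutoff):
    # Full weight inside the cutoff; weight cutoff/|r| beyond it,
    # so extreme outliers count for progressively less.
    r = abs(residual)
    return 1.0 if r <= cutoff else cutoff / r

weights = [huber_weight(r, 2.0) for r in (0.5, -1.0, 4.0, -8.0)]
```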
-Weighted regression requires you to specify a weight variable
before execution.
-Chow regression is a simple modification of multiple regression.
It is used to see if the regression parameters are constant over
the scope of the data variables. You will have to specify the
number of points to keep in the first sample.
-LOGIT regression is used when the dependent variable is to be
constrained to a value above 0 but below 1. LOGIT setup converts
unsummarized data to the form required by the regression program.
(Save original data first!)
-PROBIT regression is similar to LOGIT regression. The difference
is the type of curve that is fit to the data. The logit fits a
logistic curve to the data while the probit fits a normal
distribution to the data. Except at the extremes (close to zero or
1) the difference between the results is very slight. PROBIT setup
converts unsummarized data to the form required by the regression
program. Traditionally, in the probit transform, 5 was added to
the normal deviate to avoid negative numbers. I have dispensed
with that addition to simplify the result. I think that in the
1990s we all are comfortable with negatives. As a result the
constant from B/STAT will be 5 lower than from traditional
packages.
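The two transforms can be compared directly; note the negative probit values that the traditional +5 would have hidden (a sketch using Python's statistics module, not B/STAT code):

```python
from math import log
from statistics import NormalDist

def logit(p):
    # Log-odds: maps a proportion in (0, 1) onto the whole real line.
    return log(p / (1.0 - p))

def probit(p):
    # Normal deviate, without the traditional +5 offset, so
    # proportions below .5 give negative values.
    return NormalDist().inv_cdf(p)
```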
-Non Linear regression refers to a regression where the form is
not linear in the parameters. In such a case the usual mathematical
procedures do not work. In this case you will be asked for the
dependent variable, a variable containing standard errors of the
measured points, and a variable in which to place the results. You will
not be asked for the independent variables. Instead you will be
asked to enter the equation. This equation is of the form Y=f(X)
except that you will use the column letters ("a" "b" etc) for the
independent variables. Each parameter that you wish to estimate
will have the form "PARM1" "PARM2" etc.
If we wanted to estimate "a" and "b" in the following formula
Y=a(1-EXP(-bX))
we would enter
PARM1*(1-EXP(-1*PARM2*a))
if the X variable was in column "a" of the spreadsheet.
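The entered text is just an ordinary formula; for trial parameter values it evaluates like this (the trial numbers are invented):

```python
from math import exp

# The example formula PARM1*(1-EXP(-1*PARM2*a)), with column "a"
# holding the X values.
def model(parm1, parm2, a):
    return parm1 * (1 - exp(-1 * parm2 * a))

y = model(10.0, 0.5, 2.0)   # one trial evaluation
```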
-Principal Components is not actually a regression method at all.
It is a process used to reduce the number of variables needed to
explain the variation in the data. The resultant variables are
orthogonal; that is, the correlation between any two variables is
0. Regression can often then be carried out against these pseudo-
variables. The process is destructive, in that it wipes out the
existing variables. Each new one is a linear combination of the
others.
-Correlation matrix shows the correlation between a group of
variables, rather than doing a full regression. This is often done
to look at the effects of multi-collinearity on the data.
TIME SERIES
These are methods of smoothing or projecting data. They are often
used in combination with other procedures.
-Moving average requires you to choose the variable and the period
of the moving average. As well, you must select a variable into
which the averaged variable will be placed.
-Geometric moving average requires the same input as linear moving
average.
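Both averages slide a window of the chosen period along the variable; the geometric version takes the period-th root of the product instead of the mean. A sketch (not B/STAT's code; in particular, where the program aligns the output in the column is not shown here):

```python
from math import prod

def moving_average(series, period):
    # Arithmetic mean of each full window of the given period.
    return [sum(series[i:i + period]) / period
            for i in range(len(series) - period + 1)]

def geometric_moving_average(series, period):
    # Same windows, but the period-th root of the product.
    return [prod(series[i:i + period]) ** (1.0 / period)
            for i in range(len(series) - period + 1)]
```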
-Fourier smoothing requires a variable to smooth and a variable to
place the result. It also asks for the number of terms to be kept
in the intermediate calculations. This value should be less than
50, usually less than 15. There must be no missing data for this
procedure to work. Note that this can be a slow process.
-Linear smoothing requires a variable to smooth and a variable to
place the result. A linear regression is made assuming that the
independent variable is a simple counter from 1 to the number of
rows used. The equation is
Y=a+b.t
-Polynomial smoothing fits a power series to the data. In
addition to the variable to smooth and the result variable you
must input the degree of the polynomial. A power of 1 is a linear
regression. A power of 2 fits the curve
Y=a+b.t+c.t.t
A power of 3 fits
Y=a+b.t+c.t.t+d.t.t.t
etc
-Exponential Form fits an equation such that
Y=EXP(a+b.t)
This is called exponential form to distinguish from exponential
smoothing which is a totally different process.
-S-Shape smoothing fits the following curve
Y=EXP(a+b/t)
Such a curve will rise and then approach EXP(a) if "b" is
negative. If "b" is positive then the curve will drop to approach
EXP(a)
-Brown 1-way exponential smoothing is simple exponential smoothing.
You will be asked to specify the variable to smooth, and a
variable in which to store the result. In addition, you will need
a smoothing constant (0 to 1) and a starting value. If you do not
specify the starting value, the program will generate one. This
process is not designed for data with a distinct trend line. If
there is a distinct linear trend, then 2-way exponential smoothing
should be used.
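One common formulation of simple exponential smoothing looks like this (whether B/STAT seeds the starting value exactly this way is an assumption):

```python
def brown_smooth(series, alpha, start=None):
    # alpha is the smoothing constant (0 to 1); if no starting
    # value is given, the first observation is used.
    s = series[0] if start is None else start
    out = []
    for y in series:
        s = alpha * y + (1.0 - alpha) * s
        out.append(s)
    return out
```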
-Brown's 2-way exponential smoothing uses linear regression to
estimate a starting value and trend. You must estimate the
smoothing coefficient and variable to smooth, and variable for
result.
-Holt's 2-way exponential smoothing is similar to Brown's, except
that a separate smoothing coefficient is used for the trend
factor. Also you may enter initial values for the level and
trend.
-Multiplicative exponential smoothing is almost identical to
Holt's. The difference is that the trend factor is taken as a
proportionate increase in value rather than a constant to add.
Thus .02 does not mean that the trend is initially an increment
of .02 but rather a percentage increase of 2%.
-Winter's exponential smoothing is used if there is a seasonal
aspect to the data (like retail sales which have a December peak).
You will have to enter 4 quantities. The first is the smoothing
coefficient for level. The second is for trend. The third is for
seasonality. The fourth value is the period of seasonality. Note
that this method should not be used with data fluctuating above
and below zero. With data that go below zero, add a constant to
the data to eliminate negative values. Then, after smoothing,
subtract the constant.
INTERPOLATION
B/STAT uses 4 forms of estimating unavailable data.
-Simple linear interpolation requires that you simply select the
variable.
-Geometric interpolation. Basically the same as linear
interpolation except that the assumption is that the points are
connected by a multiplicative relationship rather than additive.
-Lagrangian interpolation requires two variables: an "X" variable
and a "Y" variable. There can be no missing "X" values. This
can be slow with a large data set, since each point is used in
estimating missing data.
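A sketch of why Lagrangian interpolation slows down: every known point enters every estimate (the data here are invented):

```python
def lagrange_estimate(xs, ys, x):
    # Classic Lagrange polynomial through all known (x, y) points;
    # each estimate loops over the whole data set, hence the cost
    # on a large data set.
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total
```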
-Cubic splines assumes that the data set in the selected variable
consists of evenly-spaced observations.
EXTRACT
These selections allow you to reduce the size of the data set. The
first option sums the data. For example, if you want to get yearly
totals from a data set of monthly data, you can extract summed data
and reduce the data by a factor of 12. Each element would then be
a yearly total. In the non-summed case, only every 12th value would
be left. No summing would be done. This is useful if you want to
look at subsets in isolation.
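The two extract options can be sketched as follows (which of each group's values the non-summed option keeps is an assumption here):

```python
def extract(series, factor, summed=True):
    groups = [series[i:i + factor] for i in range(0, len(series), factor)]
    if summed:
        # e.g. monthly data with factor 12 -> yearly totals
        return [sum(g) for g in groups]
    # non-summed: keep only every factor-th value, no summing
    return [g[-1] for g in groups]
```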
MISCELLANEOUS
This menu has three procedures, in addition to the usual help
selection.
-Crosstabs is used to summarize data contained in two or
three variables. It produces a count for the combination of values
in the chosen variables. For example, you may have data on the
height and weight of a group of army recruits. You could use
crosstabs to find out the number in each height and weight
classification, where these could be height in 2-inch increments
and weight in 5-pound increments. It is most commonly used in
market research for crosses such as people aged 30 to 34 who earn
between 20,000 and 30,000 dollars per year.
You first select the variables to use in the crosstab. If you
select two, then a 2-way crosstab is done. If three, then a 3-way
crosstab is done. Next, you select the break points for the
classes in each variable. There may be up to 14 breakpoints,
giving a maximum of 15 classes for each variable. You need only
type in as many breakpoints as you need for a specific
variable, and leave the rest blank. The number of break points can
be different for each variable. Note that the lower class includes
the break point value. Thus, a breakpoint of 200 pounds would put
200-pound people in the lower class and 200.01-pound people in the
higher class. The program will print out the results. If you want,
you may replace the data in memory with the summarized totals.
This can be quite useful if you then want to perform a Chi square
test, type 2, on the result to see if there are any significant
relationships.
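The classing rule (the lower class includes the breakpoint) matches Python's bisect_left, so a 2-way crosstab count can be sketched like this (the height and weight numbers are invented):

```python
from bisect import bisect_left
from collections import Counter

def crosstab2(var1, var2, breaks1, breaks2):
    # bisect_left puts a value equal to a breakpoint into the lower
    # class: 200 stays below a 200 break, 200.01 goes above it.
    counts = Counter()
    for a, b in zip(var1, var2):
        counts[(bisect_left(breaks1, a), bisect_left(breaks2, b))] += 1
    return counts

heights = [68.0, 70.0, 73.0]        # one breakpoint at 72 inches
weights = [195.0, 200.0, 210.0]     # one breakpoint at 200 pounds
table = crosstab2(heights, weights, [72.0], [200.0])
```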
One factor crosstabs are available. If you choose only one variable
then the program will generate a new data matrix composed of 2
variables only. There will be one entry for each unique value in
the chosen variable. The second variable will be the number of
occurrences of that value in the original variable. This is a
destructive process which erases all original data.
-Difference is a rather simple process. The difference of a
variable is simply the amount of its change from one period to the
next. Sometimes some procedures will work better on the change in
a variable rather than the variable itself. This is especially
true in Box Jenkins analysis. You merely supply the variable to
difference and the variable into which to place the result.
-Box Cox Transforms are used to transform a variable so that the
values are normally distributed. The Box Cox procedure uses a
variable called "lambda". You must provide the minimum lambda to
test as well as the maximum. You also must specify the number of
steps to use in going from the minimum to the maximum. The
program will select the best value of lambda from the ones that
it tests. The variable to test must have all values greater than
zero. You also specify a variable into which the result will be
placed.
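The transform itself and the lambda grid are simple; how B/STAT scores each lambda to pick the best is not shown here (a sketch with invented values):

```python
from math import log

def box_cox(values, lam):
    # Standard Box-Cox transform; every value must be greater than 0.
    if lam == 0:
        return [log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def lambda_grid(lam_min, lam_max, steps):
    # The candidate lambdas, stepping from the minimum to the maximum.
    step = (lam_max - lam_min) / steps
    return [lam_min + i * step for i in range(steps + 1)]
```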