KS4PROG.HLP
Text File | 1995-08-16 | 56KB | 1,167 lines
KWIKSTAT 4 Professional Program Help Screens
@1,Data Topics:Data entry, import, reports and sorts
@6,Descriptive Statistics
@7,Graphs - Descriptive and Comparative
@12,Spin Plot
@13,t-test and ANOVA, paired or independent groups
@15,Non-Parametric Comparisons
@16,Simple Linear & Multiple Regression and Correlation
@18,Crosstabulations, Frequencies, Chi-Square
@19,Life Table and Survival Analysis
@20,Data Generation and Simulations
@22,Advanced ANOVA Designs
@24,Advanced Regression
@34,Time Series Analysis
@36,Quality Control Charts
@41,Pareto Charts
@42,Multiple Comparisons
@43,Using the FILE menu
@44,Using the EDIT menu
@45,Using the HELP menu (Includes Setup)
@46,Using the KWIKSTAT Viewer
@47,General Graph Options
@50,Using REPLACE & using functions in REPLACE and SUBSET
##1 ##FILE
DATA TOPICS: DATA ENTRY, IMPORTS, REPORTS AND SORTS
==================================================
Data may be entered from the keyboard, or from an ASCII text file. Data
already stored in a dBASE III or IV file may also be used. Data may also be
imported from comma delimited and 1-2-3 files.
ENTERING DATA FROM THE KEYBOARD
1. CREATE THE STRUCTURE OF YOUR DATABASE by selecting the "NEW DATABASE"
option in the FILE menu.
2. ENTER/APPEND DATA by choosing the "APPEND RECORDS" option on the EDIT menu.
3. EDIT DATA by choosing the "EDIT RECORDS" option on the FILE menu.
4. CREATE NEW VARIABLES and REPLACE CONTENTS OF CURRENT FIELDS by choosing the
F9/Field option while editing data.
##2
ENTERING DATA FROM AN ASCII TEXT FILE
Create a database structure using CREATE. Structure should match the columns
of data in the data file. For example, your data is in a file named
"MYDATA.TXT". A database structure could be created using the following
format:
Field Type Width Dec
NAME C 10
AGE N 2 0
BDATE D 8
^ ^ ^ ^
│ │ │ └─────────────Number of decimals in numeric data
│ │ └───────────────────Columns where data is found
│ └─────────────────────────Data type
└────────────────────────────────Variable (field) name
This means NAME is in columns 1-10, AGE in 11-12 and BDATE in 13-20.
NOTICE: The format MUST be inclusive of all columns. DO NOT SKIP COLUMNS when
specifying where data is located.
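For readers working outside KWIKSTAT, the fixed-width layout above can be sketched in Python. This is an illustration only, not KWIKSTAT code; the data line is hypothetical, and the column positions follow the structure shown (NAME in columns 1-10, AGE in 11-12, and an 8-character BDATE in 13-20):

```python
# Hypothetical fixed-width record matching the structure above.
line = "Smith     4201151953"

name = line[0:10].strip()   # columns 1-10  (C, width 10)
age = int(line[10:12])      # columns 11-12 (N, width 2)
bdate = line[12:20]         # columns 13-20 (D, width 8)
```

Note how the slices are contiguous: as the NOTICE above says, the format is inclusive of all columns, with no gaps skipped.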
##3
ENTERING DATA FROM A DBASE III or IV FILE
-----------------------------------------
KWIKSTAT reads data directly from dBASE III and IV files. In each module, you
may specify which dBASE file to use. The module will display all ".DBF" files
in the default path by listing them in a pick box.
To choose the database to use, press the up or down arrow keys to highlight
the name of a database, then press Enter.
You may also call files from other directories by pressing the F2 key when
the database list appears. Specify another path for the program to search
(for example, enter \DB3). A new pick list appears listing .DBF files in the
specified path.
##4
IMPORTING DATA
--------------
FROM LOTUS 1-2-3: Import WKS, WK1 files by choosing the Utility option on the
FILE menu. For WK* import, you need to know the range in the spreadsheet,
such as A1..D15. KWIKSTAT imports files from versions 1 and 2 of 1-2-3.
Also, most versions of 1-2-3 contain options for exporting data in a DBF file
format, so you can save the data from the spreadsheet as a DBF file, then use
it directly in KWIKSTAT. Other programs such as EXCEL also will save data as a
DBF file. (Look in your program's index (i.e. the EXCEL manual) under DBF or
dBASE.)
IMPORT COMMA DELIMITED ASCII FILES: Import files where data is in the form:
23,34,"label",11
by choosing the Utilities option from the FILE menu.
NOTE: Once you have imported, you can change field name, width, etc. by using
the Modify option on the FILE menu.
##5 ##OUTPUT
REPORTS, DATA OUTPUT, SORTING DATA
----------------------------------
The Utility option on the FILE menu allows you to:
o Output a report, listing the data in the dataset (or a selected subset of
the database). You may view the report before printing it.
o Output the data into a standard ASCII TEXT (SDF) file. This is useful for
transferring the data to other programs.
o Sort data either ascending or descending by specifying field(s) as the sort
keys. For example, use AGE to sort by AGE or STATE+CITY to sort city within
state.
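The composite sort key can be sketched in Python. This is a hypothetical illustration of the STATE+CITY idea, not the program's own sort routine:

```python
# Hypothetical records; concatenating the key fields sorts CITY within STATE,
# as the STATE+CITY sort key described above does.
records = [
    {"STATE": "TX", "CITY": "DALLAS"},
    {"STATE": "AL", "CITY": "MOBILE"},
    {"STATE": "TX", "CITY": "AUSTIN"},
]
records.sort(key=lambda r: r["STATE"] + r["CITY"])
```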
##6 ##STAT
DESCRIPTIVE STATISTICS
======================
DETAILED STATISTICS - gives mean, standard deviation, etc., plus percentiles,
confidence interval and a box plot on one variable at a time
SUMMARY STATISTICS - gives mean, st. dev. etc. on several variables at a
time, and allows listing of statistics by a grouping factor
P-VALUE - calculates p-value for Z, t, Chi-Square and F statistics
DETAILED STATISTICS FROM KEYBOARD ENTRY - Enter data from the keyboard as
numbers or in grouped numbers
STEM and LEAF Display - summarizes data using a table/graph
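For the Z case of the P-VALUE option, the two-sided tail probability can be sketched with the standard normal distribution's complementary error function. This is a sketch for illustration, not KWIKSTAT's own routine:

```python
import math

def z_pvalue(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, z_pvalue(1.96) is approximately 0.05, the familiar 5% cutoff.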
##7, ##GRAPHS
GRAPHS - DESCRIPTIVE AND COMPARATIVE
====================================
The GRAPH module contains 6 graph types. These are:
o BAR/LINE/AREA Graph - shows distributions of frequencies
o PIE CHART - shows distributions of data by percent of whole
o TIME-SERIES - examines ordered data across time
o XY (SCATTERPLOT) - examines the relationship between two variables
o HISTOGRAM - examines the distribution of a continuous variable
o BY-GROUP - compares two or more groups graphically using means, medians, box
and whiskers or dot plots
##8
CREATING A BAR CHART
--------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
- It should contain a Label field & a Value field (the Value field contains
the numbers to use for the plot.) For example: the MAGNET is the LABEL field
and NAILS is the VALUE field.
----these are the fields-----
RECORD MAGNET NAILS
------ ------
1 SMALL 38 ─┐
2 MEDIUM 46 │──- this is the data to plot
3 LARGE 59 ─┘
│
└─────────────────── these are the labels for the plot
NOTE: This will create a BAR CHART with 3 bars labeled SMALL, MEDIUM & LARGE.
You could also use this same data to create a pie chart. NOTE: You can have
more than one value field, & create a side-by-side bar chart or a stacked
bar chart. This data can also be used for a line chart or an area chart.
##9
CREATING A PIE CHART
--------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
- It should contain a Label field
- It should contain a Value field (contains the numbers to use for the plot)
For example: the COLOR is the LABEL field and COUNT is the VALUE
field. This data refers to hair color for 50 people in your class.
----these are the fields-----
RECORD COLOR COUNT
------ ------
1 BLONDE 9 ─┐
2 BROWN 14 │──- this is the data to plot
3 BLACK 22 │
4 RED 5 ─┘
│
└─────────────────── these are the labels for the plot
This will create a PIE CHART with 4 slices. You could also use the same
database to create a bar chart.
##10
CREATING A SCATTERPLOT/REGRESSION LINE
--------------------------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
- It should contain a GROUP field (if you have more than one group)
- It should contain two or more value fields
For example: Sex is the GROUP field and Height and Weight are VALUE fields.
----these are the fields-----
RECORD SEX HEIGHT WEIGHT
------ ------ ------
1 M 70 202 ─┐
2 M 65 145 │──- this is the data to plot
3 M 72 188 ─┘
: F 60 103
22 F 62 122
23 F 59 112
└─────────────────── this is the group field
This will create a SCATTERPLOT and REGRESSION LINE PLOT.
##11
CREATING A TIME SERIES/LINE PLOT
--------------------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
Data should contain one or more value fields and an optional label field.
For example: Sales1 is a VALUE field for team1 and Sales2 for team2.
Data is in time order. In this example, sales for a month are:
----these are the fields-----
RECORD DAY SALES1 SALES2
---- ------ ------
1 1 4 3 ─┐
2 2 5 4 │──- this is the data to plot
3 3 6 4 ─┘
: 4 5
31 23 6 5
───────────────────────
└─────────────────── these are the value fields
This will create a TIME SERIES PLOT with two lines.
##12, ##SPIN
SPIN PLOT
=========
A spin plot is a three dimensional (XYZ) scatterplot. You must have a database
containing at least three numeric variables. Usually, these variables should be
continuous. Optionally, you can specify a grouping variable.
The plot will display the data by group using colors or point patterns. Choose
how the data will be displayed from the <Options> menu. You can choose to
turn the axes on and off, display a box, display rays, etc.
Use the menu at the right of the screen to move the plot in any of three
directions. To cause the plot to move continuously, hold the CTRL key down and
press a directional key (i.e., CTRL-rightarrow).
##13 ##TTEST
T-TESTS AND ANOVAs
==================
FOR INDEPENDENT GROUPS OR SINGLE GROUP
--------------------------------------
TWO GROUPS: Student's t-test; data are expected to have a grouping variable.
Also provides a test for the equality of variance, and two versions of the
t-test according to whether the variances can be considered equal.
3 TO 10 GROUPS: One way ANOVA, Multiple comparisons performed, data must have
a grouping variable. Comparative plots displayed.
T-TEST AND ANOVA from summary data - comparative plot (no box plots) can be
displayed.
Single sample t-test - you choose the hypothesized value to test.
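The summary-data case above needs only means, standard deviations and sample sizes. The equal-variance (pooled) two-sample t statistic can be sketched as follows; this is an illustrative formula, not the program's code:

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic from summary data (equal-variance form)."""
    # Pooled variance across the two groups
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
```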
##14 ##REPEAT
T-TESTS FOR PAIRED OR REPEATED MEASURES
---------------------------------------
TWO TIME PERIODS OR TWO PAIRED OBSERVATIONS: Student's t-test for paired
observations. Data is expected to be paired within each record in the
database. For example: two fields in the database could be:
Before After
200 175
130 123
etc.
3 TO 10 REPEATED MEASURES: An extension of the t-test, with 3 or more
repeated measures. Repeated Measures ANOVA performed, with Newman-Keuls
multiple comparisons. Comparative box plots displayed.
##15 ##NPAR
NON-PARAMETRIC COMPARISONS
==========================
Note: Use non-parametric procedures when the data cannot be assumed to be
normally distributed.
FOR INDEPENDENT GROUPS OR SINGLE GROUP
--------------------------------------
TWO GROUPS: Mann-Whitney U, comparison based on ranks of the data.
3 TO 10 GROUPS: Kruskal-Wallis One-way ANOVA based on ranks; Multiple
comparisons performed at the 0.05 significance level.
FOR PAIRED OR REPEATED MEASURES
-------------------------------
TWO TIME PERIODS OR TWO PAIRED OBSERVATIONS: Friedman's Test.
3 TO 10 REPEATED MEASURES: Friedman's ANOVA with Multiple comparisons.
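The Mann-Whitney U statistic mentioned above can be sketched directly from its pairwise definition (ties count one half). A sketch for illustration, not the program's implementation:

```python
def mann_whitney_u(x, y):
    """U statistic for the first group: count of (x_i, y_j) pairs won by x."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5      # ties split the point
    return u
```

The two groups' U values always sum to len(x) * len(y), which is a handy check.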
##16 ##REG
LINEAR REGRESSION
=================
SIMPLE LINEAR REGRESSION - relating two variables. This procedure provides
an equation representing a straight line fitted through the data, and a test
of the significance of the linear relationship. You can also plot the data
to verify a linear trend and to examine residuals.
MULTIPLE REGRESSION - Allows you to relate up to 10 independent variables to
a dependent variable. The significance of each variable is determined, and
the coefficients to a prediction are calculated.
You can use the information on the significance of each variable to determine
what variables to leave in the equation and which to remove in order to find
the best equation possible.
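The straight-line fit described above can be sketched with the usual least-squares formulas. A minimal illustration, not KWIKSTAT's code:

```python
def simple_regression(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b
```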
##17
CORRELATION PROCEDURES
----------------------
CORRELATION calculates the Pearson and Spearman correlation coefficients for a
pair of variables. The significance of the coefficient is also given.
Usually, Pearson's is calculated when the data are normal, and Spearman
(which is based on ranks) is used for non-normal data.
MATRIX OF CORRELATIONS - allows you to calculate combinations of correlations
(Pearson) on up to ten variables at a time.
DISPLAY A MATRIX OF SCATTERGRAMS - allows you to visually examine the
relationship on pairs of data for up to 10 combinations at a time.
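The relationship between the two coefficients is simple: Spearman's coefficient is Pearson's coefficient applied to the ranks of the data. A sketch of both (illustration only, with average ranks for ties):

```python
def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(v):
    """Ranks of the values; tied values share the average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def spearman(x, y):
    """Spearman coefficient: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))
```

A monotone but non-linear relationship (e.g. y = x cubed) gives a Spearman coefficient of exactly 1 even though Pearson's is below 1.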
##18,Crosstabulations, Frequencies, Chi-Square ##CROSS
CROSSTABULATIONS, FREQUENCIES & CHI-SQUARE
==========================================
The Crosstabulations, Frequencies, Chi-Square module performs analyses
on categorical data, that is, data observed in categories, rather than
measurement data. Generally, categorical data are entered into a database
by using one record for each person or entity on which the observation is
made and one field for each characteristic which is divided into
categories. For example, to categorize ten people by sex, hair
color and eye color, you would need ten records (one per person)
and three fields (e.g., SEX, HAIR, EYE).
Some of the procedures in this module give you the choice of simply
entering totals for each category rather than creating a database
and entering the results of each observation. This can save time if
totals are known and only totals are needed to perform a test or
calculation or to produce a graph.
KWIKSTAT "counts" the occurrence of each data value for a single variable
or field and displays that information in a table or in a graph.
##19,Life Table and Survival Analysis ##LIFE
LIFE TABLE AND SURVIVAL ANALYSIS
================================
As the name indicates, this module performs life tables (either actuarial or
Kaplan-Meier) and survival comparison procedures. The data must be in the
following form:
1) a TIME variable which contains a time (e.g., minutes, days,
years, etc.) in which the subject or component has been observed to
be alive (not failed).
2) a CENSOR variable which must take on the values 0 or 1, where
1 means the subject has died (failed), and a 0 means the subject
was still alive (not failed) at the last available time period.
3) optionally, a GROUPING variable which may have up to ten values
(numeric or character), i.e., the data may be in groups.
A plot is given for the cumulative proportion surviving in the
group(s) against time. If more than one group is entered, a
Mantel-Haenszel test is performed to test the hypothesis of equal
survival patterns for the groups.
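The Kaplan-Meier estimate described above can be sketched from the TIME and CENSOR variables. This is an illustrative sketch, not KWIKSTAT's procedure:

```python
def kaplan_meier(times, censor):
    """Kaplan-Meier survival curve.
    censor[i] = 1 means failure at times[i]; 0 means censored (still alive)."""
    data = sorted(zip(times, censor))
    at_risk = len(data)
    s = 1.0
    curve = []                      # (time, survival) at each failure time
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]    # count failures at this time
            removed += 1            # failures and censored both leave the risk set
            i += 1
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        at_risk -= removed
    return curve
```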
##20 ##SIMU
DATA GENERATION AND SIMULATIONS
===============================
A) GENERATE DATA SETS
This option allows you to create KWIKSTAT (dBASE) data sets from a Normal,
Uniform or Exponential distribution. You will be asked to specify the number of
variables to generate. Then for each variable, you must specify if it is to be
from a Normal, Uniform or Exponential distribution.
B) 95% CONFIDENCE INTERVAL SIMULATION
The 95% Confidence Interval simulation shows visually the meaning of a 95%
confidence interval. In this simulation, 100 samples of size 30 are drawn from
a normal distribution and a 95% C.I. is calculated for each
sample. The true mean of the population is plotted as a horizontal line on the
screen. Each C.I. is plotted vertically on the screen so you can visually see
the range of the C.I. and whether or not it covers the true population mean.
All 100 C.I.'s are plotted and a summary of how many covered the population
mean is displayed on the screen.
(continued)
##21
(Simulations continued...)
C) FLIP A COIN DEMONSTRATION
When you flip a fair coin, you would expect the percentage of heads to approach
50% over a long period of time. This simulation automates 100 coin tosses and
graphs the results. You should repeat this simulation a number of times to see
how the graph varies for different series of flips.
D) DEMONSTRATE DISTRIBUTION OF SAMPLE MEAN (CENTRAL LIMIT THEOREM)
In this demonstration, a population of 500 points is generated from either a
Normal, Uniform or Exponential distribution. Then, 100 samples of sizes 1, 2,
3, 5, 10 and 30 are taken from the population. A histogram of the original
"population" is displayed on the left side of the computer screen and a
histogram of the 100 sample means is displayed on the right side of the screen.
As each pair of histograms is displayed, you can see how the right side
histograms approach a bell-shape as the sample size increases. Also, as the
sample size increases, the spread of the histogram gets smaller: the standard
deviation of the sample means decreases by a factor of the square root of the
sample size. The demonstration will automatically cycle through the six
sample sizes.
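The shrinking spread of the sample means can be checked numerically. A sketch (illustration only) using a standard normal population, where the standard deviation of the means should be close to 1 over the square root of the sample size:

```python
import random

def sd_of_sample_means(sample_size, n_means=100, seed=2):
    """Standard deviation of n_means sample means from a standard normal
    population; expected to be near 1 / sqrt(sample_size)."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, 1.0) for _ in range(sample_size)) / sample_size
             for _ in range(n_means)]
    m = sum(means) / n_means
    return (sum((x - m) ** 2 for x in means) / (n_means - 1)) ** 0.5
```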
E) LIST VALUES FROM A DATABASE TO THE SCREEN
This option allows you to display the contents of a database to the screen.
##22,Advanced ANOVA Designs ##ADVAOV
ADVANCED ANOVA DESIGNS
======================
Two-Way ANOVA (balanced or unbalanced)- An analysis of variance is a method of
comparing means between several experimental groups. In a two-way analysis of
variance, the experimental design consists of two grouping factors and one or
more observations on each combination of the grouping factors.
For example, suppose you have designed an experiment to examine the
effectiveness of several display strategies on sales. You have selected three
display widths, and two heights, giving you 6 display combinations. In order to
make comparisons of sales for each combination of height and width, you want to
place one of the 6 display combinations at each of several stores. Then, after
a period of time, you will examine the sales from each combination to see if
you can discover which combination produces the most sales. (See the database
on disk named SALES.DBF.)
NOTE: If the data are balanced (no missing values) both the balanced and
unbalanced analysis will yield the same results. However, the unbalanced
procedure usually takes longer to compute. Therefore, use the unbalanced
procedure only when you have unequal sample sizes per cell in the design.
(continues...)
##23
(ADVANCED ANOVA DESIGNS continued...)
Two-Way Repeated Measures Analysis - In a two-way analysis of variance, it is
common to examine one "subject" at several points in time, or under several
conditions. This differs from the replicates on the two-way analysis example
where the "replicates" are unrelated. In a repeated measures example the
replicates are related.
Data for an example repeated measures two-way analysis is included in the
database REPEAT2.DBF on disk. In this example, there are two methods of
calibrating DIALS (factor A), and the levels of B are four SHAPES of the dials.
Six subjects were randomly assigned to perform the calibrating on a particular
dial (A) for all four shapes of dials. That is, each of the six subjects were
observed four times, once for each combination of the DIAL/SHAPE settings. The
scores observed are accuracy.
##24,Advanced Regression ##ADVREG
ADVANCED REGRESSION
===================
Regression analysis is used to model relationships between a dependent
(response) variable and one or more independent (predictor) variables. The
KWIKSTAT Advanced Regression Module includes four regression options:
o Polynomial Regression
o All Possible Regressions
o Stepwise Regression
o Customized (Least Squares) Regression Calculations
Polynomial Regression Analysis is useful for determining whether higher order
terms (squared, cubed, etc.) of a single predictor (independent) variable are
helpful in modeling the relationship with the response (dependent) variable.
All Possible and Stepwise Regression options are methods for the final step in
multiple linear regression, that of selecting a set of predictor variables for
appropriately modeling the relationship between the predictors and the
response. Customized Regression allows you to define the contents of the
regression matrices that are used in calculating the regression equation.
(continues...)
##25 ##POLY
(Regression continued...)
Polynomial Regression
---------------------
Polynomial regression is considered in a situation in which the relationship
between predictor and response variables is curvilinear.
The data in GAME.DBF are the ages of 29 players and their scores on a new
video game (generated data). If you plot that data on an XY plot, it appears
that the relationship between AGE and SCORE is not clearly linear, but that a
quadratic term may be helpful in describing the relationship. Such a
polynomial model can be recognized as a form of a multiple linear regression
model with two predictor variables, X and X-Squared.
In fitting a polynomial regression model, all lower-order terms must be
included. That is, the first-order term is always present, and higher-order
terms are added only when the lower-order terms are not sufficient. A cubic
term is used only if both the linear (first-order) and quadratic
(second-order) terms are included. When using Polynomial Regression in
KWIKSTAT, you are asked to specify the order of the polynomial you wish to fit.
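Because a polynomial model is just a multiple regression on X, X-squared, and so on, its design matrix can be sketched in a few lines. An illustration only:

```python
def poly_design_matrix(x, order):
    """Design-matrix rows [1, x, x**2, ..., x**order]; note that every
    lower-order term is carried along with the highest-order term."""
    return [[xi ** k for k in range(order + 1)] for xi in x]
```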
(continues...)
##26
(Polynomial Regression continued...)
As with any multiple regression analysis, care must be taken to avoid
collinearities between the predictors. That is, if the predictors are highly
correlated, the coefficient estimates may contain considerable error.
Centering the data is an option, but may not always sufficiently reduce
collinearities, in which case the data should be standardized (divide the
centered values by the standard deviation of the predictor variable values).
There are various approaches for determining the order of the model. One
method is a "forward selection" procedure in which the first-order (linear)
term is fit and then higher order terms are added sequentially until the
F-test for a non-zero coefficient is not significant for the highest order
term. Another method is a "backward elimination" procedure in which an
appropriately high-order polynomial model is fit and terms are deleted one at
a time from high to low order until the highest order term of the remaining
terms results in a significant F-test. These two methods may not result in
the same model.
(continues...)
##27
(Polynomial Regression continued...)
KWIKSTAT fits a model of the order you select and reports the coefficients of
each term, including an intercept term, up to that order. The results of the
tests of significance of these coefficients are also reported. A small
p-value indicates that the corresponding coefficient is significantly
different from zero. Residual analysis is also useful for investigating the
appropriateness of the model selected. KWIKSTAT also reports the Analysis of
Variance for the entire regression fit, as well as R-Square and adjusted
R-Square, as it does in the regular Linear Regression module.
In general, in regression analysis simpler models are preferred. It may be
possible to transform the predictor in some way so that higher order terms
are not necessary. Terms higher than second or third order are not usually
used unless there is some reason inherent in the data. It is always possible
to fit a high enough order model, but such a model is difficult to interpret
and not generally recommended.
(continues...)
##28
(Polynomial Regression continued...)
As with any regression model, extrapolation is dangerous and should be
avoided. While a polynomial model may adequately model the relationship
between variables within the range of the data used in the analysis, it is
extremely risky to assume that relationship continues to exist outside the
range of the data. Refer to a standard text, such as Neter and Wasserman or
Montgomery and Peck, for more information about polynomial regression.
##29 ##ALLPOSS
All Possible Regressions
------------------------
Also known as "best subset selection", this procedure consists of considering
all possible combinations of the predictor variables. Comparison can be based
on a number of criteria, including mean squared error, Mallow's Cp, and
R-Square. The calculated MSE is an estimate of the variance of the errors in
the full model. A smaller error variance is desirable, so different models
can be compared based on MSE, with those having a smaller MSE preferred.
R-Square, the coefficient of determination, is a measure of how much of the
variability in the response is explained in the model, provided the model has
been arrived at properly. A model with larger R-Square is preferred to one
with a much smaller R-Square.
Mallow's Cp is a statistic which is a function of the error sum of squares
for the full model and that for the reduced model. Under the correct model,
Cp is approximately equal to p and otherwise is greater than p, reflecting
bias in the parameter estimates in the regression equation. Thus, it is
desirable to select a model in which the value of Cp is close to the number
of terms, including the constant term, in the model.
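Mallow's Cp statistic has a simple closed form. A sketch of the usual formula (illustration, not the program's code), where SSE is for the candidate subset model and MSE is from the full model:

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallow's Cp for a subset model with p terms (including the constant);
    under a correct model, Cp is approximately equal to p."""
    return sse_p / mse_full - (n - 2 * p)
```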
(continues...)
##30
All Possible Regressions (continued)
These three criteria are typically used to compare combinations, or subsets,
of the predictor variables. When KWIKSTAT reports the results of the All
Possible Regressions procedure, it reports all three of these criteria. Of
course, you should also take into account any theoretical criteria for
including or excluding variables, as well as be careful not to include
redundant variables, which may introduce collinearities. It is often helpful
to consider which variables consistently appear in the better models. The
better models can then be analyzed using the Multiple Regression option of
the Regression and Correlation Module in the regular KWIKSTAT program, and
the results of tests for significant coefficients considered in the final
decision. Residual plots of predicted values under the chosen model should
show a random scatter of points.
Clearly, comparing all possible models is generally the best method for
making a decision about a "best" model since it provides the most information
about the available choices. However, the "all possible" subsets procedure
can become quite large with just a moderate number of predictors. KWIKSTAT
has the capability to perform All Possible Regressions on a maximum of eight
predictor variables. With eight variables, there are 2^8 - 1, or 255, possible
subsets, and the procedure can take some time.
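The subset count grows exponentially, as a short enumeration sketch shows (illustration only):

```python
from itertools import combinations

def all_subsets(variables):
    """Every non-empty subset of the candidate predictor variables."""
    subsets = []
    for k in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, k))
    return subsets
```

With 8 candidate predictors the list has 2^8 - 1 = 255 entries, each of which would require its own regression fit.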
##31 ##STEPWISE
Stepwise Selection
------------------
For a large number of predictors, or if for other reasons the All Possible
Regressions variable selection procedure is not practicable, an alternative
is the Stepwise variable selection procedure. KWIKSTAT's Stepwise option can
consider up to 49 variables, and can define a model using up to 20 of those
variables. As noted earlier, the Stepwise procedure is a combination of
"forward selection" and "backward elimination" techniques.
At the first step, the model consisting of all variables is considered, and
the variable testing "most significant", i.e., having the largest
F-statistic, becomes the first variable included in the model. In the second
step, the variable selected in the first step is forced into the model and
the other variables are then fit. A cut-off p-value is used as the selection
criterion to determine whether any more variables should be included. This
cut-off p-value can be designated by you; otherwise KWIKSTAT uses a default
p-value of 0.25 for the F-tests. Of
those variables meeting the selection criteria at step two, the one showing
the most significance, i.e., having the largest F-statistic, is added to the
model consisting of the variable selected in the first step.
##32
Stepwise Regression (continued)
The two-variable model is then "checked" and if the coefficients of both
variables are shown to be significantly different from zero (having small
p-values), the process continues. Again, the cut-off p-value can be set by
you, or else the default is 0.25. At the third step, the two already chosen
variables are forced into the model and the other variables then fit. If any
remaining variables meet the selection criteria, the "most significant" of
those is added, and the three-variable model checked. The process continues
as long as all selected variables satisfy the "checking" procedure, and as
long as at least one remaining variable meets the selection criteria and is
added to the model at each "forward" step. The operator is also given the
opportunity at each step to continue or to stop the procedure.
##33 ##CUSTOM
Customized Regression Calculations
----------------------------------
At times you may wish to perform a regression (least squares) calculation
that is different from those defined elsewhere in KWIKSTAT. The Customized
Regression option allows you to place your own information directly into the
matrices that are used to perform a regression calculation. The regression
equation (in matrix form) can be written Y = Xb + e where Y is an array of the
dependent variables, X is a matrix containing information about the
independent variables, b (beta) is the array of coefficients for the
regression equation, and e is an array of error terms. To calculate the beta array:
1. Create a database with columns representing the Y array and X matrix.
2. In the Advanced Regression option, specify what variables in the database
contain the values for the Y array and for each column of the X matrix.
3. Choose the option to perform the calculation. The results are reported.
The Custom Regression procedures assume that you have the mathematical
background to devise the matrices needed for this kind of analysis. See the
manual for examples.
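The calculation behind Y = Xb + e is the normal-equations solve b = (X'X)^-1 X'Y. The sketch below (illustration only; it assumes X'X is nonsingular and does no diagnostics) shows the matrix arithmetic involved:

```python
def solve_beta(X, Y):
    """Least-squares b for Y = Xb + e via the normal equations (X'X)b = X'Y."""
    n, p = len(X), len(X[0])
    # Build X'X and X'Y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         for i in range(p)]
    g = [sum(X[r][i] * Y[r] for r in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        g[col], g[piv] = g[piv], g[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            g[r] -= f * g[col]
    # Back-substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (g[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b
```

With a leading column of ones in X, the first coefficient plays the role of the intercept.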
##34,Time Series Analysis ##TIME
TIME SERIES ANALYSIS
====================
Time series analysis deals with attempting to model an observed series of
datapoints to forecast future activity or to understand the driving mechanism.
There are a number of approaches to modeling. This time series program bases
its modeling techniques on the ARMA (autoregressive moving average) approach.
In this approach, the researcher must first decide if there is an
autoregressive (AR) and/or moving average (MA) component, and the order of
each. These orders will be called p and q. Use p as the order of the AR
component and q as the order of the MA component. Thus, a model will be
designated as an ARMA(p,q). For example, the model ARMA(8,0) means that the
order of the AR component is 8 and the order of the MA component is 0 (none).
The goal is to find a model which adequately describes the process without
using any extra parameters, a parsimonious model.
The purpose of the KWIKSTAT Time Series program is to help you:
A) Decide what ARMA model is appropriate for your data.
B) Estimate the parameters of the model.
C) Create a forecast.
(continues...)
##35
(TIME SERIES ANALYSIS continued...)
Model Identification - The first part of the analysis process is model
identification. One way to determine if the data are white noise is to examine
the sample autocorrelations. If they are small and uncorrelated then the
process may be white noise. If the process is white noise, then approximately
5% of the sample autocorrelations (absolute values) would be expected to be
greater than 2/sqrt(n) where n is the length of the series. KWIKSTAT provides a
test to help you decide what model is appropriate. The W-statistic (see
Woodward and Gray) technique examines the data for fit to a series of models,
and returns the three "best" guesses for a model. It does not necessarily
choose the best model, but it is helpful in choosing which models to consider.
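The white-noise check above can be sketched directly: compute the sample autocorrelations and count how many fall outside the plus-or-minus 2/sqrt(n) band. A sketch for illustration, not the program's routine:

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def acf_exceedances(x, max_lag=10):
    """How many of the first max_lag autocorrelations fall outside the
    +/- 2/sqrt(n) band; for white noise only about 5% should."""
    bound = 2.0 / len(x) ** 0.5
    return sum(1 for r in sample_acf(x, max_lag) if abs(r) > bound)
```

A strongly alternating series, for instance, shows large autocorrelations at the first lags and is clearly not white noise.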
Estimating Parameters - Once a model has been chosen, you may estimate the
values of the parameters of the model given your set of data.
Forecasting - Once you have the estimates of a model, you can use this
information to create a forecast. If the model fits the data to your
satisfaction, then it may be a good model for forecasting into the future.
KWIKSTAT allows you to forecast and plot future values of the series. An
optional 95% confidence bound may be calculated to give you a range for your
estimated forecast.
##36,Quality Control Charts ##QCC
QUALITY CONTROL CHARTS
======================
The KWIKSTAT Quality Control module allows you to perform quality
control calculations and produce several kinds of control charts:
o X-Bar Chart (Chart on Means)
o R-Chart (Chart on Ranges)
o S-Chart (Chart on Standard Deviations)
o Control Chart for Individual Measurements
o P-Chart (Chart on Proportions)
Options in displaying the control charts include:
o Plot all points or a range of points on a chart.
o Plot X-Bar and R-Chart on same screen.
o Plot Upper and Lower control limits.
o Use standard 3-sigma control limits or specify your own limits.
o Select a point on the control chart with the mouse
pointer or with a cursor pointer and display database values used
to calculate that point.
o Print chart to printer or capture to a PCX graphics file.
o Interactively select chart colors.
o Zoom in and out on portions of the plot to see more detail.
##37
Preparing the Data for a Control Chart Plot
-------------------------------------------
The data for X-Bar or R-Charts should be stored in a database in the following
format:
The data that will be used to calculate the means to be plotted come from a
sample of observations, with each sample containing a number of replicates.
For example, you might take samples of jars filled with jelly 25 times during
the day. Each time you take a sample, it consists of 3 jars. Thus, you have
25 samples, each with a size of 3 (3 replicates). The data for this chart
would be stored in a database using the following setup:
Sample Value
1 15.9
2 16.1
3 16.0
1 16.2
2 15.9
3 15.6
etc.
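The X-Bar and R points come from grouping the records on the SAMPLE field and taking the mean and range of each group. A Python sketch of that grouping (illustrative only, not KWIKSTAT code):

```python
from collections import defaultdict

def xbar_r_points(rows):
    """Group (sample, value) rows by sample and compute the mean
    (X-Bar point) and range (R point) for each sample."""
    groups = defaultdict(list)
    for sample, value in rows:
        groups[sample].append(value)
    points = {}
    for sample, values in groups.items():
        points[sample] = (sum(values) / len(values),
                          max(values) - min(values))
    return points
```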
(continues...)
##38
(Control Chart continued...)
Each jar should contain 16 ounces of jelly. You do not want the jars too
empty or too full. Thus, you may want to see that the average amount of jelly
does not go under or over certain limits. Also, you do not want the range to
be too wide -- which may mean that the "average" jar contains 16 ounces, but
the amount in different jars may vary widely.
The database needed for this analysis would contain 2 fields, SAMPLE and VALUE.
To create this database, choose the Create a Database option from the FILE
menu. You can then choose to Create a custom database, or you could choose
the pre-defined database that contains 2 fields, where SAMPLE has a width of
1 and VALUE has a width of 5 with 2 decimal places. Once you have created the
database, enter the data, one observation (replicate) per record.
You may use this data to display an X-Bar chart, and an R or S Chart.
Also, see the sample dataset named XCHART.DBF on your disk.
NOTE: A database for a control chart for individual measurements is similar
to the one described here, but there is only 1 replication per sample.
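Assuming the standard 3-sigma X-Bar limits are estimated from the average range with the usual tabulated A2 factors (an assumption about the formula; the help text does not state it), the calculation can be sketched as:

```python
def xbar_limits(sample_means, sample_ranges, n):
    """Standard 3-sigma X-Bar limits estimated from the average range:
    X-double-bar +/- A2 * R-bar. A2 values are from the standard SPC
    factor table; only subgroup sizes 2-5 are shown here."""
    A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}[n]
    xbarbar = sum(sample_means) / len(sample_means)
    rbar = sum(sample_ranges) / len(sample_ranges)
    return xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar
```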
##39
Displaying P-Charts
-------------------
P-Charts plot a proportion of items observed from within a sample. For
example, you might take a sample of 25 items from a manufacturing process
each hour. Then you count the number of defects in that sample. You are
interested in plotting the proportion of defects across time to observe if an
unusually high number of defects begin to occur. The format for a KWIKSTAT
database is:
Sample SampSize Defect1 Defect2 Defect3
1 25 0 1 2
2 25 0 0 0
3 25 1 1 0
etc.
For example, you might observe 3 kinds of defects: defect 1 is a color
problem, defect 2 is a weight problem and defect 3 is a function problem.
Thus, for sample 1 the proportion of defects found is 3/25. Using this same
database, you could also create a P-Chart that only considers defects of type
3. In this case, you would choose only the SampSize and Defect3 fields for
analysis, and the proportion observed for sample 1 would be 2/25.
(continues...)
##40
(P-chart continued...)
The minimum number of fields in the KWIKSTAT database needed for this chart
is two, a Sample Size field and a Count field. If there is more than one
Count (defect) field, the program will add up the defect fields to
calculate the proportion of defects for that sample.
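That rule -- sum the defect fields, divide by the sample size -- together with standard 3-sigma P-chart limits (p-bar +/- 3*sqrt(p-bar*(1-p-bar)/n); an assumption about the limits formula KWIKSTAT uses) can be sketched in Python:

```python
import math

def p_chart(rows):
    """Each row is (sample_size, [defect counts]). The plotted point
    for a sample is its total defect count divided by its size.
    Assumes equal sample sizes for the control limits."""
    points = [sum(defects) / size for size, defects in rows]
    pbar = sum(points) / len(points)
    n = rows[0][0]
    half = 3 * math.sqrt(pbar * (1 - pbar) / n)
    return points, max(0.0, pbar - half), pbar + half
```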
You do not need a SAMPLE field, but you may want one if the field will
contain information about the sample, such as the hour taken. Then, when you
display detailed information about a particular point, you can quickly
identify its source.
To create this database, choose the Create a Database option from the FILE menu.
You can choose to Create a custom database, or you could use a pre-defined
structure that meets your needs. Once you have created the database, enter
the data, one sample per record, where each record includes a sample size
field and at least one count field.
##41,Pareto Charts ##PARETO
PARETO CHARTS
=============
The Pareto chart is a specialized bar-chart used to determine priorities
for quality improvement. The items displayed in the chart are arranged in
decreasing order by frequency of occurrence. KWIKSTAT allows you to read in
data to form a Pareto Chart in two ways:
o Read data, calculate frequencies, display plot - KWIKSTAT
reads raw counts from a database similar to the frequency procedure.
o Read frequencies, display plot - In this case, KWIKSTAT reads
frequencies that have already been tabulated.
Example: To create a chart by reading data, use a database with two fields
(e.g.,FAILURE & MACHINE) like this:
FAILURE MACHINE
Drift 1
Drift 2
Tubing 2
etc:
KWIKSTAT can also produce a separate Pareto chart for each "by" group.
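Tallying the raw counts and ordering the categories by decreasing frequency, as described above, might look like this in Python (illustrative only, not KWIKSTAT code):

```python
from collections import Counter

def pareto_frequencies(failures):
    """Tally raw failure categories and order them by decreasing
    frequency, as a Pareto chart requires."""
    return Counter(failures).most_common()
```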
##42,Multiple Comparisons ##MC
MULTIPLE COMPARISONS
====================
KWIKSTAT provides three methods of performing multiple comparisons: Newman-
Keuls, Tukey and Scheffe. Choose the default comparison test in the setup
procedure. The default comparison will be used in Analysis of Variance
comparison procedures. For some comparisons (i.e., Non-Parametric) the Tukey
procedure will be used, no matter what default procedure you choose.
However, you can use the Multiple Comparison module to perform comparisons that
are not automatically provided as a part of another comparison procedure (i.e.,
ANOVA), using any of the comparison types. This module also provides Dunnett's
test for comparison of all other group means to a control.
##43,Using the FILE menu ##FILE
USING THE FILE MENU
===================
NEW DATABASE - Create a new database. You must create a new database and enter
data before doing any analysis or creating a graph.
OPEN A DATABASE - Open an existing database.
SUBSET DATABASE - Create a database that is a subset of the current database.
COPY/BACKUP - Create a backup copy of a database for safety purposes.
LIST (DISPLAY) THE CONTENTS OF THE DATABASE - Display data to the screen.
MODIFY OR DISPLAY DATABASE STRUCTURE - View or change characteristics about the
database, including field widths and types.
KILL DATABASE - Delete a database file from your disk.
FILE UTILITIES - Import data, create reports, sort a database or output data.
EXIT - End the program.
##44,Using the EDIT menu ##EDIT
USING THE EDIT MENU
===================
The EDIT menu contains options that allow you to enter new data into a
database, edit data currently in a database, and other editing options:
EDIT RECORDS - Change data already in the database.
APPEND RECORDS - Add new records to the database.
MISSING VALUE CODES - Define missing value codes for your database. Refer to
the section titled "Setting Missing Value Codes" in Chapter 2 of the manual.
PACK DATABASE - Permanently erase all records marked for delete.
ZAP - Get rid of all records in a database.
##45,Using the HELP menu ##HELP
USING THE HELP MENU
===================
The KWIKSTAT Help system contains items to help you operate the program. These
include:
Program Help - Contains general program help information.
TUTOR - Displays a tutorial to help you learn how to use the program.
Decide What Analysis to Use - Displays a decision tree similar to the one in
Appendix E of this manual.
Change Setup Options - Select setup options including default path, colors,
printer type, multiple comparison test, etc.
AUTOHELP/Hints (On or Off) - Toggles the help/hint messages that appear on
some menus.
Go to DOS, Return with Exit (Shell) - Temporarily shell to DOS.
##46 ##VIEW
USING THE KWIKSTAT VIEWER
=========================
The KWIKSTAT viewer allows you to examine output from an analysis that
could be too big to appear on one screen. When the viewer appears, you
can move around the displayed results by pressing the arrow keys, PgUp,
PgDn, Home and End. If you are using a mouse, you can use the scroll
bars on the right side and bottom to position the output on the screen.
The function key commands available in the viewer are described below. To
activate one of these commands, press the function key or click the option
on the button bar at the bottom of the screen:
F1 - Display this help screen.
F3 - Send setup code to printer (for condensed print, etc.)
F5 - Go to a line in the output (Press F5, then enter a line number.)
F7 - Exit the viewer.
F8 - Define size of margin for output.
F9 - Define a title to be used on output.
F10 - Output the contents of the viewed file to a printer or file. When you
choose this option, the default output is the port you specified
in the program setup (i.e., LPT1: meaning line printer port 1). You
can press Enter to accept this default, or type a file name to
save the contents to a file.
##47 ##OPTIONS
OPTIONS WHILE DISPLAYING A GRAPH
================================
When a graph is displayed on your monitor, you can choose other
options from the plot menus.
The main graph menu appears at the top of the graph. To choose options from
this menu, press the first letter of the option name (i.e., E for Exit) or
point to the option with the mouse pointer and click the left button once.
Here is a description of the menu options. Depending on the graph, some of
these options may not appear on your menu:
o Exit - Exit the plot.
o Options - Display the plot options screen, where you can change options,
then replot the graph.
o Print - Print the graph to the printer.
o Cap/PCX - Capture the graph as a .PCX file.
o Set Colors - Display the color options menu.
o + - Begin cursor pointing mode. See "Display Graph Detail Option" below.
o Help - Display information about using the graph menu.
##48
Color Options Menu
------------------
The color options menu allows you to choose what colors to use in
displaying the graph. To choose options from this menu, press the
first letter of the option name (i.e., E for Exit) or point to the
option with the mouse pointer and click the left button once. The
options are:
o Menu - Return to the main menu.
o Graph - Change color of plot - cycles through 15 colors.
o Screen - Change background color - cycles through 15 colors.
o Text - Change text color - cycles through 15 colors.
o Default - Returns graph to default colors.
o B&W - Displays plot in Black and White. It is usually best to display a plot
in B&W mode before printing to a printer.
o Tile - In some plots, causes colors to be displayed as tile patterns.
o Help - Display help on using the graph menu.
##49
Display Graph Detail Option
---------------------------
On some graphs, you can choose to display information about specific points
on the graph, or take a closer look at a particular portion of the graph.
Using the point and look technique (SmartPoint (tm)), you can quickly identify
interesting points in the graph. For example, if several points are over the
limit, you might discover immediately that all of these points came from
samples taken from a single machine. When a chart is displayed, you can:
o SmartPoint - Select a point on the graph, and display information about
the database record associated with that point.
o Take a closer look - Zoom the graph in and out to take a closer look at an
area of the plot.
Mouse technique - If you are using a mouse, select a point by moving the
mouse pointer to the point you want to see, then click the left button once.
Cursor technique - Choose the <+> option from the main menu by pressing + on
the keyboard. A small "+" will appear in the middle of the graph. Use the
arrow keys to move the "+" to the point on the graph, then press Enter.
##50 ##REPLACE ##SUBSET
Using Functions & Expressions in REPLACE and SUBSET
===================================================
"REPLACE WITH" FIELD (in Replace option): Use either a math expression
or a database expression.
CONDITION FIELD (in Replace and Subset) : Use only a database expression.
A database expression allows many mathematical and character expressions,
as described below. The math expression is provided for performing
calculations using scientific mathematical functions. In the REPLACE WITH
field, the default expression type is the database type. In order for an
expression to be evaluated as a strictly math expression, you must place
an equal sign "=" at the beginning of the expression.
For example, if you want to perform the calculation WEIGHT/HEIGHT,
you can enter the expression as-is in the REPLACE WITH field.
(continues...)
##51
(REPLACE & SUBSET continued...)
However, if you want to calculate the log of WEIGHT/HEIGHT, you
must enter the expression as
=LOG(WEIGHT/HEIGHT)
since the LOG function is not supported as a database expression
function. The equal sign signals to the program to use the math
calculator. The information below outlines the capabilities of both
expression types.
Mathematical operators:
Add + Subtract -
Divide / Multiply *
Exponentiation ^ (Math calculator only)
For Character fields, the database calculator supports the
operation: Add + (appends one string to another)
(continues...)
##52
(REPLACE & SUBSET continued...)
Following are a few examples of correct expressions:
AGE/HEIGHT
=SCORE^2 (= signals math calculator)
LTRIM(FIRST)+' '+LAST
Note: Literal strings included in expressions must be surrounded by
single quotes. For example, 'Hello' is a literal string. Character
field names are used without quotes. For example, NAME is a field
name. A correct string expression using these two strings would be:
'Hello '+NAME
TIP: Only if you use a numeric operation or function not supported by
the database calculator will you need to place an equal (=) sign at
the first of the expression. For a list of the functions supported,
refer to Chapter 2 in the manual.
(continues...)
##53
(REPLACE & SUBSET continued...)
Following are some example uses of functions in REPLACE or SUBSET:
ASC - Converts the first character of a string to its ASCII code.
For example, the function ASC('A') would return the value 65, since
65 is the code for an uppercase A.
AT - Returns the starting position of one character string within
another character string. For example, the expression AT('Bill',
'Wild Bill') = 5 since the string 'Bill' begins five characters
deep in the string 'Wild Bill'.
CALENDAR and JULIAN - The JULIAN function converts a date into a
number, where 1 is January 1, 1583. CALENDAR converts a julian
number into a Date. You can convert dates into numbers, then find
the number of days between dates by subtraction.
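The JULIAN/CALENDAR date arithmetic can be illustrated in Python, where date.toordinal plays the role of JULIAN (Python numbers days from January 1 of year 1 rather than 1583, but differences between dates agree; illustrative only, not KWIKSTAT code):

```python
from datetime import date

def julian(d):
    """Day number of a date; subtracting two of these gives the
    number of days between the dates."""
    return d.toordinal()

def calendar(n):
    """Inverse of julian: convert a day number back into a date."""
    return date.fromordinal(n)

days_between = julian(date(1995, 8, 16)) - julian(date(1995, 1, 1))
```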
CAPS - Converts the first letter of each word into a capital. For
example, CAPS('this is a test') would become 'This Is A Test'.
(continues...)
##54
(REPLACE & SUBSET continued...)
CHR - Converts an ASCII code number into its character. For example, CHR(65)
is equal to the character string 'A'.
DELETED - Returns a T if the current record is marked for delete,
else it returns an F. Can be used to conditionally replace a value
depending on whether the record is deleted or not.
IIF - Selects between two expressions. The syntax is
IIF(logical expression, expression1, expression2). The logical
expression is either T or F. If the logical expression is T, then
returned value of this function is expression1, else the returned
value is expression2.
INT - Rounds down to nearest integer. INT(3.2) is equal to 3.
LEFT and RIGHT - Returns the left or right portion of a string. For
example, LEFT('Wild Bill',3) would return the string 'Wil' and
RIGHT('Wild Bill',3) would return the string 'ill'.
(continues...)
##55
(REPLACE & SUBSET continued...)
LOWER and UPPER - Returns lower or upper case string. For example,
UPPER('Wild Bill') would return 'WILD BILL' and LOWER('Wild Bill') would
return 'wild bill'.
LTRIM, RTRIM and TRIM - Trims blanks from the left end, right end, or both
ends of a string. For example, RTRIM('Wild Bill ') would return 'Wild
Bill'. If the field FIRST contained the string 'Mark ' (6 blanks on
the end) and the field LAST contained 'Walker ' (7 blanks on the end),
the expression FIRST+LAST would be 'Mark Walker '. To obtain
the string 'Mark Walker' you would use RTRIM(FIRST)+' '+RTRIM(LAST).
SUBSTR - Extracts a string from the middle of a string. For
example, SUBSTR('Wild Bill',3,4) would be 'd Bi', which begins with
the 3rd character in the initial string, and is 4 characters long.
If the 4 were left off, the result would be 'd Bill' -- which is
the remainder of the string starting with the 3rd character.
VAL - Returns the value of a string. For example VAL('24') is the
number 24.
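Several of the functions above have direct Python counterparts; an illustrative mapping (not KWIKSTAT code, and KWIKSTAT's own indexing conventions may differ):

```python
# Python equivalents of several database-expression functions.
assert ord('A') == 65                                # ASC('A')
assert chr(65) == 'A'                                # CHR(65)
assert 'this is a test'.title() == 'This Is A Test'  # CAPS(...)
assert 'Wild Bill'.upper() == 'WILD BILL'            # UPPER(...)
assert 'Wild Bill '.rstrip() == 'Wild Bill'          # RTRIM(...)
assert float('24') == 24                             # VAL('24')
value = 'yes' if True else 'no'                      # IIF(T,'yes','no')
```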
---END OF HELP---