KS4PROG.HLP
Text File | 1995-08-16 | 56KB | 1,167 lines
KWIKSTAT 4 Professional Program Help Screens
@1,Data Topics:Data entry, import, reports and sorts
@6,Descriptive Statistics
@7,Graphs - Descriptive and Comparative
@12,Spin Plot
@13,t-test and ANOVA, paired or independent groups
@15,Non-Parametric Comparisons
@16,Simple Linear & Multiple Regression and Correlation
@18,Crosstabulations, Frequencies, Chi-Square
@19,Life Table and Survival Analysis
@20,Data Generation and Simulations
@22,Advanced ANOVA Designs
@24,Advanced Regression
@34,Time Series Analysis
@36,Quality Control Charts
@41,Pareto Charts
@42,Multiple Comparisons
@43,Using the FILE menu
@44,Using the EDIT menu
@45,Using the HELP menu (Includes Setup)
@46,Using the KWIKSTAT Viewer
@47,General Graph Options
@50,Using REPLACE & using functions in REPLACE and SUBSET
##1 ##FILE
DATA TOPICS: DATA ENTRY, IMPORTS, REPORTS AND SORTS
==================================================
Data may be entered from the keyboard, or from an ASCII text file. Data
already stored in a dBASE III or IV file may also be used. Data may also be
imported from comma delimited and 1-2-3 files.
ENTERING DATA FROM THE KEYBOARD
1. CREATE THE STRUCTURE OF YOUR DATABASE by selecting the "NEW DATABASE"
option in the FILE menu.
2. ENTER/APPEND DATA by choosing the "APPEND RECORDS" option on the EDIT menu.
3. EDIT DATA by choosing the "EDIT RECORDS" option on the FILE menu.
4. CREATE NEW VARIABLES and REPLACE CONTENTS OF CURRENT FIELDS by choosing the
F9/Field option while editing data.
##2
ENTERING DATA FROM AN ASCII TEXT FILE
Create a database structure using CREATE. Structure should match the columns
of data in the data file. For example, your data is in a file named
"MYDATA.TXT". A database structure could be created using the following
format:
Field Type Width Dec
NAME C 10
AGE N 2 0
BDATE D 8
^ ^ ^ ^
│ │ │ └─────────────Number of decimals in numeric data
│ │ └───────────────────Columns where data is found
│ └─────────────────────────Data type
└────────────────────────────────Variable (field) name
This means NAME is in columns 1-10, AGE in 11-12 and BDATE in 13-20.
NOTICE: The format MUST be inclusive of all columns. DO NOT SKIP COLUMNS when
specifying where data is located.
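For readers working outside KWIKSTAT, the fixed-width layout above can be sketched in Python. This is an illustration only, not KWIKSTAT code; the data line is hypothetical, and the column positions follow the structure shown (NAME in columns 1-10, AGE in 11-12, and an 8-character BDATE in 13-20):

```python
# Hypothetical fixed-width record matching the structure above.
line = "Smith     4201151953"

name = line[0:10].strip()   # columns 1-10  (C, width 10)
age = int(line[10:12])      # columns 11-12 (N, width 2)
bdate = line[12:20]         # columns 13-20 (D, width 8)
```

Note how the slices are contiguous: as the NOTICE above says, the format is inclusive of all columns, with no gaps skipped.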
##3
ENTERING DATA FROM A DBASE III or IV FILE
-----------------------------------------
KWIKSTAT reads data directly from dBASE III and IV files. In each module, you
may specify which dBASE file to use. The module will display all ".DBF" files
in the default path by listing them in a pick box.
To choose the database to use, press the up or down arrow keys to highlight
the name of a database, then press Enter.
You may also call files from other directories by pressing the F2 key when
the database list appears. Specify another path for the program to search
(for example, enter \DB3). A new pick list appears listing .DBF files in the
specified path.
##4
IMPORTING DATA
--------------
FROM LOTUS 1-2-3: Import WKS, WK1 files by choosing the Utility option on the
FILE menu. For WK* import, you need to know the range in the spreadsheet,
such as A1..D15. KWIKSTAT imports files from versions 1 and 2 of 1-2-3.
Also, most versions of 1-2-3 contain options for exporting data in a DBF file
format, so you can save the data from the spreadsheet as a DBF file, then use
it directly in KWIKSTAT. Other programs such as EXCEL also will save data as a
DBF file. (Look in your program's index (i.e. the EXCEL manual) under DBF or
dBASE.)
IMPORT COMMA DELIMITED ASCII FILES: Import files where data is in the form:
23,34,"label",11
by choosing the Utilities option from the FILE menu.
NOTE: Once you have imported, you can change field name, width, etc. by using
the Modify option on the FILE menu.
##5 ##OUTPUT
REPORTS, DATA OUTPUT, SORTING DATA
----------------------------------
The Utility option on the FILE menu allows you to:
o Output a report, listing the data in the dataset (or a selected subset of
the database). You may view the report before printing it.
o Output the data into a standard ASCII TEXT (SDF) file. This is useful for
transferring the data to other programs.
o Sort data either ascending or descending by specifying field(s) as the sort
keys. For example, use AGE to sort by AGE or STATE+CITY to sort city within
state.
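The composite sort key can be sketched in Python. This is a hypothetical illustration of the STATE+CITY idea, not the program's own sort routine:

```python
# Hypothetical records; concatenating the key fields sorts CITY within STATE,
# as the STATE+CITY sort key described above does.
records = [
    {"STATE": "TX", "CITY": "DALLAS"},
    {"STATE": "AL", "CITY": "MOBILE"},
    {"STATE": "TX", "CITY": "AUSTIN"},
]
records.sort(key=lambda r: r["STATE"] + r["CITY"])
```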
##6 ##STAT
DESCRIPTIVE STATISTICS
======================
DETAILED STATISTICS - gives mean, standard deviation, etc., plus percentiles,
confidence interval and a box plot on one variable at a time
SUMMARY STATISTICS - gives mean, st. dev. etc. on several variables at a
time, and allows listing of statistics by a grouping factor
P-VALUE - calculates p-value for Z, t, Chi-Square and F statistics
DETAILED STATISTICS FROM KEYBOARD ENTRY - Enter data from the keyboard as
numbers or in grouped numbers
STEM and LEAF Display - summarizes data using a table/graph
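For the Z case of the P-VALUE option, the two-sided tail probability can be sketched with the standard normal distribution's complementary error function. This is a sketch for illustration, not KWIKSTAT's own routine:

```python
import math

def z_pvalue(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, z_pvalue(1.96) is approximately 0.05, the familiar 5% cutoff.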
##7, ##GRAPHS
GRAPHS - DESCRIPTIVE AND COMPARATIVE
====================================
The GRAPH module contains 6 graph types. These are:
o BAR/LINE/AREA Graph - shows distributions of frequencies
o PIE CHART - shows distributions of data by percent of whole
o TIME-SERIES - examines ordered data across time
o XY (SCATTERPLOT) - examines the relationship between two variables
o HISTOGRAM - examines the distribution of a continuous variable
o BY-GROUP - compares two or more groups graphically using means, medians, box
and whiskers or dot plots
##8
CREATING A BAR CHART
--------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
- It should contain a Label field & a Value field (the Value field contains
the numbers to use for the plot.) For example: the MAGNET is the LABEL field
and NAILS is the VALUE field.
----these are the fields-----
RECORD MAGNET NAILS
------ ------
1 SMALL 38 ─┐
2 MEDIUM 46 │──- this is the data to plot
3 LARGE 59 ─┘
│
└─────────────────── these are the labels for the plot
NOTE: This will create a BAR CHART with 3 bars labeled SMALL, MEDIUM & LARGE.
You could also use this same data to create a pie chart. NOTE: You can have
more than one value field, & create a side-by-side bar chart or a stacked
bar chart. This data can also be used for a line chart or an area chart.
##9
CREATING A PIE CHART
--------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
- It should contain a Label field
- It should contain a Value field (contains the numbers to use for the plot)
For example: the COLOR is the LABEL field and COUNT is the VALUE
field. This data refers to hair color for 50 people in your class.
----these are the fields-----
RECORD COLOR COUNT
------ ------
1 BLONDE 9 ─┐
2 BROWN 14 │──- this is the data to plot
3 BLACK 22 │
4 RED 5 ─┘
│
└─────────────────── these are the labels for the plot
This will create a PIE CHART with 4 slices. You could also use the same
database to create a bar chart.
##10
CREATING A SCATTERPLOT/REGRESSION LINE
--------------------------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
- It should contain a GROUP field (if you have more than one group)
- It should contain two or more value fields
For example: Sex is the GROUP field and Height and Weight are VALUE fields.
----these are the fields-----
RECORD SEX HEIGHT WEIGHT
------ ------ ------
1 M 70 202 ─┐
2 M 65 145 │──- this is the data to plot
3 M 72 188 ─┘
: F 60 103
22 F 62 122
23 F 59 112
└─────────────────── this is the group field
This will create a SCATTERPLOT and REGRESSION LINE PLOT.
##11
CREATING A TIME SERIES/LINE PLOT
--------------------------------
Create the database and enter the data (see Creating a Database). Your
database should look something like this:
Data should contain one or more value fields and an optional label field.
For example: Sales1 is a VALUE field for team1 and Sales2 for team2.
Data is in time order. In this example, sales for a month are:
----these are the fields-----
RECORD DAY SALES1 SALES2
---- ------ ------
1 1 4 3 ─┐
2 2 5 4 │──- this is the data to plot
3 3 6 4 ─┘
: 4 5
31 23 6 5
───────────────────────
└─────────────────── these are the value fields
This will create a TIME SERIES PLOT with two lines.
##12, ##SPIN
SPIN PLOT
=========
A spin plot is a three dimensional (XYZ) scatterplot. You must have a database
containing at least three numeric variables. Usually, these variables should be
continuous. Optionally, you can specify a grouping variable.
The plot will display the data by group using colors or point patterns. Choose
how the data will be displayed from the <Options> menu. You can choose to
turn the axes on and off, display a box, display rays, etc.
Use the menu at the right of the screen to move the plot in any of three
directions. To cause the plot to move continuously, hold the CTRL key down and
press a directional key (i.e., CTRL-rightarrow).
##13 ##TTEST
T-TESTS AND ANOVAs
==================
FOR INDEPENDENT GROUPS OR SINGLE GROUP
--------------------------------------
TWO GROUPS: Student's t-test; data are expected to have a grouping variable.
Also provides a test for the equality of variance, and two versions of the
t-test according to whether the variances can be considered equal.
3 TO 10 GROUPS: One way ANOVA, Multiple comparisons performed, data must have
a grouping variable. Comparative plots displayed.
T-TEST AND ANOVA from summary data - comparative plot (no box plots) can be
displayed.
Single sample t-test - you choose the hypothesized value to test.
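The summary-data case above needs only means, standard deviations and sample sizes. The equal-variance (pooled) two-sample t statistic can be sketched as follows; this is an illustrative formula, not the program's code:

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic from summary data (equal-variance form)."""
    # Pooled variance across the two groups
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
```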
##14 ##REPEAT
T-TESTS FOR PAIRED OR REPEATED MEASURES
---------------------------------------
TWO TIME PERIODS OR TWO PAIRED OBSERVATIONS: Student's t-test for paired
observations. Data is expected to be paired within each record in the
database. For example: two fields in the database could be:
Before After
200 175
130 123
etc.
3 TO 10 REPEATED MEASURES: An extension of the t-test, with 3 or more
repeated measures. Repeated Measures ANOVA performed, with Newman-Keuls
multiple comparisons. Comparative box plots displayed.
##15 ##NPAR
NON-PARAMETRIC COMPARISONS
==========================
Note: Use non-parametric procedures when the data cannot be assumed to be
normally distributed.
FOR INDEPENDENT GROUPS OR SINGLE GROUP
--------------------------------------
TWO GROUPS: Mann-Whitney U, comparison based on ranks of the data.
3 TO 10 GROUPS: Kruskal-Wallis One-way ANOVA based on ranks; Multiple
comparisons performed at the 0.05 significance level.
FOR PAIRED OR REPEATED MEASURES
-------------------------------
TWO TIME PERIODS OR TWO PAIRED OBSERVATIONS: Friedman's Test.
3 TO 10 REPEATED MEASURES: Friedman's ANOVA with Multiple comparisons.
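The Mann-Whitney U statistic mentioned above can be sketched directly from its pairwise definition (ties count one half). A sketch for illustration, not the program's implementation:

```python
def mann_whitney_u(x, y):
    """U statistic for the first group: count of (x_i, y_j) pairs won by x."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5      # ties split the point
    return u
```

The two groups' U values always sum to len(x) * len(y), which is a handy check.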
##16 ##REG
LINEAR REGRESSION
=================
SIMPLE LINEAR REGRESSION - relating two variables. This procedure provides
an equation representing a straight line fitted through the data, and a test
of the significance of the linear relationship. You can also plot the data
to verify a linear trend and to examine residuals.
MULTIPLE REGRESSION - Allows you to relate up to 10 independent variables to
a dependent variable. The significance of each variable is determined, and
the coefficients to a prediction are calculated.
You can use the information on the significance of each variable to determine
what variables to leave in the equation and which to remove in order to find
the best equation possible.
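The straight-line fit described above can be sketched with the usual least-squares formulas. A minimal illustration, not KWIKSTAT's code:

```python
def simple_regression(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b
```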
##17
CORRELATION PROCEDURES
----------------------
CORRELATION calculates the Pearson and Spearman correlation coefficients for a
pair of variables. The significance of the coefficient is also given.
Usually, Pearson's is calculated when the data are normal, and Spearman
(which is based on ranks) is used for non-normal data.
MATRIX OF CORRELATIONS - allows you to calculate combinations of correlations
(Pearson) on up to ten variables at a time.
DISPLAY A MATRIX OF SCATTERGRAMS - allows you to visually examine the
relationship on pairs of data for up to 10 combinations at a time.
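The relationship between the two coefficients is simple: Spearman's coefficient is Pearson's coefficient applied to the ranks of the data. A sketch of both (illustration only, with average ranks for ties):

```python
def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(v):
    """Ranks of the values; tied values share the average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def spearman(x, y):
    """Spearman coefficient: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))
```

A monotone but non-linear relationship (e.g. y = x cubed) gives a Spearman coefficient of exactly 1 even though Pearson's is below 1.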
##18,Crosstabulations, Frequencies, Chi-Square ##CROSS
CROSSTABULATIONS, FREQUENCIES & CHI-SQUARE
==========================================
The Crosstabulations, Frequencies, Chi-Square module performs analyses
on categorical data, that is, data observed in categories, rather than
measurement data. Generally, categorical data are entered into a database
by using one record for each person or entity on which the observation is
made and one field for each characteristic which is divided into
categories. For example, to categorize ten people by sex, hair
color and eye color, you would need ten records (one per person)
and three fields (e.g., SEX, HAIR, EYE).
Some of the procedures in this module give you the choice of simply
entering totals for each category rather than creating a database
and entering the results of each observation. This can save time if
totals are known and only totals are needed to perform a test or
calculation or to produce a graph.
KWIKSTAT "counts" the occurrence of each data value for a single variable
or field and displays that information in a table or in a graph.
##19,Life Table and Survival Analysis ##LIFE
LIFE TABLE AND SURVIVAL ANALYSIS
================================
As the name indicates, this module performs life tables (either actuarial or
Kaplan-Meier) and survival comparison procedures. The data must be in the
following form:
1) a TIME variable which contains a time (e.g., minutes, days,
years, etc.) in which the subject or component has been observed to
be alive (not failed).
2) a CENSOR variable which must take on the values 0 or 1, where
1 means the subject has died (failed), and a 0 means the subject
was still alive (not failed) at the last available time period.
3) optionally, a GROUPING variable which may have up to ten values
(numeric or character), i.e., the data may be in groups.
A plot is given for the cumulative proportion surviving in the
group(s) against time. If more than one group is entered, a
Mantel-Haenszel test is performed to test the hypothesis of equal
survival patterns for the groups.
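The Kaplan-Meier estimate described above can be sketched from the TIME and CENSOR variables. This is an illustrative sketch, not KWIKSTAT's procedure:

```python
def kaplan_meier(times, censor):
    """Kaplan-Meier survival curve.
    censor[i] = 1 means failure at times[i]; 0 means censored (still alive)."""
    data = sorted(zip(times, censor))
    at_risk = len(data)
    s = 1.0
    curve = []                      # (time, survival) at each failure time
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]    # count failures at this time
            removed += 1            # failures and censored both leave the risk set
            i += 1
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        at_risk -= removed
    return curve
```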
##20 ##SIMU
DATA GENERATION AND SIMULATIONS
===============================
A) GENERATE DATA SETS
This option allows you to create KWIKSTAT (dBASE) data sets from a Normal,
Uniform or Exponential distribution. You will be asked to specify the number of
variables to generate. Then for each variable, you must specify if it is to be
from a Normal, Uniform or Exponential distribution.
B) 95% CONFIDENCE INTERVAL SIMULATION
The 95% Confidence Interval simulation shows visually the meaning of a 95%
confidence interval. In this simulation, 100 samples of size 30 are drawn from
a normal distribution and a 95% C.I. is calculated for each
sample. The true mean of the population is plotted as a horizontal line on the
screen. Each C.I. is plotted vertically on the screen so you can visually see
the range of the C.I. and whether or not it covers the true population mean.
All 100 C.I.'s are plotted and a summary of how many covered the population
mean is displayed on the screen.
(continued)
##21
(Simulations continued...)
C) FLIP A COIN DEMONSTRATION
When you flip a fair coin, you would expect the percentage of heads to approach
50% over a long period of time. This simulation automates 100 coin tosses and
graphs the results. You should repeat this simulation a number of times to see
how the graph varies for different series of flips.
D) DEMONSTRATE DISTRIBUTION OF SAMPLE MEAN (CENTRAL LIMIT THEOREM)
In this demonstration, a population of 500 points is generated from either a
Normal, Uniform or Exponential distribution. Then, 100 samples of sizes 1, 2,
3, 5, 10 and 30 are taken from the population. A histogram of the original
"population" is displayed on the left side of the computer screen and a
histogram of the 100 sample means is displayed on the right side of the screen.
As each pair of histograms is displayed, you can see how the right side
histograms approach a bell-shape as the sample size increases. Also, as the
sample size increases, the spread of the histogram gets smaller: the standard
deviation of the sample means decreases by a factor of the square root of the
sample size. The demonstration will automatically cycle through the six
sample sizes.
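The shrinking spread of the sample means can be checked numerically. A sketch (illustration only) using a standard normal population, where the standard deviation of the means should be close to 1 over the square root of the sample size:

```python
import random

def sd_of_sample_means(sample_size, n_means=100, seed=2):
    """Standard deviation of n_means sample means from a standard normal
    population; expected to be near 1 / sqrt(sample_size)."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, 1.0) for _ in range(sample_size)) / sample_size
             for _ in range(n_means)]
    m = sum(means) / n_means
    return (sum((x - m) ** 2 for x in means) / (n_means - 1)) ** 0.5
```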
E) LIST VALUES FROM A DATABASE TO THE SCREEN
This option allows you to display the contents of a database to the screen.
##22,Advanced ANOVA Designs ##ADVAOV
ADVANCED ANOVA DESIGNS
======================
Two-Way ANOVA (balanced or unbalanced)- An analysis of variance is a method of
comparing means between several experimental groups. In a two-way analysis of
variance, the experimental design consists of two grouping factors and one or
more observations on each combination of the grouping factors.
For example, suppose you have designed an experiment to examine the
effectiveness of several display strategies on sales. You have selected three
display widths, and two heights, giving you 6 display combinations. In order to
make comparisons of sales for each combination of height and width, you want to
place one of the 6 display combinations at each of several stores. Then, after
a period of time, you will examine the sales from each combination to see if
you can discover which combination produces the most sales. (See the database
on disk named SALES.DBF.)
NOTE: If the data are balanced (no missing values) both the balanced and
unbalanced analysis will yield the same results. However, the unbalanced
procedure usually takes longer to compute. Therefore, use the unbalanced
procedure only when you have unequal sample sizes per cell in the design.
(continues...)
##23
(ADVANCED ANOVA DESIGNS continued...)
Two-Way Repeated Measures Analysis - In a two-way analysis of variance, it is
common to examine one "subject" at several points in time, or under several
conditions. This differs from the replicates on the two-way analysis example
where the "replicates" are unrelated. In a repeated measures example the
replicates are related.
Data for an example repeated measures two-way analysis is included in the
database REPEAT2.DBF on disk. In this example, there are two methods of
calibrating DIALS (factor A), and the levels of B are four SHAPES of the dials.
Six subjects were randomly assigned to perform the calibrating on a particular
dial (A) for all four shapes of dials. That is, each of the six subjects were
observed four times, once for each combination of the DIAL/SHAPE settings. The
scores observed are accuracy.
##24,Advanced Regression ##ADVREG
ADVANCED REGRESSION
===================
Regression analysis is used to model relationships between a dependent
(response) variable and one or more independent (predictor) variables. The
KWIKSTAT Advanced Regression Module includes four regression options:
o Polynomial Regression
o All Possible Regressions
o Stepwise Regression
o Customized (Least Squares) Regression Calculations
Polynomial Regression Analysis is useful for determining whether higher order
terms (squared, cubed, etc.) of a single predictor (independent) variable are
helpful in modeling the relationship with the response (dependent) variable.
All Possible and Stepwise Regression options are methods for the final step in
multiple linear regression, that of selecting a set of predictor variables for
appropriately modeling the relationship between the predictors and the
response. Customized Regression allows you to define the contents of the
regression matrices that are used in calculating the regression equation.
(continues...)
##25 ##POLY
(Regression continued...)
Polynomial Regression
---------------------
Polynomial regression is considered in a situation in which the relationship
between predictor and response variables is curvilinear.
The data in GAME.DBF are the ages of 29 players and their scores on a new
video game (generated data). If you plot that data on an XY plot, it appears
that the relationship between AGE and SCORE is not clearly linear, but that a
quadratic term may be helpful in describing the relationship. Such a
polynomial model can be recognized as a form of a multiple linear regression
model with two predictor variables, X and X-Squared.
In fitting a polynomial regression model, all lower-order terms must be
included. That is, the first-order term is always present, and higher-order
terms are added only when the lower-order terms are not sufficient. A cubic
term is used only if both the linear (first-order) and quadratic
(second-order) terms are included. When using Polynomial Regression in
KWIKSTAT, you are asked to specify the order of the polynomial you wish to fit.
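Because a polynomial model is just a multiple regression on X, X-squared, and so on, its design matrix can be sketched in a few lines. An illustration only:

```python
def poly_design_matrix(x, order):
    """Design-matrix rows [1, x, x**2, ..., x**order]; note that every
    lower-order term is carried along with the highest-order term."""
    return [[xi ** k for k in range(order + 1)] for xi in x]
```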
(continues...)
##26
(Polynomial Regression continued...)
As with any multiple regression analysis, care must be taken to avoid
collinearities between the predictors. That is, if the predictors are highly
correlated, the coefficient estimates may contain considerable error.
Centering the data is an option, but may not always sufficiently reduce
collinearities, in which case the data should be standardized (divide the
centered values by the standard deviation of the predictor variable values).
There are various approaches for determining the order of the model. One
method is a "forward selection" procedure in which the first-order (linear)
term is fit and then higher order terms are added sequentially until the
F-test for a non-zero coefficient is not significant for the highest order
term. Another method is a "backward elimination" procedure in which an
appropriately high-order polynomial model is fit and terms are deleted one at
a time from high to low order until the highest order term of the remaining
terms results in a significant F-test. These two methods may not result in
the same model.
(continues...)
##27
(Polynomial Regression continued...)
KWIKSTAT fits a model of the order you select and reports the coefficients of
each term, including an intercept term, up to that order. The results of the
tests of significance of these coefficients are also reported. A small
p-value indicates that the corresponding coefficient is significantly
different from zero. Residual analysis is also useful for investigating the
appropriateness of the model selected. KWIKSTAT also reports the Analysis of
Variance for the entire regression fit, as well as R-Square and adjusted
R-Square, as it does in the regular Linear Regression module.
In general, in regression analysis simpler models are preferred. It may be
possible to transform the predictor in some way so that higher order terms
are not necessary. Terms higher than second or third order are not usually
used unless there is some reason inherent in the data. It is always possible
to fit a high enough order model, but such a model is difficult to interpret
and not generally recommended.
(continues...)
##28
(Polynomial Regression continued...)
As with any regression model, extrapolation is dangerous and should be
avoided. While a polynomial model may adequately model the relationship
between variables within the range of the data used in the analysis, it is
extremely risky to assume that relationship continues to exist outside the
range of the data. Refer to a standard text, such as Neter and Wasserman or
Montgomery and Peck, for more information about polynomial regression.
##29 ##ALLPOSS
All Possible Regressions
------------------------
Also known as "best subset selection", this procedure consists of considering
all possible combinations of the predictor variables. Comparison can be based
on a number of criteria, including mean squared error, Mallow's Cp, and
R-Square. The calculated MSE is an estimate of the variance of the errors in
the full model. A smaller error variance is desirable, so different models
can be compared based on MSE, with those having a smaller MSE preferred.
R-Square, the coefficient of determination, is a measure of how much of the
variability in the response is explained in the model, provided the model has
been arrived at properly. A model with larger R-Square is preferred to one
with a much smaller R-Square.
Mallow's Cp is a statistic which is a function of the error sum of squares
for the full model and that for the reduced model. Under the correct model,
Cp is approximately equal to p and otherwise is greater than p, reflecting
bias in the parameter estimates in the regression equation. Thus, it is
desirable to select a model in which the value of Cp is close to the number
of terms, including the constant term, in the model.
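Mallow's Cp statistic has a simple closed form. A sketch of the usual formula (illustration, not the program's code), where SSE is for the candidate subset model and MSE is from the full model:

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallow's Cp for a subset model with p terms (including the constant);
    under a correct model, Cp is approximately equal to p."""
    return sse_p / mse_full - (n - 2 * p)
```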
(continues...)
##30
All Possible Regressions (continued)
These three criteria are typically used to compare combinations, or subsets,
of the predictor variables. When KWIKSTAT reports the results of the All
Possible Regressions procedure, it reports all three of these criteria. Of
course, you should also take into account any theoretical criteria for
including or excluding variables, as well as be careful not to include
redundant variables, which may introduce collinearities. It is often helpful
to consider which variables consistently appear in the better models. The
better models can then be analyzed using the Multiple Regression option of
the Regression and Correlation Module in the regular KWIKSTAT program, and
the results of tests for significant coefficients considered in the final
decision. Residual plots of predicted values under the chosen model should
show a random scatter of points.
Clearly, comparing all possible models is generally the best method for
making a decision about a "best" model since it provides the most information
about the available choices. However, the "all possible" subsets procedure
can become quite large with just a moderate number of predictors. KWIKSTAT
has the capability to perform All Possible Regressions on a maximum of eight
predictor variables. With eight variables, there are 2^8 - 1, or 255, possible
subsets, and the procedure can take some time.
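The subset count grows exponentially, as a short enumeration sketch shows (illustration only):

```python
from itertools import combinations

def all_subsets(variables):
    """Every non-empty subset of the candidate predictor variables."""
    subsets = []
    for k in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, k))
    return subsets
```

With 8 candidate predictors the list has 2^8 - 1 = 255 entries, each of which would require its own regression fit.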
##31 ##STEPWISE
Stepwise Selection
------------------
For a large number of predictors, or if for other reasons the All Possible
Regressions variable selection procedure is not practicable, an alternative
is the Stepwise variable selection procedure. KWIKSTAT's Stepwise option can
consider up to 49 variables, and can define a model using up to 20 of those
variables. As noted earlier, the Stepwise procedure is a combination of
"forward selection" and "backward elimination" techniques.
At the first step, the model consisting of all variables is considered, and
the variable testing "most significant", i.e., having the largest
F-statistic, becomes the first variable included in the model. In the second
step, the variable selected in the first step is forced into the model and
the other variables are then fit. A cut-off p-value is used as the selection
criterion to determine whether any more variables should be included. This
cut-off p-value can be designated by you; otherwise KWIKSTAT uses a default
p-value of 0.25 for the F-tests. Of
those variables meeting the selection criteria at step two, the one showing
the most significance, i.e., having the largest F-statistic, is added to the
model consisting of the variable selected in the first step.
##32
Stepwise Regression (continued)
The two-variable model is then "checked" and if the coefficients of both
variables are shown to be significantly different from zero (having small
p-values), the process continues. Again, the cut-off p-value can be set by
you, or else the default is 0.25. At the third step, the two already chosen
variables are forced into the model and the other variables then fit. If any
remaining variables meet the selection criteria, the "most significant" of
those is added, and the three-variable model checked. The process continues
as long as all selected variables satisfy the "checking" procedure, and as
long as at least one remaining variable meets the selection criteria and is
added to the model at each "forward" step. The operator is also given the
opportunity at each step to continue or to stop the procedure.
##33 ##CUSTOM
Customized Regression Calculations
----------------------------------
At times you may wish to perform a regression (least squares) calculation
that is different from those defined elsewhere in KWIKSTAT. The Customized
Regression option allows you to place your own information directly into the
matrices that are used to perform a regression calculation. The regression
equation (in matrix form) can be written Y = Xb + e where Y is an array of the
dependent variables, X is a matrix containing information about the
independent variables, b (beta) is the array of coefficients for the
regression equation, and e is an array of error terms. To calculate the beta array:
1. Create a database with columns representing the Y array and X matrix.
2. In the Advanced Regression option, specify what variables in the database
contain the values for the Y array and for each column of the X matrix.
3. Choose the option to perform the calculation. The results are reported.
The Custom Regression procedures assume that you have the mathematical
background to devise the matrices needed for this kind of analysis. See the
manual for examples.
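The calculation behind Y = Xb + e is the normal-equations solve b = (X'X)^-1 X'Y. The sketch below (illustration only; it assumes X'X is nonsingular and does no diagnostics) shows the matrix arithmetic involved:

```python
def solve_beta(X, Y):
    """Least-squares b for Y = Xb + e via the normal equations (X'X)b = X'Y."""
    n, p = len(X), len(X[0])
    # Build X'X and X'Y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         for i in range(p)]
    g = [sum(X[r][i] * Y[r] for r in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        g[col], g[piv] = g[piv], g[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            g[r] -= f * g[col]
    # Back-substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (g[i] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b
```

With a leading column of ones in X, the first coefficient plays the role of the intercept.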
##34,Time Series Analysis ##TIME
TIME SERIES ANALYSIS
====================
Time series analysis deals with attempting to model an observed series of
datapoints to forecast future activity or to understand the driving mechanism.
There are a number of approaches to modeling. This time series program bases
its modeling techniques on the ARMA (autoregressive moving average) approach.
In this approach, the researcher must first decide if there is an
autoregressive (AR) and/or moving average (MA) component, and the order of
each. These orders will be called p and q. Use p as the order of the AR
component and q as the order of the MA component. Thus, a model will be
designated as an ARMA(p,q). For example, the model ARMA(8,0) means that the
order of the AR component is 8 and the order of the MA component is 0 (none).
The goal is to find a model which adequately describes the process without
using any extra parameters, a parsimonious model.
The purpose of the KWIKSTAT Time Series program is to help you:
A) Decide what ARMA model is appropriate for your data.
B) Estimate the parameters of the model.
C) Create a forecast.
(continues...)
##35
(TIME SERIES ANALYSIS continued...)
Model Identification - The first part of the analysis process is model
identification. One way to determine if the data are white noise is to examine
the sample autocorrelations. If they are small and uncorrelated then the
process may be white noise. If the process is white noise, then approximately
5% of the sample autocorrelations (absolute values) would be expected to be
greater than 2/sqrt(n) where n is the length of the series. KWIKSTAT provides a
test to help you decide what model is appropriate. The W-statistic (see
Woodward and Gray) technique examines the data for fit to a series of models,
and returns the three "best" guesses for a model. It does not necessarily
choose the best model, but it is helpful in choosing which models to consider.
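The white-noise check above can be sketched directly: compute the sample autocorrelations and count how many fall outside the plus-or-minus 2/sqrt(n) band. A sketch for illustration, not the program's routine:

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def acf_exceedances(x, max_lag=10):
    """How many of the first max_lag autocorrelations fall outside the
    +/- 2/sqrt(n) band; for white noise only about 5% should."""
    bound = 2.0 / len(x) ** 0.5
    return sum(1 for r in sample_acf(x, max_lag) if abs(r) > bound)
```

A strongly alternating series, for instance, shows large autocorrelations at the first lags and is clearly not white noise.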
Estimating Parameters - Once a model has been chosen, you may estimate the
values of the parameters of the model given your set of data.
Forecasting - Once you have the estimates of a model, you can use this
information to create a forecast. If the model fits the data to your
satisfaction, then it may be a good model for forecasting into the future.
KWIKSTAT allows you to forecast and plot future values of the series. An
optional 95% confidence bound may be calculated to give you a range for your
estimated forecast.
##36,Quality Control Charts ##QCC
QUALITY CONTROL CHARTS
======================
The KWIKSTAT Quality Control module allows you to perform quality
control calculations and produce several kinds of control charts:
o X-Bar Chart (Chart on Means)
o R-Chart (Chart on Ranges)
o S-Chart (Chart on Standard Deviations)
o Control Chart for Individual Measurements
o P-Chart (Chart on Proportions)
Options in displaying the control charts include:
o Plot all points or a range of points on a chart.
o Plot X-Bar and R-Chart on same screen.
o Plot Upper and Lower control limits.
o Use standard 3-sigma control limits or specify your own limits.
o Select a point on the control chart with the mouse
pointer or with a cursor pointer and display database values used
to calculate that point.
o Print chart to printer or capture to a PCX graphics file.
o Interactively select chart colors.
o Zoom in and out on portions of the plot to see more detail.
##37
Preparing the Data for a Control Chart Plot
-------------------------------------------
The data for X-Bar or R-Charts should be stored in a database in the following
format:
The data that will be used to calculate the means to be plotted come from a
sample of observations, with each sample containing a number of replicates.
For example, you might take samples of jars filled with jelly 25 times during
the day. Each time you take a sample, it consists of 3 jars. Thus, you have
25 samples, each with a size of 3 (3 replicates). The data for this chart
would be stored in a database using the following setup:
Sample Value
1 15.9
2 16.1
3 16.0
1 16.2
2 15.9
3 15.6
etc.
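The X-Bar and R points come from grouping the records on the SAMPLE field and taking the mean and range of each group. A Python sketch of that grouping (illustrative only, not KWIKSTAT code):

```python
from collections import defaultdict

def xbar_r_points(rows):
    """Group (sample, value) rows by sample and compute the mean
    (X-Bar point) and range (R point) for each sample."""
    groups = defaultdict(list)
    for sample, value in rows:
        groups[sample].append(value)
    points = {}
    for sample, values in groups.items():
        points[sample] = (sum(values) / len(values),
                          max(values) - min(values))
    return points
```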
(continues...)
##38
(Control Chart continued...)
Each jar should contain 16 ounces of jelly. You do not want the jars too
empty or too full. Thus, you may want to see that the average amount of jelly
does not go under or over certain limits. Also, you do not want the range to
be too wide -- which may mean that the "average" jar contains 16 ounces, but
the amount in different jars may vary widely.
The database needed for this analysis would contain 2 fields, SAMPLE and VALUE.
To create this database, choose the Create a Database option from the FILE
menu. You can then choose to Create a custom database, or you could choose
the pre-defined database that contains 2 fields, where SAMPLE has a width of
1 and VALUE has a width of 5 with 2 decimal places. Once you have created the
database, enter the data, one observation (replicate) per record.
You may use this data to display an X-Bar chart, and an R or S Chart.
Also, see the sample dataset named XCHART.DBF on your disk.
NOTE: A database for a control chart for individual measurements is similar
to the one described here, but there is only 1 replication per sample.
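Assuming the standard 3-sigma X-Bar limits are estimated from the average range with the usual tabulated A2 factors (an assumption about the formula; the help text does not state it), the calculation can be sketched as:

```python
def xbar_limits(sample_means, sample_ranges, n):
    """Standard 3-sigma X-Bar limits estimated from the average range:
    X-double-bar +/- A2 * R-bar. A2 values are from the standard SPC
    factor table; only subgroup sizes 2-5 are shown here."""
    A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}[n]
    xbarbar = sum(sample_means) / len(sample_means)
    rbar = sum(sample_ranges) / len(sample_ranges)
    return xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar
```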
##39
Displaying P-Charts
-------------------
P-Charts plot a proportion of items observed from within a sample. For
example, you might take a sample of 25 items from a manufacturing process
each hour. Then you count the number of defects in that sample. You are
interested in plotting the proportion of defects across time to observe if an
unusually high number of defects begin to occur. The format for a KWIKSTAT
database is:
Sample SampSize Defect1 Defect2 Defect3
1 25 0 1 2
2 25 0 0 0
3 25 1 1 0
etc.
For example, you might observe 3 kinds of defects: defect 1 is a color
problem, defect 2 is a weight problem and defect 3 is a function problem.
Thus, for sample 1 the proportion of defects found is 3/25. Using this same
database, you could also create a P-Chart that only considers defects of type
3. In this case, you would choose only the SampSize and Defect3 fields for
analysis, and the proportion observed for sample 1 would be 2/25.
(continues...)
##40
(P-chart continued...)
The minimum number of fields in the KWIKSTAT database needed for this chart
is two, a Sample Size field and a Count field. If there is more than one
Count (defect) field, the program will add up the defect fields to
calculate the proportion of defects for that sample.
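That rule -- sum the defect fields, divide by the sample size -- together with standard 3-sigma P-chart limits (p-bar +/- 3*sqrt(p-bar*(1-p-bar)/n); an assumption about the limits formula KWIKSTAT uses) can be sketched in Python:

```python
import math

def p_chart(rows):
    """Each row is (sample_size, [defect counts]). The plotted point
    for a sample is its total defect count divided by its size.
    Assumes equal sample sizes for the control limits."""
    points = [sum(defects) / size for size, defects in rows]
    pbar = sum(points) / len(points)
    n = rows[0][0]
    half = 3 * math.sqrt(pbar * (1 - pbar) / n)
    return points, max(0.0, pbar - half), pbar + half
```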
You do not need a SAMPLE field, but you may want one if the field will
contain information about the sample, such as the hour taken. Then, when you
display detailed information about a particular point, you can quickly
identify its source.
To create this database, choose the Create a Database option from the FILE menu.
You can choose to Create a custom database, or you could use a pre-defined
structure that meets your needs. Once you have created the database, enter
the data, one sample per record, where each record includes a sample size
field and at least one count field.
##41,Pareto Charts ##PARETO
PARETO CHARTS
=============
The Pareto chart is a specialized bar-chart used to determine priorities
for quality improvement. The items displayed in the chart are arranged in
decreasing order by frequency of occurrence. KWIKSTAT allows you to read in
data to form a Pareto Chart in two ways:
o Read data, calculate frequencies, display plot - KWIKSTAT
reads raw counts from a database similar to the frequency procedure.
o Read frequencies, display plot - In this case, KWIKSTAT reads
frequencies that have already been tabulated.
Example: To create a chart by reading data, use a database with two fields
(e.g.,FAILURE & MACHINE) like this:
FAILURE MACHINE
Drift 1
Drift 2
Tubing 2
etc:
KWIKSTAT can also produce a separate Pareto chart for each "by" group.
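Tallying the raw counts and ordering the categories by decreasing frequency, as described above, might look like this in Python (illustrative only, not KWIKSTAT code):

```python
from collections import Counter

def pareto_frequencies(failures):
    """Tally raw failure categories and order them by decreasing
    frequency, as a Pareto chart requires."""
    return Counter(failures).most_common()
```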
##42,Multiple Comparisons ##MC
MULTIPLE COMPARISONS
====================
KWIKSTAT provides three methods of performing multiple comparisons: Newman-
Keuls, Tukey and Scheffe. Choose the default comparison test in the setup
procedure. The default comparison will be used in Analysis of Variance
comparison procedures. For some comparisons (i.e., Non-Parametric) the Tukey
procedure will be used, no matter what default procedure you choose.
However, you can use the Multiple Comparison module to perform comparisons that
are not automatically provided as a part of another comparison procedure (i.e.,
ANOVA), using any of the comparison types. This module also provides Dunnett's
test for comparison of all other group means to a control.
##43,Using the FILE menu ##FILE
USING THE FILE MENU
===================
NEW DATABASE - Create a new database. You must create a new database and enter
data before doing any analysis or creating a graph.
OPEN A DATABASE - Open an existing database.
SUBSET DATABASE - Create a database that is a subset of the current database.
COPY/BACKUP - Create a backup copy of a database for safety purposes.
LIST (DISPLAY) THE CONTENTS OF THE DATABASE - Display data to the screen.
MODIFY OR DISPLAY DATABASE STRUCTURE - View or change characteristics about the
database, including field widths and types.
KILL DATABASE - Delete a database file from your disk.
FILE UTILITIES - Import data, create reports, sort a database or output data.
EXIT - End the program.
##44,Using the EDIT menu ##EDIT
USING THE EDIT MENU
===================
The EDIT menu contains options that allow you to enter new data into a
database, edit data currently in a database, and other editing options:
EDIT RECORDS - Change data already in the database.
APPEND RECORDS - Add new records to the database.
MISSING VALUE CODES - Define missing value codes for your database. Refer to
the section titled "Setting Missing Value Codes" in Chapter 2 of the manual.
PACK DATABASE - Permanently erase all records marked for delete.
ZAP - Get rid of all records in a database.
##45,Using the HELP menu ##HELP
USING THE HELP MENU
===================
The KWIKSTAT Help system contains items to help you operate the program. These
include:
Program Help - Contains general program help information.
TUTOR - Displays a tutorial to help you learn how to use the program.
Decide What Analysis to Use - Displays a decision tree similar to the one in
Appendix E of this manual.
Change Setup Options - Select setup options including default path, colors,
printer type, multiple comparison test, etc.
AUTOHELP/Hints (On or Off) - Toggles the help/hint messages that appear on
some menus.
Go to DOS, Return with Exit (Shell) - Temporarily shell to DOS.
##46 ##VIEW
USING THE KWIKSTAT VIEWER
=========================
The KWIKSTAT viewer allows you to examine output from an analysis that
could be too big to appear on one screen. When the viewer appears, you
can move around the displayed results by pressing the arrow keys, PgUp,
PgDn, Home and End. If you are using a mouse, you can use the scroll
bars on the right side and bottom to position the output on the screen.
The function key commands available in the viewer are described below. To
activate one of these commands, press the function key or click the option
on the button bar at the bottom of the screen:
F1 - Display this help screen.
F3 - Send setup code to printer (for condensed print, etc.)
F5 - Go to a line in the output (Press F5, then enter a line number.)
F7 - Exit the viewer.
F8 - Define size of margin for output.
F9 - Define a title to be used on output.
F10 - Output the contents of the viewed file to a printer or file. When you
choose this option, the default output is the port you specified
in the program setup (i.e., LPT1: meaning line printer port 1). You
can press Enter to accept this default, or type a file name to
save the contents to a file.
##47 ##OPTIONS
OPTIONS WHILE DISPLAYING A GRAPH
================================
When a graph is displayed on your monitor, you can choose other
options from the plot menus.
The main graph menu appears at the top of the graph. To choose options from
this menu, press the first letter of the option name (i.e., E for Exit) or
point to the option with the mouse pointer and click the left button once.
Here is a description of the menu options. Depending on the graph, some of
these options may not appear on your menu:
o Exit - Exit the plot.
o Options - Display the plot options screen, where you can change options,
then replot the graph.
o Print - Print the graph to the printer.
o Cap/PCX - Capture the graph as a .PCX file.
o Set Colors - Display the color options menu.
o + - Begin cursor pointing mode. See "Display Graph Detail Option" below.
o Help - Display information about using the graph menu.
##48
Color Options Menu
------------------
The color options menu allows you to choose what colors to use in
displaying the graph. To choose options from this menu, press the
first letter of the option name (i.e., E for Exit) or point to the
option with the mouse pointer and click the left button once. The
options are:
o Menu - Return to the main menu.
o Graph - Change color of plot - cycles through 15 colors.
o Screen - Change background color - cycles through 15 colors.
o Text - Change text color - cycles through 15 colors.
o Default - Returns graph to default colors.
o B&W - Displays plot in Black and White. It is usually best to display a plot
in B&W mode before printing to a printer.
o Tile - In some plots, causes colors to be displayed as tile patterns.
o Help - Display help on using the graph menu.
##49
Display Graph Detail Option
---------------------------
On some graphs, you can choose to display information about specific points
on the graph, or take a closer look at a particular portion of the graph.
Using the point and look technique (SmartPoint (tm)), you can quickly identify
interesting points in the graph. For example, if several points are over the
limit, you might discover immediately that all of these points came from
samples taken from a single machine. When a chart is displayed, you can:
o SmartPoint - Select a point on the graph, and display information about
the database record associated with that point.
o Take a closer look - Zoom the graph in and out to take a closer look at an
area of the plot.
Mouse technique - If you are using a mouse, select a point by moving the
mouse pointer to the point you want to see, then click the left button once.
Cursor technique - Choose the <+> option from the main menu by pressing + on
the keyboard. A small "+" will appear in the middle of the graph. Use the
arrow keys to move the "+" to the point on the graph, then press Enter.
##50 ##REPLACE ##SUBSET
Using Functions & Expressions in REPLACE and SUBSET
===================================================
"REPLACE WITH" FIELD (in Replace option): Use either a math expression
or a database expression.
CONDITION FIELD (in Replace and Subset) : Use only a database expression.
A database expression allows many mathematical and character expressions,
as described below. The math expression is provided for performing
calculations using scientific mathematical functions. In the REPLACE WITH
field, the default expression type is the database type. In order for an
expression to be evaluated as a strictly math expression, you must place
an equal sign "=" at the beginning of the expression.
For example, if you want to perform the calculation WEIGHT/HEIGHT,
you can enter the expression as-is in the REPLACE WITH field.
(continues...)
##51
(REPLACE & SUBSET continued...)
However, if you want to calculate the log of WEIGHT/HEIGHT, you
must enter the expression as
=LOG(WEIGHT/HEIGHT)
since the LOG function is not supported as a database expression
function. The equal sign signals to the program to use the math
calculator. The information below outlines the capabilities of both
expression types.
Mathematical operators:
Add + Subtract -
Divide / Multiply *
Exponentiation ^ (Math calculator only)
For Character fields, the database calculator supports the
operation: Add + (appends one string to another)
(continues...)
##52
(REPLACE & SUBSET continued...)
Following are a few examples of correct expressions:
AGE/HEIGHT
=SCORE^2 (= signals math calculator)
LTRIM(FIRST)+' '+LAST
Note: Literal strings included in expressions must be surrounded by
single quotes. For example, 'Hello' is a literal string. Character
field names are used without quotes. For example, NAME is a field
name. A correct string expression using these two strings would be:
'Hello '+NAME
TIP: Only if you use a numeric operation or function not supported by
the database calculator will you need to place an equal (=) sign at
the first of the expression. For a list of the functions supported,
refer to Chapter 2 in the manual.
(continues...)
##53
(REPLACE & SUBSET continued...)
Following are some example uses of functions in REPLACE or SUBSET:
ASC - Converts the first character of a string to its ASCII code.
For example, the function ASC('A') would return the value 65, since
65 is the code for an uppercase A.
AT - Returns the starting position of one character string within
another character string. For example, the expression AT('Bill',
'Wild Bill') = 5 since the string 'Bill' begins five characters
deep in the string 'Wild Bill'.
CALENDAR and JULIAN - The JULIAN function converts a date into a
number, where 1 is January 1, 1583. CALENDAR converts a julian
number into a Date. You can convert dates into numbers, then find
the number of days between dates by subtraction.
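The JULIAN/CALENDAR date arithmetic can be illustrated in Python, where date.toordinal plays the role of JULIAN (Python numbers days from January 1 of year 1 rather than 1583, but differences between dates agree; illustrative only, not KWIKSTAT code):

```python
from datetime import date

def julian(d):
    """Day number of a date; subtracting two of these gives the
    number of days between the dates."""
    return d.toordinal()

def calendar(n):
    """Inverse of julian: convert a day number back into a date."""
    return date.fromordinal(n)

days_between = julian(date(1995, 8, 16)) - julian(date(1995, 1, 1))
```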
CAPS - Converts the first letter of each word into a capital. For
example, CAPS('this is a test') would become 'This Is A Test'.
(continues...)
##54
(REPLACE & SUBSET continued...)
CHR - Converts an ASCII code number into its character. For example, CHR(65)
is equal to the character string 'A'.
DELETED - Returns a T if the current record is marked for delete,
else it returns an F. Can be used to conditionally replace a value
depending on whether the record is deleted or not.
IIF - Selects between two expressions. The syntax is
IIF(logical expression, expression1, expression2). The logical
expression is either T or F. If the logical expression is T, then
returned value of this function is expression1, else the returned
value is expression2.
INT - Rounds down to nearest integer. INT(3.2) is equal to 3.
LEFT and RIGHT - Returns the left or right portion of a string. For
example, LEFT('Wild Bill',3) would return the string 'Wil' and
RIGHT('Wild Bill',3) would return the string 'ill'.
(continues...)
##55
(REPLACE & SUBSET continued...)
LOWER and UPPER - Returns lower or upper case string. For example,
UPPER('Wild Bill') would return 'WILD BILL' and LOWER('Wild Bill') would
return 'wild bill'.
LTRIM, RTRIM and TRIM - Trims blanks from the left end, right end, or both
ends of a string. For example, RTRIM('Wild Bill ') would return 'Wild
Bill'. If the field FIRST contained the string 'Mark ' (6 blanks on
the end) and the field LAST contained 'Walker ' (7 blanks on the end),
the expression FIRST+LAST would be 'Mark Walker '. To obtain
the string 'Mark Walker' you would use RTRIM(FIRST)+' '+RTRIM(LAST).
SUBSTR - Extracts a string from the middle of a string. For
example, SUBSTR('Wild Bill',3,4) would be 'd Bi', which begins with
the 3rd character in the initial string, and is 4 characters long.
If the 4 were left off, the result would be 'd Bill' -- which is
the remainder of the string starting with the 3rd character.
VAL - Returns the value of a string. For example VAL('24') is the
number 24.
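Several of the functions above have direct Python counterparts; an illustrative mapping (not KWIKSTAT code, and KWIKSTAT's own indexing conventions may differ):

```python
# Python equivalents of several database-expression functions.
assert ord('A') == 65                                # ASC('A')
assert chr(65) == 'A'                                # CHR(65)
assert 'this is a test'.title() == 'This Is A Test'  # CAPS(...)
assert 'Wild Bill'.upper() == 'WILD BILL'            # UPPER(...)
assert 'Wild Bill '.rstrip() == 'Wild Bill'          # RTRIM(...)
assert float('24') == 24                             # VAL('24')
value = 'yes' if True else 'no'                      # IIF(T,'yes','no')
```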
---END OF HELP---