ols

Section: Misc. Reference Manual Pages (Local)
Updated: 24 May 1991

NAME

ols - Estimate linear regressions.

SYNOPSIS

ols [-h] [-p] [-raw] [-l labelstring] [-m model_spec] [-epp] [inputfilename]

DESCRIPTION

ols takes a set of observations from the specified input file. There must be exactly one observation per line. If no input file is specified, ols reads data from Standard Input.

How is the input file organised? All lines must have the same layout of variables (fields) and the same number of variables. Given the variables in a data file, you can run any regression that uses any subset of them as the l.h.s. and r.h.s. variable(s).

Defining the layout of the input lines and choosing the l.h.s. and r.h.s. variables "by hand" is slightly tedious, so ols offers a default. If you do not choose the l.h.s. and r.h.s. variables yourself, ols assumes the following regression: the last (rightmost) variable on each line is the l.h.s. variable, and all other variables are used as r.h.s. regressors.
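The default model can be illustrated with a short Python sketch. It rests on several assumptions:

- Fields are whitespace-separated numbers. This page does not name the field delimiter.
- Blank lines are skipped.
- The coefficients come from the textbook normal equations (X'X)b = X'y. This page does not say which numerical method ols itself uses.

The function names read_observations, solve and ols_default are hypothetical.

```python
def read_observations(lines):
    """Parse numeric fields, one observation per line, and check that
    every observation has the same number of fields.
    Assumptions: fields are whitespace-separated; blank lines are skipped."""
    rows = [[float(f) for f in line.split()] for line in lines if line.strip()]
    width = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"observation {i}: expected {width} fields, got {len(row)}")
    return rows


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the r.h.s. variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_default(rows):
    """Default model: the last field is the l.h.s. variable, all others are regressors."""
    X = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


rows = read_observations(["1 0 1", "1 1 3", "1 2 5"])
print(ols_default(rows))  # [1.0, 2.0], i.e. y = 1 + 2x
```

In this example the first field is a constant 1. That is how an intercept enters the model, as explained under The Intercept.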

The Intercept

ols knows nothing about the intercept of the regression line. By default, there is no intercept. If you want an intercept, you must include a variable in the dataset which always takes the value 1.0; this is readily done with Unix tools like awk(1) or sed(1). Omitting the intercept forces the regression line to pass through the origin. If this restriction is imposed unintentionally, it will almost certainly produce nonsensical results. Make sure you have a good reason for imposing it, and be aware that ols imposes it implicitly whenever you do not include an r.h.s. regressor which always takes the value 1.
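To see how much damage an unintended restriction can do, the Python sketch below fits the same single-regressor data two ways. The first fit forces the line through the origin. The second includes a column of 1s. The data and function names are illustrative only. add_constant mirrors the awk one-liner used in the EXAMPLES section.

```python
def slope_through_origin(x, y):
    """Least-squares slope when the line is forced through (0, 0): sum(xy) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_with_intercept(x, y):
    """Least-squares (intercept, slope), i.e. regressing y on a column of 1s and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def add_constant(line):
    """Python analogue of awk '{print "1 " $0}': prepend a field equal to 1."""
    return "1 " + line


x = [1.0, 2.0, 3.0, 4.0]
y = [11.0, 12.0, 13.0, 14.0]       # exactly y = 10 + x
print(slope_through_origin(x, y))  # about 4.333, far from the true slope of 1
print(fit_with_intercept(x, y))    # (10.0, 1.0)
```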

OPTIONS

-h: This option gives some minimal quickstart help. If it is present, all other arguments are ignored.
-l labelstring: This option allows you to attach names to your variables. Thus, if each line holds three numbers for a regression of y on x with an intercept, you could use a labelstring of the form 'constant x y'.
The labels do several things. They help in the pretty printing of the regression results. When the -p switch is used, the labels enable a more readable generated awk script. Finally, labels are essential when specifying a model different from the default model (using the -m switch).
If no labels are specified, ols invents variable names of the form $n for use in printing the results. Similar defaults are used in the generated awk code.
The individual labels must be delimited by whitespace (spaces or tabs). Quote labelstring so that the shell treats the labels as a single argument.
-m model_spec: This option allows you to specify a model different from the default. Suppose the labelstring is '_one x y z'. The default model is then z = f(_one, x, y). To estimate the model y = f(_one, z) instead, you would use the option -m 'y = _one z'.
-p: This option disables the display of regression results. Instead, ols prints an awk(1) program on Standard Output. This awk program does prediction using the estimated regression model.
The generated awk code is indented and commented. Sometimes the most convenient way to do a task is to use ols for the estimation and then modify the generated awk code to fit the job at hand.
The generated awk code works with either awk(1) or nawk(1).
-raw: This option produces highly truncated output. Instead of the full regression results, ols prints a single line on Standard Output: all the regression coefficients, followed by the regression standard error.
-epp: This option disables the normal display of regression results. Instead, ols prints the estimation results in a form suited to post-processing. A post-processor called epp2tex, which converts this output into a LaTeX table, is shipped with ols. See the example below.
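The way -l and -m work together can be sketched as a lookup from label names to column positions. In the Python sketch below, parse_model is a hypothetical function, not part of ols. It assumes that:

- labels are unique and whitespace-separated;
- model_spec has the form "lhs = rhs1 rhs2 ...", as in the -m example above;
- spaces around the "=" are ignored.

Column indices are zero-based.

```python
def parse_model(labelstring, model_spec=None):
    """Map a label string and an optional 'lhs = rhs1 rhs2 ...' spec to
    zero-based column indices, returned as (lhs_index, rhs_indices).
    Without a spec, use the default model: last label on the l.h.s."""
    labels = labelstring.split()  # whitespace-delimited, as -l requires
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate label")
    if model_spec is None:
        return len(labels) - 1, list(range(len(labels) - 1))
    lhs, sep, rhs = model_spec.partition("=")
    if not sep:
        raise ValueError("model_spec must contain '='")
    position = {name: i for i, name in enumerate(labels)}
    try:
        lhs_index = position[lhs.strip()]
        rhs_indices = [position[name] for name in rhs.split()]
    except KeyError as err:
        raise ValueError(f"unknown label {err}") from None
    return lhs_index, rhs_indices


print(parse_model("_one x y z"))                # (3, [0, 1, 2]): z = f(_one, x, y)
print(parse_model("_one x y z", "y = _one z"))  # (2, [0, 3]):    y = f(_one, z)
```

With indices in hand, the estimation itself is unchanged: the selected columns form the regressor matrix and the l.h.s. column forms the dependent variable.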

EXAMPLES

You can get started without much fuss:

    ols < data.1k

This estimates a regression using the data in the file 'data.1k', with the default choice of l.h.s. and r.h.s. variable(s).

An example of integration with awk on input:

    awk '{print "1 " $0}' data.text | ols -l 'cons x y'

Here, the file data.text contains observations which are (x, y) points. By default ols estimates regressions without an intercept, so awk(1) is used to prepend the vector of 1s. The -l option describes the layout of the input lines: 1.0, x and y.

An example of using generated awk code:

    ols -p data.est > a.awk
    awk -f a.awk data.est > insample.predictions
    awk -f a.awk data.osp > outofsample.predictions

Here, the -p flag makes ols produce an awk program which does prediction. This program is applied to the estimation dataset data.est to get in-sample predictions, and to a different file (data.osp) to get out-of-sample predictions.

The -epp flag is used to get results in a form suitable for post-processing. The program epp2tex.nawk is shipped with ols, so you could say

    ols -epp -l '_one x y z' -m 'z = _one x' datafile | nawk -f epp2tex.nawk > a.tex

Look up the epp2tex documentation for details on the switches it recognises.
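For a linear model, the prediction done by the generated awk program amounts to multiplying each regressor by its coefficient and summing. The Python sketch below shows that computation. It also computes the conventional regression standard error, sqrt(SSR / (n - k)), which -raw prints after the coefficients. The function names are hypothetical, and the coefficients are assumed to have been estimated already. This page does not state the exact degrees-of-freedom convention or number formatting that ols uses.

```python
import math


def predict(coefs, rhs_indices, fields):
    """Fitted value for one observation: the sum of coefficient * regressor."""
    return sum(b * fields[j] for b, j in zip(coefs, rhs_indices))


def regression_standard_error(rows, coefs, lhs_index, rhs_indices):
    """Conventional regression standard error: sqrt(SSR / (n - k))."""
    residuals = [row[lhs_index] - predict(coefs, rhs_indices, row) for row in rows]
    n, k = len(rows), len(coefs)
    return math.sqrt(sum(e * e for e in residuals) / (n - k))


# Layout '_one x y', model 'y = _one x'; [1.0, 2.0] are the OLS estimates for these rows.
rows = [[1.0, 0.0, 1.5], [1.0, 1.0, 2.0], [1.0, 2.0, 5.5]]
coefs = [1.0, 2.0]
fitted = [predict(coefs, [0, 1], row) for row in rows]
s = regression_standard_error(rows, coefs, 2, [0, 1])
print(fitted)     # [1.0, 3.0, 5.0]
print(*coefs, s)  # coefficients, then the standard error, on one line
```

Applying predict to rows from a different file, with the same layout, gives out-of-sample predictions, as in the data.osp example.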

NOTES

This is version 1.0.

ols allocates only as much memory as the dataset requires, so the largest usable dataset is somewhat smaller than the available swap space.

There are no hard limits on either the number of r.h.s. variables or the number of observations.

BUGS

ols does not know about missing data, and there are no plans to add this. If an estimation with seemingly straightforward data gives blatantly nonsensical estimates, you may have a linear dependence among the r.h.s. variables. Try this dataset on ols for an example:

    1 1 1 2
    1 2 2 7
    1 5 5 3
    1 4 4 19

Under the default model, the first three variables are the r.h.s. regressors. The second and third are identical on every line, so they are perfectly collinear. ols does very little to detect such exact (degenerate) multicollinearity. This will be remedied in the next version.
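The dependence in this dataset can be confirmed numerically. The Python sketch below computes the rank of the regressor matrix by Gaussian elimination. The function name matrix_rank and the tolerance are illustrative choices, not part of ols. A rank below the number of regressors means X'X is singular and the coefficients are not uniquely determined.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


data = [[1, 1, 1, 2], [1, 2, 2, 7], [1, 5, 5, 3], [1, 4, 4, 19]]
X = [row[:-1] for row in data]          # default model: last column is the l.h.s.
print(matrix_rank(X), "of", len(X[0]))  # 2 of 3: the regressors are linearly dependent
```

Dropping either of the two identical columns restores full rank, and the regression can then be estimated.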

AUTHOR

Ajay Shah, Rand Corporation, Santa Monica, CA
Ajay_Shah@rand.org

This document was created by man2html, using the manual pages.
Time: 03:07:11 GMT, February 10, 2022