ols
Section: Misc. Reference Manual Pages (Local)
Updated: 24 May 1991
NAME
ols - Estimate linear regressions.
SYNOPSIS
ols [-h] [-p] [-raw] [-l labelstring] [-m model_spec] [-epp] [inputfilename]
DESCRIPTION
ols takes a set of observations from the specified input file.
There must be exactly one observation per line.
If no input file is specified, ols reads data from
Standard Input.
How is this input file organised? All lines must have the
same layout of variables (fields), and the same number of variables.
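Since every line must carry the same number of fields, it can help to check a file before estimating. The following is a minimal sketch using awk(1); the function name check_layout is made up for this illustration and is not part of ols.

```shell
# check_layout: report the first line whose field count differs from line 1.
# Exits 0 if every line has the same number of fields, 1 otherwise.
# (Hypothetical helper, not shipped with ols.)
check_layout() {
    awk '
        NR == 1 { want = NF }
        NF != want {
            printf "line %d has %d fields, expected %d\n", NR, NF, want
            bad = 1
            exit
        }
        END { exit bad }
    ' "$@"
}

# Sample data: the third line is missing a field.
printf '1 2.0 3.5\n1 4.0 7.1\n1 6.0\n' | check_layout ||
    echo "fix the input before running ols"
```

The function reads Standard Input when no file name is given, so it can sit in front of ols in a pipeline or be run on the data file directly.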
Given a set of variables in a data file, you can run any
regression which uses some (but not necessarily all) of these
variables as l.h.s. or r.h.s. variable(s).
Defining the layout of the input lines and choosing r.h.s. and
l.h.s. variables "by hand" is slightly tedious.
ols offers a default behaviour which might be useful.
In the event that you do not choose l.h.s. and r.h.s. variables
yourself, it assumes you want a regression as follows: the last
(rightmost) variable on each line is considered to be the l.h.s.
variable and all other variables are used as r.h.s. regressors.
The Intercept
ols knows nothing about the intercept of the regression line. By
default, there is no intercept. If you want an intercept, you must
include a variable in the dataset which always takes the value 1.0.
This is readily done using Unix tools like awk(1) or sed(1).
Notice that omitting the intercept implies forcing the
regression line to pass through the origin. If this restriction
is forced unintentionally, it will almost certainly lead to
nonsensical results. Make sure you have a good reason for
wanting to impose such a restriction, and be aware that ols
will implicitly impose this restriction if you do not include
an r.h.s. regressor which always takes the value 1.
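To see how much the restriction can matter, here is a small worked sketch. It does not call ols; it computes the two textbook least-squares fits for one regressor directly in awk(1), on three made-up points that lie exactly on y = 10 + x. The function name fit_both is an assumption of this example.

```shell
# fit_both: fit y on x with and without a constant, for input lines "x y".
# (Hypothetical helper for illustration; this is not how ols is invoked.)
fit_both() {
    awk '
        { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
        END {
            # No constant column: the single slope is sum(xy) / sum(xx).
            printf "through origin: slope=%.4f\n", sxy / sxx
            # With a constant column: the usual two-parameter fit.
            b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
            a = (sy - b * sx) / n
            printf "with intercept: intercept=%.4f slope=%.4f\n", a, b
        }
    ' "$@"
}

# Three points on the line y = 10 + x.
printf '1 11\n2 12\n3 13\n' | fit_both
```

Forcing the line through the origin gives a slope of about 5.29, while the fit with a constant recovers the intercept 10 and slope 1.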
OPTIONS
- -h
-
This option gives some minimal quickstart help. All other
arguments are ignored if it is found.
- -l labelstring
-
This option allows you to attach variable names to your variables.
Thus, if you have three numbers per line in a regression of
y on x with an intercept, you could use a
labelstring of the form 'constant x y'.
The labels do several things. They help in the pretty printing
of the regression results. When the -p switch is used, the
labels enable a more readable generated awk script. Finally,
labels are essential when specifying a model different from the
default model (using the -m switch).
If no labels are specified, ols
invents variable names of the form $n for use in printing
the results. Similar defaults are used in the generated awk
code.
The individual labels must be delimited by whitespace (spaces or
tabs). Note that you should quote labelstring to make sure the shell
passes the labels to ols as a single argument.
- -m model_spec
-
This option allows you to specify a model different from the
default. Suppose the labelstring is '_one x y z'. The default
model being estimated is z = f(_one, x, y). If you want to
estimate a model y = f(_one, z), you would use the option
-m 'y = _one z'.
- -p
-
This option disables the display of regression results.
Instead, ols prints an awk(1) program on Standard
Output. This awk program does prediction using the estimated
regression model.
The generated awk code is indented and commented. This is
because there are situations where the most convenient way to execute
a task is to use ols to do the estimation, and modify the
generated awk code to fit the job at hand.
The generated awk code works with either of awk(1) or nawk(1).
- -raw
-
The -raw option produces a highly truncated output. Instead of the
full regression results, ols merely produces one line on Standard
Output: all the regression coefficients followed by the regression
standard error.
- -epp
-
This option disables the normal display of regression results.
Instead, ols prints the results of estimation in a form
more suited to post-processing. A post-processor which converts this
into a LaTeX table (called epp2tex) is shipped with
ols. See the example below.
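The single line produced by -raw is convenient to pick apart in a pipeline. The sketch below assumes the numbers on that line are separated by whitespace, which this page does not state explicitly; the sample line and the function name split_raw are made up for illustration.

```shell
# split_raw: label each number on a -raw style line.
# Assumption: fields are whitespace-separated, with the regression
# standard error as the last field.
split_raw() {
    awk '{
        for (i = 1; i < NF; i++) printf "coef%d %s\n", i, $i
        printf "stderr %s\n", $NF
    }' "$@"
}

# Made-up numbers standing in for real ols -raw output.
echo '10.0 1.0 0.25' | split_raw
```

With real data the function would be placed after ols -raw in a pipeline, provided the assumption about the field separator holds.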
EXAMPLES
You can get started without much fuss:
ols < data.1k
estimates a regression using data in the file 'data.1k', using
the default choice of l.h.s. and r.h.s. variable(s).
An example of integration with awk on input:
awk '{print "1 " $0}' data.text | ols -l 'cons x y'
Here, the file data.text contains observations which are (x, y)
points. By default ols
estimates regressions without an intercept. So we use
awk(1) to add in the vector of 1s. The -l option is used
to describe the layout of the input lines: 1.0, x and y.
An example of using generated awk code:
ols -p data.est > a.awk
awk -f a.awk data.est > insample.predictions
awk -f a.awk data.osp > outofsample.predictions
Here, the -p flag is used to get ols to produce an awk
program which does prediction. This program is applied to the
estimation dataset data.est to get insample predictions, and to
a different file (data.osp) to get out-of-sample predictions.
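The script that ols -p generates is not reproduced in this page. As a rough idea of what a prediction program of this kind does, here is a hand-written stand-in. The coefficients, the input layout, and the function name predict are all invented for this sketch and may differ from what ols actually emits.

```shell
# predict: hand-written stand-in for a prediction script.
# The coefficients b1 and b2 are invented for illustration only.
predict() {
    awk '
        BEGIN { b1 = 10.0; b2 = 1.0 }   # b1 multiplies the constant, b2 multiplies x
        { printf "%.4f\n", b1 * $1 + b2 * $2 }
    ' "$@"
}

# Each input line holds the r.h.s. variables: the constant 1, then x.
printf '1 4\n1 5\n' | predict
```

Each output line is the fitted value for the corresponding input line, which is the same shape of result as the prediction files in the example above.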
The -epp flag is used for getting results in a form
suitable for post-processing. The program epp2tex.nawk is
shipped with ols. Thus you could say
ols -epp -l '_one x y z' -m 'z = _one x' datafile | nawk -f epp2tex.nawk > a.tex
Look up the epp2tex documentation for details on the switches it
recognises.
NOTES
This is version 1.0.
The biggest dataset usable is somewhat smaller than swap space.
The amount of memory consumed is exactly what the dataset
calls for.
There are no hard limits on either the number of r.h.s. variables
or the number of observations.
BUGS
It doesn't know about missing data. Fixing this is not on the
cards.
If you do an estimation with seemingly straightforward data, and
get blatantly nonsensical estimates, then it's possible that
you have a linear dependence among the r.h.s. variables. Try
this dataset on ols for an example:
1 1 1 2
1 2 2 7
1 5 5 3
1 4 4 19
ols is very low on intelligence in sensing such degenerate
multicollinearity. This will be remedied in the next version.
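Until then, the simplest case can be caught before estimation. The sketch below flags pairs of columns that are identical on every line, as columns 2 and 3 are in the dataset above. It is only a partial check: it compares all columns, including the l.h.s. one, and it does not detect more general linear dependence (for instance one column being the sum of two others). The function name find_duplicate_columns is made up for this example.

```shell
# find_duplicate_columns: print every pair of columns that agree on all lines.
# (Hypothetical helper; catches exact duplicates only.)
find_duplicate_columns() {
    awk '
        {
            if (NF > nf) nf = NF
            # Remember each pair of columns that disagrees on some line.
            for (i = 1; i < NF; i++)
                for (j = i + 1; j <= NF; j++)
                    if ($i != $j) differ[i, j] = 1
        }
        END {
            for (i = 1; i < nf; i++)
                for (j = i + 1; j <= nf; j++)
                    if (!((i, j) in differ))
                        printf "columns %d and %d are identical\n", i, j
        }
    ' "$@"
}

# The degenerate dataset from this section.
find_duplicate_columns <<'EOF'
1 1 1 2
1 2 2 7
1 5 5 3
1 4 4 19
EOF
```

On this dataset the function reports that columns 2 and 3 are identical, which is exactly the dependence that produces the nonsensical estimates.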
AUTHOR
Ajay Shah, Rand Corporation, Santa Monica, CA
Ajay_Shah@rand.org