LINEST

See Also

Calculates the statistics for a line by using the "least squares" method to calculate a straight line that best fits your data, and returns an array that describes the line. Because this function returns an array of values, it must be entered as an array formula.

The equation for the line is:

y = mx + b

or

y = m1x1 + m2x2 + ... + b (if there are multiple ranges of x-values)

where the dependent y-value is a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that LINEST returns is {mn,mn-1,...,m1,b}. LINEST can also return additional regression statistics.

Syntax

LINEST(known_y's,known_x's,const,stats)

Known_y's   is the set of y-values you already know in the relationship y = mx + b.

Known_x's   is an optional set of x-values that you may already know in the relationship y = mx + b.

Const   is a logical value specifying whether to force the constant b to equal 0.

Stats   is a logical value specifying whether to return additional regression statistics.

The additional regression statistics are as follows.

Statistic Description
se1,se2,...,sen The standard error values for the coefficients m1,m2,...,mn.
Seb The standard error value for the constant b (seb = #N/A when const is FALSE).
r2 The coefficient of determination. Compares estimated and actual y-values, and ranges in value from 0 to 1. If it is 1, there is a perfect correlation in the sample ù there is no difference between the estimated y-value and the actual y-value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y-value. For information about how r2 is calculated, see "Remarks" later in this topic.
sey The standard error for the y estimate.
F The F statistic, or the F-observed value. Use the F statistic to determine whether the observed relationship between the dependent and independent variables occurs by chance.
df The degrees of freedom. Use the degrees of freedom to help you find F-critical values in a statistical table. Compare the values you find in the table to the F statistic returned by LINEST to determine a confidence level for the model.
ssreg The regression sum of squares.
ssresid The residual sum of squares.

The following illustration shows the order in which the additional regression statistics are returned.

Worksheet

Remarks

Example 1   Slope and Y-Intercept

The example may be easier to understand if you copy it to a blank spreadsheet.

How?

  1. Create a blank spreadsheet.
  2. Select the example in the Help topic.

    Selecting example from Help

    Selecting an example from Help

  3. Press CTRL+C.
  4. In the spreadsheet, select cell A1, and press CTRL+V.
  5. To switch between viewing the formula that returns the result and the result in the cell, select the cell and press F2 and then ENTER, or click Commands and Options on the spreadsheet toolbar, click the Formula tab, and look in the Formula in active cell (active cell) box.
Known y Known x
1 0
9 4
5 2
7 3
Formula Formula
=LINEST(A2:A5,B2:B5,,FALSE)  

Note   The formula in the example must be entered as an array formula. After copying the example to a blank spreadsheet, select the range A7:B7 starting with the formula cell. Press F2, and then press CTRL+SHIFT+ENTER. If the formula is not entered as an array formula, the single result is 2.

When entered as an array, the slope (2) and the y-intercept (1) are returned.

Example 2   Simple Linear Regression

The example may be easier to understand if you copy it to a blank spreadsheet.

How?

  1. Create a blank spreadsheet.
  2. Select the example in the Help topic.

    Selecting example from Help

    Selecting an example from Help

  3. Press CTRL+C.
  4. In the spreadsheet, select cell A1, and press CTRL+V.
  5. To switch between viewing the formula that returns the result and the result in the cell, select the cell and press F2 and then ENTER, or click Commands and Options on the spreadsheet toolbar, click the Formula tab, and look in the Formula in active cell (active cell) box.
Month Sales
1 3100
2 4500
3 4400
4 5400
5 7500
6 8100
Formula Description (Result)
=SUM(LINEST(B2:B7, A2:A7)*{9,1}) Estimate sales for the ninth month (11000)

In general, SUM({m,b}*{x,1}) equals mx + b, the estimated y-value for a given x-value. You can also use the TREND function.

Note   The formula in the example must be entered as an array formula. After copying the example to a blank spreadsheet, select the formula cell (A9). Press F2, and then press CTRL+SHIFT+ENTER.

Example 3   Multiple Linear Regression

Suppose a commercial developer is considering purchasing a group of small office buildings in an established business district.

The developer can use multiple linear regression analysis to estimate the value of an office building in a given area based on the following variables.

Variable Refers to the
y Assessed value of the office building
x1 Floor space in square feet
x2 Number of offices
x3 Number of entrances
x4 Age of the office building in years

This example assumes that a straight-line relationship exists between each independent variable (x1, x2, x3, and x4) and the dependent variable (y), the value of office buildings in the area.

The developer randomly chooses a sample of 11 office buildings from a possible 1,500 office buildings and obtains the following data. "Half an entrance" means an entrance for deliveries only.

The example may be easier to understand if you copy it to a blank spreadsheet.

How?

  1. Create a blank spreadsheet.
  2. Select the example in the Help topic.

    Selecting example from Help

    Selecting an example from Help

  3. Press CTRL+C.
  4. In the spreadsheet, select cell A1, and press CTRL+V.
  5. To switch between viewing the formula that returns the result and the result in the cell, select the cell and press F2 and then ENTER, or click Commands and Options on the spreadsheet toolbar, click the Formula tab, and look in the Formula in active cell (active cell) box.
Floor space (x1) Offices (x2) Entrances (x3) Age (x4) Assessed value (y)
2310 2 2 20 142,000
2333 2 2 12 144,000
2356 3 1.5 33 151,000
2379 3 2 43 150,000
2402 2 3 53 139,000
2425 4 2 23 169,000
2448 2 1.5 99 126,000
2471 2 2 34 142,900
2494 3 3 23 163,000
2517 4 4 55 169,000
2540 2 3 22 149,000
Formula
=LINEST(E2:E12,A2:D12,TRUE,TRUE)

Note   The formula in the example must be entered as an array formula. After copying the example to a blank spreadsheet, select the range A14:E18 starting with the formula cell. Press F2, and then press CTRL+SHIFT+ENTER. If the formula is not entered as an array formula, the single result is -234.2371645.

When entered as an array, the following regression statistics are returned. Use this key to identify the statistic you want.

Worksheet

The multiple regression equation, y = m1*x1 + m2*x2 + m3*x3 + m4*x4 + b, can now be obtained using the values from row 14:

y = 27.64*x1 + 12,530*x2 + 2,553*x3+ 234.24*x4 + 52,318

The developer can now estimate the assessed value of an office building in the same area that has 2,500 square feet, three offices, and two entrances and is 25 years old, by using the following equation:

y = 27.64*2500 + 12530*3 + 2553*2 - 234.24*25 + 52318 = $158,261

Or you can copy the following table to cell A21 of the example workbook.

Floor space (x1) Offices (x2) Entrances (x3) Age (x4) Assessed value (y)
2500 3 2 25 =$D$14*$A$22 +$C$14*$B$22 + $B$14*$C$22 +$A$14*$D$22 +$E$14

You can also use the TREND function to calculate this value.

Example 4   Using The F And R2 Statistics

In the previous example, the coefficient of determination, or r2, is 0.99675 (see cell A17 in the output for LINEST), which would indicate a strong relationship between the independent variables and the sale price. You can use the F statistic to determine whether these results, with such a high r2 value, occurred by chance.

Assume for the moment that in fact there is no relationship among the variables, but that you have drawn a rare sample of 11 office buildings that causes the statistical analysis to demonstrate a strong relationship. The term "Alpha" is used for the probability of erroneously concluding that there is a relationship.

There is a relationship among the variables if the F-observed statistic is greater than the F-critical value. The F-critical value can be obtained by referring to a table of F-critical values in many statistics textbooks. To read the table, assume a single-tailed test, use an Alpha value of 0.05, and for the degrees of freedom (abbreviated in most tables as v1 and v2), use v1 = k = 4 and v2 = n - (k + 1) = 11 - (4 + 1) = 6, where k is the number of variables in the regression analysis and n is the number of data points. The F-critical value is 4.53.

The F-observed value is 459.753674 (cell A18), which is substantially greater than the F-critical value of 4.53. Therefore, the regression equation is useful in predicting the assessed value of office buildings in this area.

Example 5   Calculating The T-Statistics

Another hypothesis test will determine whether each slope coefficient is useful in estimating the assessed value of an office building in example 3. For example, to test the age coefficient for statistical significance, divide -234.24 (age slope coefficient) by 13.268 (the estimated standard error of age coefficients in cell A15). The following is the t-observed value:

t = m4 ≈ se4 = -234.24 ≈ 13.268 = -17.7

If you consult a table in a statistics manual, you will find that t-critical, single tail, with 6 degrees of freedom and Alpha = 0.05 is 1.94. Because the absolute value of t, 17.7, is greater than 1.94, age is an important variable when estimating the assessed value of an office building. Each of the other independent variables can be tested for statistical significance in a similar manner. The following are the t-observed values for each of the independent variables.

Variable t-observed value
Floor space 5.1
Number of offices 31.3
Number of entrances 4.8
Age 17.7

These values all have an absolute value greater than 1.94; therefore, all the variables used in the regression equation are useful in predicting the assessed value of office buildings in this area.