Linear Regression

A simple linear regression is a special case of the classical linear regression model that describes the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be the explanatory (or independent) variable, and the other is considered to be the dependent variable. For instance, an econometrician might want to relate a person's weight to their calorie consumption using a linear regression model.

Before attempting to fit a linear model to observed data, an analyst should first examine whether a meaningful relationship between the two variables is plausible. In our case, one would ask whether to expect a meaningful relationship between calorie consumption and weight; presumably there should be some kind of relationship. Note, however, that linear regression is a correlation analysis and does not imply any causality. Linear regression does not tell us that higher calorie consumption causes changes in weight, but only that there exists some significant association between the two variables.

Before running a linear regression model, it is useful to conduct a graphical evaluation of the data and to compute some numerical summary statistics. For instance, a scatter plot can be a useful tool for examining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (that is, the plot does not show any increasing or decreasing trend), then fitting a linear regression model to the data probably will not yield a useful model. A valuable numerical measure of association between two variables is the correlation coefficient, a value between -1 and 1 that indicates the strength of the association between the observed values of the two variables.
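As a minimal sketch of this preliminary check, the following Python snippet computes the Pearson correlation coefficient from scratch; the calorie and weight observations are hypothetical values invented purely for illustration.

```python
# Pearson correlation coefficient for two samples, computed from scratch.
# The data below are hypothetical calorie/weight observations (illustration only).
calories = [1800, 2000, 2200, 2400, 2600, 2800, 3000, 3200]
weights  = [62.0, 65.5, 64.0, 70.2, 72.8, 74.1, 78.5, 80.3]

n = len(calories)
mean_x = sum(calories) / n
mean_y = sum(weights) / n

# Numerator: sum of cross-deviations; denominator: product of the
# square roots of the sums of squared deviations.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calories, weights))
var_x = sum((x - mean_x) ** 2 for x in calories)
var_y = sum((y - mean_y) ** 2 for y in weights)

r = cov_xy / (var_x ** 0.5 * var_y ** 0.5)
print(f"correlation coefficient r = {r:.3f}")
```

A value of r close to +1, as here, suggests a strong positive linear association, so fitting a linear regression to these data would be reasonable.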

Finally, the linear regression model is expressed as an equation of the form

Y = \alpha + \beta X

where X is the explanatory variable and Y is the dependent variable. The slope of the line is \beta, and \alpha is the intercept. Mathematically, fitting a linear regression model means choosing the line that minimizes the sum of squared deviations of the observed data from the line. This post provides a formal derivation of the classical regression model.
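The least-squares criterion described above has a well-known closed-form solution for \alpha and \beta. The following Python sketch implements it; the helper name `fit_line` and the calorie/weight data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = alpha + beta * X.

    The slope and intercept that minimize the sum of squared deviations
    have the closed-form solution:
        beta  = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
        alpha = mean_y - beta * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical calorie/weight observations (illustration only)
calories = [1800, 2000, 2200, 2400, 2600, 2800, 3000, 3200]
weights  = [62.0, 65.5, 64.0, 70.2, 72.8, 74.1, 78.5, 80.3]

alpha, beta = fit_line(calories, weights)
print(f"fitted line: weight = {alpha:.2f} + {beta:.4f} * calories")
```

On data that lie exactly on a line, the estimates recover the line exactly; for example, `fit_line([0, 1, 2], [1, 3, 5])` returns an intercept of 1 and a slope of 2.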

Various statistical software packages support linear regression, including Julia, R, and STATA. If you are interested in conducting a linear regression, you can find tutorials on this blog: this post explains how to conduct a linear regression in Julia, and this post describes how to conduct one in R.
