In this post, we will discuss the consequences of the omitted variable bias in a more elaborate way. In particular, we will show that omitting a variable from the regression model violates an OLS assumption and discuss what happens when this assumption is violated.
In order to understand the consequences of the omitted variable bias, we first have to understand what is needed to obtain good estimates. When studying linear regression models, you necessarily come across the Gauss-Markov theorem. This theorem states that if your regression model fulfills a set of assumptions (the assumptions of the classical linear regression model), then the OLS estimator is the best linear unbiased estimator (BLUE). One important assumption in this set states that the error term of the regression model must be uncorrelated with the explanatory variables. However, as you will see in a minute, omitting a relevant variable introduces a correlation between the explanatory variables and the error term.
What happens when you omit an important variable? From the introductory post, you should know that an omitted variable bias exists only if the omitted variable is correlated with the dependent variable and with at least one of the included explanatory variables. Now, when you omit a variable, its effect does not disappear. It is absorbed by the error term. Because the omitted variable is correlated with at least one included explanatory variable, the error term and that explanatory variable are necessarily going to be correlated as well. This clearly violates the assumption that the error term and the explanatory variables must be uncorrelated. A violation of this assumption causes the OLS estimator to be biased and inconsistent.
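To make this mechanism concrete, here is a minimal simulation sketch. The data-generating process, the coefficient values, and the helper name `ols` are all illustrative assumptions, not taken from the earlier posts. It compares a regression that includes both variables with one that omits `x2`, and checks that the error term of the short model is correlated with `x1`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are assumptions).
x2 = rng.normal(size=n)               # the variable we are going to omit
x1 = 0.5 * x2 + rng.normal(size=n)    # included regressor, correlated with x2
e = rng.normal(size=n)                # well-behaved error of the true model
y = 1.0 + 2.0 * x1 + 3.0 * x2 + e     # true coefficient on x1 is 2

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_long = ols(y, x1, x2)   # correctly specified model
b_short = ols(y, x1)      # model that omits x2

# In the short model the error term is u = 3*x2 + e, which contains x2.
u = 3.0 * x2 + e
corr_u_x1 = np.corrcoef(u, x1)[0, 1]

print("slope on x1, x2 included:", b_long[1])
print("slope on x1, x2 omitted: ", b_short[1])
print("corr(error term, x1):    ", corr_u_x1)
```

Under these assumptions the slope of the short regression converges to 2 + 3 · Cov(x1, x2) / Var(x1) = 2 + 3 · 0.4 = 3.2 rather than to the true value of 2, and collecting more data does not help. That is what inconsistency means here.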
Furthermore, recalling the discussion using the Venn diagram, note that omitting a variable increases the unexplained variance of Y (the dependent variable) and can decrease the variance of the estimated coefficient. This might lead to a situation in which you reject the null hypothesis and believe that your coefficients are statistically significant at a given significance level although they are not.
How serious is the omitted variable bias …
The problem of the omitted variable bias is pretty serious. An omitted variable leads to biased and inconsistent coefficient estimates, and biased and inconsistent estimates are not reliable. From our previous post, you might remember how omitting a variable can even change the signs of the coefficients, depending on the correlation of the omitted variable with the dependent and explanatory variables. Hence, conclusions drawn from such a regression model can be badly misleading.
… And what can be done about it?
Dealing with an omitted variable bias is not easy. However, you can try several things.
First, if the required data is available, you can include additional relevant variables in the regression model. Of course, this has other implications that you have to consider carefully. For one, you need a sufficient number of data points to include additional explanatory variables, or else you will not be able to estimate your model. For another, depending on how many extra variables you include, the problems of including unnecessary variables may arise and start to seriously influence your estimates. However, additional explanatory variables can help to mitigate the problems associated with the omitted variable bias. That is, additional control variables can lower the bias.
Second, if you think that a variable is important and that leaving it out of your regression model could cause an omitted variable bias, but you do not have data for it, you can look for proxies or find instrumental variables. For instance, in the car price example that we discussed earlier, the omitted variable was the age of the car. Suppose you do not have data on the age of the car, but you know how long the last owner was in possession of it. Then the time the car was owned by the last owner can be taken as a proxy for the age of the car. Note, however, that using proxies and instrumental variables comes with a whole set of additional assumptions and problems, most of which are quite complicated and not easily met.
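The proxy idea can be sketched with simulated data. Everything in the block below is a hypothetical setup for illustration: the variable `mileage` as the included regressor, the coefficient values, the units, and the assumption that the last owner's holding time is a noisy fraction of the car's age. It compares the bias of the mileage coefficient when age is omitted, when the proxy is used, and when age itself is available.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Hypothetical used-car data (all relationships are assumptions).
age = rng.uniform(1, 15, size=n)                          # years, unobserved
mileage = 10.0 * age + rng.normal(scale=20.0, size=n)     # thousand km
owner_years = 0.6 * age + rng.normal(scale=1.0, size=n)   # noisy proxy for age
price = 30.0 - 0.05 * mileage - 1.0 * age + rng.normal(size=n)  # thousand $

def slope_on_mileage(*controls):
    """OLS coefficient on mileage, given optional control variables."""
    X = np.column_stack([np.ones(n), mileage, *controls])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return coef[1]

true_slope = -0.05
bias_omitted = slope_on_mileage() - true_slope           # age left out
bias_proxy = slope_on_mileage(owner_years) - true_slope  # proxy for age
bias_full = slope_on_mileage(age) - true_slope           # age included

print("bias with age omitted: ", bias_omitted)
print("bias with proxy:       ", bias_proxy)
print("bias with age included:", bias_full)
```

In this setup the proxy shrinks the bias but does not remove it, because the proxy measures age only imperfectly. How much a proxy helps in practice depends on how closely it tracks the omitted variable.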
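Third, if you cannot resolve the omitted variable bias, you can try to predict in which direction your estimates are biased. With a single omitted variable, the direction depends on two signs: the sign of the omitted variable's effect on the dependent variable and the sign of its correlation with the included explanatory variable.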
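For the simplest case, this reasoning can be written down explicitly. The block below is a sketch of the standard textbook result for one included and one omitted variable. The symbols are introduced here for illustration and do not come from the earlier posts.

```latex
% True model and the model that is actually estimated (x_2 omitted):
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,
\qquad
y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + v .

% Let \tilde{\delta}_1 be the slope from regressing x_2 on x_1. Then
\operatorname{E}\bigl[\tilde{\beta}_1\bigr]
  = \beta_1 + \beta_2 \, \tilde{\delta}_1 ,
\qquad
\tilde{\delta}_1
  = \frac{\widehat{\operatorname{Cov}}(x_1, x_2)}{\widehat{\operatorname{Var}}(x_1)} .

% Direction of the bias \beta_2 \tilde{\delta}_1:
%   \beta_2 > 0 and Corr(x_1, x_2) > 0  =>  upward bias
%   \beta_2 > 0 and Corr(x_1, x_2) < 0  =>  downward bias
%   \beta_2 < 0 and Corr(x_1, x_2) > 0  =>  downward bias
%   \beta_2 < 0 and Corr(x_1, x_2) < 0  =>  upward bias
```

So even without data on the omitted variable, a plausible guess about these two signs tells you whether your estimate is likely too large or too small. With several omitted or several included variables, the direction is much harder to pin down.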