In this post, we will discuss the consequences of the omitted variable bias in more detail. In particular, we will show that omitting a variable from the regression model violates an OLS assumption, and we will discuss what happens when this assumption is violated.
In order to understand the consequences of the omitted variable bias, we first have to understand what is needed to obtain good estimates. When studying linear regression models, you will inevitably come across the Gauss-Markov theorem. This theorem states that if your regression model fulfills a set of assumptions (the assumptions of the classical linear regression model), then the OLS estimator is the best linear unbiased estimator (BLUE). One important assumption in this set states that the error term of the regression model must be uncorrelated with the explanatory variables. However, as you will see in a minute, omitting a relevant variable introduces a correlation between the explanatory variables and the error term.
What happens when you omit an important variable? From the introductory post, you should know that one of the conditions for an omitted variable bias to exist is that the omitted variable is correlated with the dependent variable and with at least one explanatory variable. Now, when you omit a variable, its effect does not disappear: it is absorbed into the error term and therefore shows up in the residuals. Because the omitted variable is correlated with at least one included explanatory variable, the error term and the explanatory variables are necessarily going to be correlated as well. This clearly violates the assumption that the error term and the explanatory variables must be uncorrelated. A violation of this assumption causes the OLS estimator to be biased and inconsistent. For a mathematical proof of this statement see this post.
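The argument above can be sketched in a few lines for the simplest case of one included and one omitted regressor. The notation is an assumption introduced here for illustration and does not come from the original post: $y$ is the dependent variable, $x_1$ the included explanatory variable, $x_2$ the omitted variable, and $u$ an error term that is uncorrelated with both $x_1$ and $x_2$.

```latex
\begin{align*}
\text{True model:} \quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \\
\text{Estimated model:} \quad & y = \beta_0 + \beta_1 x_1 + v, \qquad v = \beta_2 x_2 + u \\
\text{Covariance with the error:} \quad & \operatorname{Cov}(x_1, v) = \beta_2 \operatorname{Cov}(x_1, x_2) \\
\text{Probability limit of OLS:} \quad & \operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
\end{align*}
```

Whenever $\beta_2 \neq 0$ and $\operatorname{Cov}(x_1, x_2) \neq 0$, the second term does not vanish as the sample grows, which is why the estimator is inconsistent and not merely biased in small samples.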
Furthermore, when looking at the discussion using the Venn diagram, note that omitting a variable increases the unexplained variance of Y (the dependent variable) while decreasing the variance of the estimated coefficient. This might lead to a situation in which you reject the null hypothesis and believe that your coefficients are statistically significant at a given significance level although they are not.
How serious is the omitted variable bias …
The problem of the omitted variable bias is pretty serious. An omitted variable leads to biased and inconsistent coefficient estimates. And as we all know, biased and inconsistent estimates are not reliable. From our previous post, you might remember how omitting a variable can change the signs of the coefficients, depending on the correlation of the omitted variable with the dependent and explanatory variables. Hence, conclusions drawn from such a regression model cannot be trusted. In a simple simulation exercise, I tried to visualize what happens if we neglect a relevant variable in a regression model. The exercise confirms that when a relevant variable is left out of the model, OLS fails to estimate the coefficients correctly.
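A simulation of this kind can be sketched as follows. This is not the original exercise referred to above, only a minimal illustration under assumed settings: the function name `simulate_ovb`, the coefficient values, the sample size, and the way `x1` is made to depend on `x2` are all hypothetical choices.

```python
import numpy as np


def simulate_ovb(n=10_000, beta1=1.0, beta2=2.0, rho=0.5, seed=0):
    """Return the OLS estimate of beta1 with and without the variable x2.

    All parameter values are illustrative assumptions, not taken from the post.
    """
    rng = np.random.default_rng(seed)

    # x2 is the variable we will later omit; x1 is correlated with it via rho.
    x2 = rng.normal(size=n)
    x1 = rho * x2 + rng.normal(size=n)

    # True data-generating process: y depends on both x1 and x2.
    u = rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + u

    # Full model: intercept, x1, x2.
    X_full = np.column_stack([np.ones(n), x1, x2])
    b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

    # Misspecified model: intercept and x1 only (x2 is omitted).
    X_short = np.column_stack([np.ones(n), x1])
    b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

    return b_full[1], b_short[1]


if __name__ == "__main__":
    full, short = simulate_ovb()
    print(f"beta1 estimate with x2 included: {full:.3f}")
    print(f"beta1 estimate with x2 omitted:  {short:.3f}")
```

Under these assumed settings, the estimate from the full model should land close to the true value of 1, while the estimate from the misspecified model should land near 1 + 2 · 0.5 / (0.5² + 1) = 1.8, and increasing `n` does not pull it back toward 1.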
… And what can be done about it?
Dealing with the omitted variable bias is not easy. Check out this post to read what one might try in order to tackle the issues associated with the omitted variable bias.