The second part of the series on Omitted Variable Bias aims to deepen the reader's understanding of the bias. Let’s continue with the example from the Introduction. Let the dependent variable be the price of a car, and let the explanatory variables be the car’s mileage and the car’s age. In our case, both mileage and age are important factors that determine the price of a car.
The Venn diagram below illustrates the problems that arise when we omit an important variable from our regression analysis. Note that the overlap of miles and price (area C) is the true impact of the variable miles on price, and the overlap of age and price (area D) is the true impact of the variable age on price. Area B is the region where miles, age, and price all overlap. Now, assume that you include mileage in your regression analysis but omit age. By doing so, you estimate the impact of miles on price from areas C and B rather than from area C alone. What can you say about the estimate of miles in the regression? What are the consequences of omitting age? Here are some general statements on what happens when you omit an important variable, in our case age.
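The Venn diagram has a standard algebraic counterpart. The block below states the textbook omitted-variable-bias result as a sketch, assuming the true model is linear in miles and age and the usual OLS assumptions hold (including homoskedastic errors with variance σ²). The symbols β, α, δ, σ², SST and R² are notation introduced here for illustration; they do not appear in the diagram.

```latex
\begin{aligned}
\text{true model:} \quad & \text{price}_i = \beta_0 + \beta_1\,\text{miles}_i + \beta_2\,\text{age}_i + u_i \\
\text{age omitted:} \quad & \text{price}_i = \alpha_0 + \alpha_1\,\text{miles}_i + v_i \\
\text{auxiliary:} \quad & \text{age}_i = \delta_0 + \delta_1\,\text{miles}_i + w_i \\[4pt]
\mathbb{E}\left[\hat{\alpha}_1 \mid \text{miles}, \text{age}\right] &= \beta_1 + \beta_2\,\hat{\delta}_1 \\[4pt]
\operatorname{Var}\left(\hat{\alpha}_1 \mid \text{miles}, \text{age}\right) &= \frac{\sigma^2}{\mathrm{SST}_{\text{miles}}}
\;\le\; \frac{\sigma^2}{\mathrm{SST}_{\text{miles}}\,\bigl(1 - R^2_{\text{miles}}\bigr)}
= \operatorname{Var}\left(\hat{\beta}_1 \mid \text{miles}, \text{age}\right)
\end{aligned}
```

Here δ̂₁ is the OLS slope from regressing age on miles, SST_miles is the total sum of squares of miles, R²_miles is the R² from regressing miles on age, and σ² is the variance of u. The bias term β₂δ̂₁ plays the role of area B: it vanishes only if age has no effect on price (β₂ = 0) or if miles and age do not overlap in the sample (δ̂₁ = 0). Because 1 − R²_miles ≤ 1, the estimate that omits age has the smaller sampling variance.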
- The coefficient of miles is biased, because area B belongs to both miles and age: part of the effect that actually comes from age is attributed to miles.
- Since the coefficient of miles is estimated from both area C and area B (the area it shares with age), it draws on more information, and its variance is reduced.
- Finally, the unexplained variance of price (the dependent variable) increases, because the variation in price explained by age alone (area D) now ends up in the residual.
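The three statements above can be checked with a small Monte Carlo simulation. This is a minimal sketch, assuming a linear "true" price model with made-up coefficients (all parameter values are hypothetical, chosen only so that older cars tend to have more miles) and using NumPy's least-squares solver; the helper name `ols` is introduced here for illustration. The code repeatedly draws prices, fits the regression with and without age, and compares the average miles coefficient, its sampling variance, and the residual variance. It also compares the biased mean with the textbook prediction β_age·δ₁, where δ₁ is the slope from regressing age on miles.

```python
import numpy as np

# All parameter values below are hypothetical, chosen only for illustration.
BETA_0, BETA_MILES, BETA_AGE = 40_000.0, -0.05, -1_500.0  # "true" effects
N, REPS = 200, 2_000  # cars per sample, number of simulated samples

rng = np.random.default_rng(42)


def ols(X, y):
    """Return OLS coefficients and the residual-variance estimate s^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid @ resid / (len(y) - X.shape[1])


# Fixed regressors: older cars tend to have more miles (age and miles overlap).
age = rng.uniform(1, 15, size=N)
miles = 12_000 * age + rng.normal(0, 20_000, size=N)

X_long = np.column_stack([np.ones(N), miles, age])  # correct model
X_short = np.column_stack([np.ones(N), miles])      # age omitted

# Slope from the auxiliary regression of age on miles.
delta_1 = ols(X_short, age)[0][1]

long_b = np.empty(REPS)
short_b = np.empty(REPS)
long_s2 = np.empty(REPS)
short_s2 = np.empty(REPS)
for r in range(REPS):
    noise = rng.normal(0, 2_000, size=N)
    price = BETA_0 + BETA_MILES * miles + BETA_AGE * age + noise
    coef, long_s2[r] = ols(X_long, price)
    long_b[r] = coef[1]
    coef, short_s2[r] = ols(X_short, price)
    short_b[r] = coef[1]

print(f"true miles effect:           {BETA_MILES:.4f}")
print(f"mean estimate, age included: {long_b.mean():.4f}")
print(f"mean estimate, age omitted:  {short_b.mean():.4f}")
print(f"predicted by bias formula:   {BETA_MILES + BETA_AGE * delta_1:.4f}")
print(f"sampling variance, included vs omitted: "
      f"{long_b.var():.2e} vs {short_b.var():.2e}")
print(f"mean residual variance, included vs omitted: "
      f"{long_s2.mean():.3g} vs {short_s2.mean():.3g}")
```

With age omitted, the average miles coefficient should drift away from the true value toward the formula's prediction, its spread across samples should shrink, and the residual variance should grow, mirroring the three bullet points. Changing the correlation between age and miles (the 20,000 noise term) is an easy way to see how the size of the overlap drives the bias.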