In the previous two posts on the Omitted Variable Bias (Post 1 and Post 2), we discussed the hypothetical problem of finding out what determines the price of a car. For simplicity, we assumed that the price of a car depends only on its age and its mileage. In this post, we discuss how the omitted variable bias affects individual coefficients. To do so, suppose that you want to find out the effect of miles on the price of a car.
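Let the data generating process be as follows:

$$\text{price} = \beta_0 + \beta_1\,\text{miles} + \beta_2\,\text{age} + u$$

where $u$ is an error term.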
This implies that the price of a car is determined by its mileage and its age. However, for whatever reason, you omit the variable age from your regression and instead estimate the following reduced model:

$$\text{price} = \tilde{\beta}_0 + \tilde{\beta}_1\,\text{miles} + \tilde{u}$$
What are the effects of omitting the variable age? What will happen to the coefficient of miles? What sign do you expect for the estimated coefficient of miles, $\tilde{\beta}_1$, in the reduced model?
Let’s elaborate on these questions. A priori, one would expect higher mileage to lower the price of a car. Hence, we would expect the coefficient of miles, $\beta_1$, to be negative, i.e. $\beta_1 < 0$. One would further expect an older car to be cheaper and hence traded at a lower price, i.e. $\beta_2 < 0$. Also, one would expect an older car to have more miles. We can therefore conclude that:
- Price and miles are negatively correlated.
- Price and age are negatively correlated.
- Miles and age are positively correlated.
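To see these correlation signs in numbers, here is a minimal NumPy sketch on simulated data, not real car prices. The parameter values (a base price of 40,000, -0.05 per mile, -1,500 per year of age, roughly 12,000 miles driven per year) are hypothetical assumptions chosen only to reproduce the signs discussed above.

```python
import numpy as np

# Simulated cars from a hypothetical data generating process (all numbers are assumptions)
rng = np.random.default_rng(1)
n = 5_000
age = rng.uniform(0, 15, size=n)  # car age in years
# Older cars tend to have more miles; clip so mileage is never negative
miles = np.clip(12_000 * age + rng.normal(0, 10_000, size=n), 0, None)
price = 40_000 - 0.05 * miles - 1_500 * age + rng.normal(0, 2_000, size=n)

# Pearson correlation coefficients between pairs of variables
corr_price_miles = np.corrcoef(price, miles)[0, 1]
corr_price_age = np.corrcoef(price, age)[0, 1]
corr_miles_age = np.corrcoef(miles, age)[0, 1]

print(f"corr(price, miles) = {corr_price_miles:+.2f}")
print(f"corr(price, age)   = {corr_price_age:+.2f}")
print(f"corr(miles, age)   = {corr_miles_age:+.2f}")
```

With these assumed parameters, both correlations with price come out negative and the correlation between miles and age comes out strongly positive.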
What does this imply for our regression analysis? We expect a large number of miles to lower the price of a car. But a car with many miles also tends to be older. Thus, when age is omitted, the coefficient of miles captures not only the effect of miles but also part of the effect of age.
Consequently, the estimated coefficient of miles in the reduced model, $\tilde{\beta}_1$, suffers from a bias.
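A small simulation makes the bias visible. The sketch below, assuming the hypothetical parameter values shown in the code (they are not estimates from real data), draws cars from the full data generating process and fits both the full and the reduced model by ordinary least squares with `numpy.linalg.lstsq`. The helper names `simulate_cars` and `ols` are made up for this illustration.

```python
import numpy as np

# Hypothetical parameter values (assumptions for illustration only)
BETA_0, BETA_MILES, BETA_AGE = 40_000.0, -0.05, -1_500.0

def simulate_cars(n=5_000, seed=0):
    """Simulate n cars whose price follows the assumed data generating process."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(0, 15, size=n)  # years
    # Mileage grows with age; clip so it is never negative
    miles = np.clip(12_000 * age + rng.normal(0, 10_000, size=n), 0, None)
    price = BETA_0 + BETA_MILES * miles + BETA_AGE * age + rng.normal(0, 2_000, size=n)
    return price, miles, age

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

price, miles, age = simulate_cars()
full = ols(price, miles, age)  # correctly specified: price on miles and age
reduced = ols(price, miles)    # age omitted: price on miles only

print(f"miles coefficient, full model:    {full[1]:+.4f}")
print(f"miles coefficient, reduced model: {reduced[1]:+.4f}")
```

Under these assumptions, the full-model estimate lands close to the assumed -0.05, while the reduced-model estimate is noticeably more negative because it also absorbs part of the effect of age.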
But can we say something more about the bias? Yes. We know that $\tilde{\beta}_1$ suffers from a downward bias. This is because age has a negative effect on the price and, at the same time, is positively correlated with miles. Leaving out age therefore lets the coefficient of miles pick up part of the negative effect of age.
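Hence, it follows that the true coefficient of miles, $\beta_1$, is larger than the expected value of its estimate from the reduced model, i.e. $\beta_1 > \mathbb{E}[\tilde{\beta}_1]$. This implies that if $\tilde{\beta}_1 < 0$, it is not necessarily true that $\beta_1 < 0$: part, or even all, of the negative estimate may stem from the omitted effect of age.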
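The last point can be made concrete with a hypothetical extreme case. In this NumPy sketch, miles have no effect at all on price ($\beta_1 = 0$ by assumption) and only age matters, yet the reduced regression that omits age still returns a clearly negative coefficient for miles. All parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
age = rng.uniform(0, 15, size=n)  # years
miles = np.clip(12_000 * age + rng.normal(0, 10_000, size=n), 0, None)

# Hypothetical DGP in which miles have NO effect on price; only age lowers the price
beta_1 = 0.0
price = 40_000 + beta_1 * miles - 1_500 * age + rng.normal(0, 2_000, size=n)

# Reduced model: regress price on an intercept and miles only (age omitted)
X_reduced = np.column_stack([np.ones(n), miles])
beta_tilde, *_ = np.linalg.lstsq(X_reduced, price, rcond=None)

print(f"true miles coefficient:      {beta_1:+.4f}")
print(f"estimate with age omitted:   {beta_tilde[1]:+.4f}")
```

A negative estimate here would wrongly suggest that mileage lowers the price, even though the assumed true effect is zero.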
The illustration below summarizes the direction of the omitted variable bias. Let Y be the dependent variable and A and B the independent variables, where B is the omitted variable.
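The direction of the bias follows from the standard omitted variable bias formula. Below is a sketch in LaTeX using the Y, A, B notation just introduced; the coefficient symbols ($\beta_A$, $\beta_B$, $\delta_A$) are notation assumed for this illustration, and the result assumes a linear model whose error term has conditional mean zero given A and B.

```latex
% True model and auxiliary regression of the omitted variable B on A
Y = \beta_0 + \beta_A A + \beta_B B + u, \qquad \mathbb{E}[u \mid A, B] = 0
B = \delta_0 + \delta_A A + e, \qquad \delta_A = \frac{\operatorname{Cov}(A,B)}{\operatorname{Var}(A)}

% Regressing Y on A alone (B omitted):
\operatorname{plim} \tilde{\beta}_A = \beta_A + \underbrace{\beta_B \, \delta_A}_{\text{omitted variable bias}}

% Sign of the bias: sign(beta_B) times sign(Corr(A,B))
\begin{array}{c|cc}
 & \operatorname{Corr}(A,B) > 0 & \operatorname{Corr}(A,B) < 0 \\ \hline
\beta_B > 0 & \text{positive (upward)} & \text{negative (downward)} \\
\beta_B < 0 & \text{negative (downward)} & \text{positive (upward)}
\end{array}
```

In the car example, A is miles and B is age: $\beta_B < 0$ and $\operatorname{Corr}(A,B) > 0$, which places the example in the bottom-left cell, a downward bias.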