The omitted variable bias is a common and serious problem in regression analysis. Generally, the problem arises when one leaves a relevant variable out of a regression. In this case, one violates the first assumption of the classical linear regression model. In this introductory part of a series of posts on the omitted variable bias, you will learn what exactly the omitted variable bias is.
Let’s start with an example. Suppose you want to examine what determines the price of second-hand cars. To find out, you decide to run a linear regression of the price of used cars on its likely determinants. You gather all the factors that you think are relevant for determining the price of a car and include them in your model: the brand of the car, the number of seats, whether the car has already had an accident, the number of kilometers it has been driven, and the size of its engine. However, you forgot to include an important variable: the age of the car. Two cars with exactly the same values of the variables you included can have substantially different prices if they differ in age, and your model cannot account for that difference. As a result, your regression is likely to give you biased estimates. By omitting this important variable, your regression suffers from the omitted variable bias.
The omitted variable bias occurs because the linear regression model is misspecified. This can happen for various reasons: the effect of the omitted variable on the dependent variable may be unknown, or the variable may simply not be available. In the latter case, you might be forced to omit the relevant variable from your model. However, you need to be aware that omitting a variable can lead to an over-estimation (upward bias) or an under-estimation (downward bias) of the coefficients of one or more explanatory variables.
In order for the omitted variable to bias your coefficients, two requirements must be fulfilled:
- The omitted variable must be correlated with the dependent variable.
- The omitted variable must be correlated with one or more other explanatory variables.
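Both requirements can be read off the standard omitted variable bias formula. As a sketch, assume the simplest case: the true model has two regressors, x_1 and x_2, it satisfies the remaining classical assumptions, and the estimated model leaves out x_2. Here \tilde{\delta}_1 denotes the slope from regressing x_2 on x_1:

```latex
\begin{aligned}
\text{True model:} \quad & y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \\
\text{Estimated model:} \quad & y_i = \tilde{\beta}_0 + \tilde{\beta}_1 x_{1i} + e_i \\
\text{Expected estimate:} \quad & \mathrm{E}\left[\tilde{\beta}_1 \mid x\right] = \beta_1 + \beta_2 \tilde{\delta}_1,
\qquad \tilde{\delta}_1 = \frac{\sum_i (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\sum_i (x_{1i} - \bar{x}_1)^2} \\
\text{Bias:} \quad & \mathrm{E}\left[\tilde{\beta}_1 \mid x\right] - \beta_1 = \beta_2 \tilde{\delta}_1
\end{aligned}
```

The bias vanishes if \beta_2 = 0 (the omitted variable has no effect on the dependent variable, so the first requirement fails) or if \tilde{\delta}_1 = 0 (the omitted variable is uncorrelated with x_1 in the sample, so the second requirement fails). Its sign is the sign of \beta_2 times the sign of the correlation between x_1 and x_2: a positive product means upward bias, a negative product means downward bias.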
In our example, the age of the car is negatively correlated with the price of the car and positively correlated with the car’s mileage. Hence, omitting the variable age from your regression results in an omitted variable bias.
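To see the car example in numbers, here is a small simulation sketch in Python with NumPy. The data-generating process (a hypothetical linear price equation in age and kilometers driven, with made-up coefficients) is an illustrative assumption, not real car data, and the helper `ols` is likewise hypothetical. The sketch fits the regression once with age and once without it, then compares the kilometer coefficient.

```python
import numpy as np

# Hypothetical data-generating process: every number below is an illustrative
# assumption chosen for this sketch, not an estimate from real car data.
rng = np.random.default_rng(seed=0)
n = 10_000

age = rng.uniform(1, 15, size=n)  # age of the car in years
# Older cars tend to have been driven more; clip at 0 so kilometers stay non-negative.
km = np.maximum(15_000 * age + rng.normal(0, 10_000, size=n), 0)
# Assumed true price equation: both age and kilometers lower the price.
price = 30_000 - 1_000 * age - 0.05 * km + rng.normal(0, 2_000, size=n)


def ols(y, *regressors):
    """Hypothetical helper: OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


full = ols(price, km, age)  # correctly specified: age included
short = ols(price, km)      # misspecified: age omitted
delta = ols(age, km)[1]     # slope from regressing the omitted variable (age) on km

print(f"km coefficient, age included: {full[1]:.4f}")   # near the assumed true value -0.05
print(f"km coefficient, age omitted:  {short[1]:.4f}")  # clearly more negative (downward bias)

# In-sample OLS identity: short slope = full slope + (age coefficient) * delta.
print(np.isclose(short[1], full[1] + full[2] * delta))  # True
```

Under these assumptions the kilometer coefficient stays close to its true value when age is included, but becomes noticeably more negative when age is omitted, because it now also absorbs the price effect of age. Since age lowers the price and rises with kilometers driven, the bias is downward, as the two requirements above predict.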
Part three of the series on the omitted variable bias aims to deepen the reader’s understanding of the bias.