Multicollinearity is a common problem in econometrics. As explained in a previous post, multicollinearity arises when we have too few observations to precisely estimate the effects of two or more highly correlated variables on the dependent variable. This post illustrates the problem of multicollinearity graphically using Venn diagrams. The Venn diagrams below all represent the following regression model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

Each circle depicts the variance of one variable of the regression model. That is, one circle depicts the variance of the dependent variable $y$, another circle depicts the variance of the explanatory variable $x_1$, and the third circle shows the variance of the explanatory variable $x_2$. The overlapping areas show variation that the variables have in common. For instance, the overlapping area of $x_1$ and $y$ represents the variation in $y$ that can be explained by $x_1$.
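The overlap between two circles can be read as the squared correlation between the two variables, i.e. the share of one variable's variance that the other explains. A minimal sketch (the coefficient 0.6 and the sample size are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# x1 and x2 share variation: x2 is partly driven by x1
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)

# The overlap of the two circles corresponds to the squared
# correlation: the share of x2's variance explained by x1
r = np.corrcoef(x1, x2)[0, 1]
shared_variation = r ** 2
print(f"share of x2's variance explained by x1: {shared_variation:.2f}")
```

With these numbers the true shared variance is $0.36/1.36 \approx 0.26$, and the sample estimate lands close to that.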
In the first figure, the circles of $x_1$ and $x_2$ both intersect with the circle of $y$. However, there is no overlap between the circle of $x_1$ and the circle of $x_2$. In this case, the variables $x_1$ and $x_2$ are both correlated with $y$, but the two explanatory variables themselves are uncorrelated. Thus, one can precisely identify the effect of each explanatory variable ($x_1$ and $x_2$) on the dependent variable ($y$).
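A small simulation illustrates this first case (the true coefficients and sample size are chosen arbitrarily): with uncorrelated regressors, OLS recovers each effect precisely.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Two uncorrelated regressors, as in the first figure
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# OLS via least squares on [1, x1, x2]
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the true values (1, 2, -1)
```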

Figure 2 shows a case in which there is some correlation between the two explanatory variables. Note that in Figure 2 the circle of $x_1$ and the circle of $x_2$ overlap, meaning that the two variables have some variation in common. It becomes less clear what the effect of each explanatory variable on the dependent variable actually is, i.e. there is an area in which all three circles overlap. Although there is some correlation between $x_1$ and $x_2$, there is still enough independent variation left to determine the effects of $x_1$ and $x_2$ rather precisely.
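A common way to quantify how much such overlap hurts is the variance inflation factor, $\mathrm{VIF} = 1/(1 - R^2)$, where $R^2$ comes from regressing one explanatory variable on the others. A sketch for the two-regressor case (the coefficient 0.5 is made up), where $R^2$ reduces to the squared correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Moderately correlated regressors, as in Figure 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)

# Variance inflation factor: 1 / (1 - R^2) from regressing x2 on x1.
# With a single other regressor, R^2 is just the squared correlation.
r2 = np.corrcoef(x1, x2)[0, 1] ** 2
vif = 1.0 / (1.0 - r2)
print(f"VIF: {vif:.2f}")
```

Here the VIF stays close to 1.25, far below the common rule-of-thumb cutoff of 10, matching the claim that moderate correlation still leaves the effects well identified.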

Moderate multicollinearity is not much of a concern. However, if the correlation between two or more explanatory variables is very strong, it becomes increasingly hard to precisely estimate the pure effect of one explanatory variable on the dependent variable. Figure 3 depicts a case in which $x_1$ and $x_2$ are strongly correlated. There is less and less variation left that can be attributed to only one of the explanatory variables, $x_1$ or $x_2$. In this case we need more data to precisely estimate the effect of each explanatory variable on the dependent variable. In general, multicollinearity makes our estimates less precise.
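The loss of precision can be made concrete with the textbook formula for the variance of a slope estimate: under simplifying assumptions (standardized regressors, homoskedastic unit-variance errors), $\mathrm{Var}(\hat\beta_1) = 1/\bigl(n(1 - r^2)\bigr)$, where $r$ is the correlation between $x_1$ and $x_2$. A sketch:

```python
import numpy as np

# Standard error of the OLS slope for x1 under the simplifying
# assumptions above: se = sqrt(1 / (n * (1 - r^2)))
def slope_se(n, r):
    return np.sqrt(1.0 / (n * (1.0 - r ** 2)))

for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4}: se = {slope_se(1_000, r):.3f}")

# More data compensates: matching the precision of r = 0 at r = 0.99
# requires roughly 1 / (1 - 0.99**2), i.e. about 50 times as many
# observations.
```

This is exactly the picture in Figure 3: as $r$ approaches 1, the standard error blows up, and only a larger sample can restore precision.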

Finally, as already stated in this post, multicollinearity does not cause problems from a mathematical point of view as long as we do not have perfect multicollinearity. In the Venn-diagram representation, perfect multicollinearity between $x_1$ and $x_2$ would mean that the circle of $x_1$ and the circle of $x_2$ are identical, i.e. the two circles overlap perfectly. In that case, one variable is a linear combination of the other. There is no independent variation left to estimate, and the estimator breaks down because we violate the second of the Gauss-Markov assumptions (the full rank assumption).
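The rank-assumption failure is easy to verify numerically. In this sketch (the data are made up) $x_2$ is an exact linear combination of $x_1$, so the design matrix loses a rank and $X'X$ becomes singular:

```python
import numpy as np

n = 100
x1 = np.linspace(0.0, 1.0, n)
x2 = 2.0 * x1  # x2 is an exact linear combination of x1

X = np.column_stack([np.ones(n), x1, x2])

# X has 3 columns but only rank 2: X'X is singular, so the OLS
# normal equations have no unique solution
print(np.linalg.matrix_rank(X))   # 2
print(np.linalg.det(X.T @ X))     # (numerically) 0
```

In practice, statistical software detects this rank deficiency and either drops one of the offending columns or refuses to estimate the model.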