Graphically Illustrate Multicollinearity: Venn Diagram

Multicollinearity is a common problem in econometrics. As explained in a previous post, multicollinearity arises when we have too few observations to precisely estimate the effects of two or more highly correlated variables on the dependent variable. This post illustrates the problem graphically using Venn diagrams. The Venn diagrams below all represent the following regression model:

y = x_1 + x_2 + \epsilon

Here, each circle depicts the variance of one variable in the regression model. That is, the circle y depicts the variance of the dependent variable y, the circle x_1 depicts the variance of variable x_1, and the circle x_2 depicts the variance of variable x_2. Overlapping areas show variation that the variables have in common. For instance, the overlap between y and x_1 represents the variation in y that can be explained by x_1.

In the first figure, the circles x_1 and x_2 both intersect with the circle y. However, there is no overlap between the circle x_1 and the circle x_2. In this case, x_1 and x_2 are both correlated with y, but the two explanatory variables themselves are uncorrelated. Thus, one can precisely identify the effect of each explanatory variable (x_1 and x_2) on the dependent variable (y).

Figure 1: No Multicollinearity: Variable x_1 and x_2 are uncorrelated
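
The non-overlapping circles of Figure 1 have a simple numerical counterpart. The following is a minimal sketch in Python with NumPy, under stated assumptions: the sample size, the seed, the unit error variance and the helper name `r_squared` are made up for illustration, and x_2 is constructed to be exactly uncorrelated with x_1 in the sample. In that case the share of the variance of y explained jointly by x_1 and x_2 is just the sum of the shares each variable explains on its own, because nothing is counted twice.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, chosen only for reproducibility
n = 500                         # assumed sample size

# Draw two regressors and center them.
x1 = rng.normal(size=n)
z = rng.normal(size=n)
x1 = x1 - x1.mean()
z = z - z.mean()

# Remove from z everything it shares with x1, so that x1 and x2
# are exactly uncorrelated in this sample (no overlap of the circles).
x2 = z - (z @ x1) / (x1 @ x1) * x1

# The regression model from the post: y = x_1 + x_2 + epsilon
y = x1 + x2 + rng.normal(size=n)


def r_squared(y, *regressors):
    """R^2 of an OLS regression of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


r2_x1 = r_squared(y, x1)         # overlap of y and x_1
r2_x2 = r_squared(y, x2)         # overlap of y and x_2
r2_joint = r_squared(y, x1, x2)  # everything in y covered by x_1 or x_2

print(f"R2(x1) + R2(x2) = {r2_x1 + r2_x2:.4f}")
print(f"R2(x1, x2)      = {r2_joint:.4f}")
```

The two printed numbers should agree up to floating-point error. Keep in mind that the Venn diagram is only a heuristic: once the regressors are correlated, the areas no longer map one-to-one onto shares of explained variance.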

Figure 2 shows a case in which the two explanatory variables are somewhat correlated. The circles x_1 and x_2 now overlap, meaning that the two variables have some variation in common. Part of that shared variation also overlaps with y, so there is an area common to all three variables, and it becomes harder to determine the effect of each explanatory variable on the dependent variable. Still, although x_1 and x_2 are correlated, there is enough independent variation left to determine the effects of x_1 and x_2 rather precisely.

Figure 2: Moderate Multicollinearity: Variable x_1 and x_2 are somewhat correlated

Moderate multicollinearity is not much of a concern. However, if the correlation between two or more explanatory variables is very strong, it gets increasingly hard to precisely estimate the pure effect of one explanatory variable on the dependent variable. Figure 3 depicts a case in which the variables x_1 and x_2 are strongly correlated. There is less and less variation left that can be attributed to only one explanatory variable and y. In this case we need more data to precisely estimate the effect of one explanatory variable on the dependent variable. In general, multicollinearity makes our estimates less precise.

Figure 3: Strong Multicollinearity: Variable x_1 and x_2 are strongly correlated
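
To put numbers on the three figures, here is a minimal simulation sketch in Python with NumPy. Everything specific in it is an assumption made for illustration and not part of the original argument: the sample size of 200, the seed, standard normal regressors and errors, the three correlation values, and the helper name `slope_standard_errors`. It simulates the regression model from above for a given correlation rho between x_1 and x_2 and returns the usual OLS standard errors of the two slope coefficients.

```python
import numpy as np


def slope_standard_errors(rho, n=200, seed=0):
    """Simulate y = x_1 + x_2 + epsilon with corr(x_1, x_2) = rho (in the
    population) and return the OLS standard errors of the two slopes."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    z = rng.normal(size=n)
    eps = rng.normal(size=n)

    # x2 has unit variance and population correlation rho with x1.
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * z
    y = x1 + x2 + eps

    # OLS with an intercept.
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Classical covariance estimate: sigma^2 * (X'X)^{-1}
    sigma2 = (resid @ resid) / (n - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)

    # Drop the intercept, keep the standard errors of the two slopes.
    return np.sqrt(np.diag(cov_beta))[1:]


# Roughly the situations of Figure 1, Figure 2 and Figure 3.
for rho in (0.0, 0.5, 0.95):
    se_x1, se_x2 = slope_standard_errors(rho)
    print(f"rho = {rho:.2f}   se(x1) = {se_x1:.3f}   se(x2) = {se_x2:.3f}")
```

With the same sample size, the standard errors should grow as rho moves towards one. Increasing `n` in the call shrinks them again, which is the sense in which more data helps against multicollinearity.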

Finally, as already stated in this post, multicollinearity does not cause problems from a mathematical point of view as long as we do not have perfect multicollinearity. In a Venn diagram, perfect multicollinearity between x_1 and x_2 would mean that the circle of x_1 and the circle of x_2 are identical, i.e. the two circles overlap perfectly. Hence, one variable is an exact linear function of the other. There is no independent variation left to separate the two effects, and the estimator breaks down because we violate the second Gauss-Markov assumption (the full rank assumption).
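
The failure of the full rank assumption can be checked directly. The sketch below, in Python with NumPy, uses a made-up example in which x_2 is exactly twice x_1. The factor 2, the sample size and the seed are assumptions for illustration only. It shows that the regressor matrix has fewer independent columns than coefficients, and that two different coefficient vectors produce exactly the same fitted values, so the data cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 50                          # assumed sample size

x1 = rng.normal(size=n)
x2 = 2.0 * x1                   # hypothetical case: x2 is an exact multiple of x1

# Regressor matrix with a constant, x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])

# Full rank would require rank == number of columns.
rank = np.linalg.matrix_rank(X)
print(f"columns: {X.shape[1]}, rank: {rank}")

# Two different coefficient vectors (intercept, b1, b2) ...
beta_a = np.array([0.0, 1.0, 1.0])
beta_b = np.array([0.0, 3.0, 0.0])

# ... that imply the same fitted values, because x1 + x2 = 3 * x1.
same_fit = np.allclose(X @ beta_a, X @ beta_b)
print(f"identical fitted values: {same_fit}")
```

Since X has three columns but only two of them are linearly independent, X'X cannot be inverted and the OLS formula has no unique solution.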
