The Coefficient Of Determination or R2


The coefficient of determination R^{2} is probably the most famous key figure when it comes to evaluating the fit of an ordinary least squares (OLS) estimation. The coefficient of determination R^{2} can be derived from a simple variance decomposition. Although this sounds complicated, it can actually be explained in a few simple words. The first thing you have to remember is what OLS is actually all about. When using OLS, you try to explain a certain dependent variable (y) through independent variables (x). Our model is thereby able to explain some of the variation of the dependent variable (y). We can summarize this as follows:

y_{i} = \hat{y}_{i} + e_{i}

where y_{i} is the dependent variable, which consists of a part we can explain, \hat{y}_{i} (also known as the fitted value), and a part we cannot explain, e_{i} (also known as the error term or residual). From this decomposition it follows that the variance of y (Var(y)) can be decomposed in a similar manner:

Var(y) = Var(\hat{y}) + Var(e) + 2Cov(\hat{y}, e)

In case our regression model contains a constant (as it usually does), usually depicted as \beta_{0} or \alpha, we know that Cov(\hat{y},e)=0. Given that Cov(\hat{y},e)=0, the equation above boils down to:

Var(y) = Var(\hat{y}) + Var(e)
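
This decomposition is easy to check numerically. The following is a minimal sketch in Python using NumPy; the simulated data and all variable names (x, y, beta) are illustrative assumptions of mine, not something prescribed by the derivation.

```python
import numpy as np

# Simulate illustrative data: y depends linearly on x plus noise.
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 2.0 + 1.5 * x + rng.normal(size=1000)

# OLS with a constant: design matrix with a column of ones.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta   # fitted values
e = y - y_hat      # residuals

# With a constant in the model, Cov(y_hat, e) is zero up to
# floating point error ...
print(np.cov(y_hat, e)[0, 1])                # ~ 0
# ... and therefore Var(y) = Var(y_hat) + Var(e).
print(np.var(y), np.var(y_hat) + np.var(e))  # equal up to rounding
```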

Plugging in the formula of the variance, we get:

\frac{1}{N}\sum ( y_{i} - \bar{y})^{2} = \frac{1}{N} \sum ( \hat{y}_{i} - \bar{\hat{y}})^{2} + \frac{1}{N} \sum (e_{i} - \bar{e})^{2}

And as \bar{e}=0, multiplying both sides by N, it follows that

\sum ( y_{i} - \bar{y})^{2}=\sum ( \hat{y}_{i} - \bar{\hat{y}})^{2} + \sum (e^{2}_{i})

Where

\sum ( y_{i} - \bar{y})^{2} = TSS = Total Sum of Squares

\sum ( \hat{y}_{i} - \bar{\hat{y}})^{2} = ESS = Explained Sum of Squares

\sum (e^{2}_{i}) = SSR = Sum of Squared Residuals
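
Using the same illustrative setup as above, we can verify that these three quantities indeed satisfy TSS = ESS + SSR; again, this is only a sketch with assumed simulated data.

```python
import numpy as np

# Same illustrative setup as in the sketch above.
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 2.0 + 1.5 * x + rng.normal(size=1000)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

tss = np.sum((y - y.mean()) ** 2)          # Total Sum of Squares
ess = np.sum((y_hat - y_hat.mean()) ** 2)  # Explained Sum of Squares
ssr = np.sum(e ** 2)                       # Sum of Squared Residuals

print(tss, ess + ssr)  # identical up to floating point error
```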

The coefficient of determination R^{2} is consequently calculated as the ratio of the Explained Sum of Squares (ESS) to the Total Sum of Squares (TSS). Because TSS = ESS + SSR, this is equivalent to one minus the ratio of the Sum of Squared Residuals (SSR) to the Total Sum of Squares (TSS):

R^{2} = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum (e^{2}_{i}) }{\sum ( y_{i} - \bar{y})^{2} }
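
As a sketch, the formula translates into a small helper function; the name r_squared is my own choice, and the function assumes the model contains a constant, since only then does 1 - SSR/TSS coincide with ESS/TSS.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 computed as 1 - SSR/TSS.

    Assumes the underlying model contains a constant, since only
    then do ESS/TSS and 1 - SSR/TSS give the same value.
    """
    tss = np.sum((y - np.mean(y)) ** 2)  # Total Sum of Squares
    ssr = np.sum((y - y_hat) ** 2)       # Sum of Squared Residuals
    return 1.0 - ssr / tss
```

Called with the y and y_hat arrays from the sketches above, this should return roughly 1.5^2 / (1.5^2 + 1) ≈ 0.69, the share of Var(y) that the simulated model explains.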

Having seen how the coefficient of determination R^{2} is calculated, what remains is to understand what it actually expresses. Looking at the equation above should already give you an idea: the coefficient of determination R^{2} shows how much of the variation of the dependent variable y (Var(y)) can be explained by our model.

Another way of interpreting the coefficient of determination R^{2} is to look at it as the squared Pearson correlation coefficient between the observed values y_{i} and the fitted values \hat{y}_{i}. Why exactly this is the case is shown in another post.
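
This equivalence is also easy to check numerically; the sketch below reuses the assumed simulated setup from above.

```python
import numpy as np

# Same illustrative setup as before.
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 2.0 + 1.5 * x + rng.normal(size=1000)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# R^2 via 1 - SSR/TSS ...
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# ... and the squared Pearson correlation between y and y_hat.
r = np.corrcoef(y, y_hat)[0, 1]

print(r2, r ** 2)  # agree up to floating point error
```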
