Relationship between Coefficient of Determination & Squared Pearson Correlation Coefficient

The usual way of interpreting the coefficient of determination $R^{2}$ is to see it as the percentage of the variation of the dependent variable $y$ ( $Var(y)$ ) that can be explained by our model. The exact interpretation and derivation of the coefficient of determination $R^{2}$ can be found here.

Another way of interpreting the coefficient of determination $R^{2}$ is to look at it as the Squared Pearson Correlation Coefficient between the observed values $y_{i}$ and the fitted values $\hat{y}_{i}$ . In this post we are going to prove that this is actually the case. For the proof we have to know the following (taken from OLS theory and general statistics):

$y = \hat{y} + e$
$Cov(\hat{y},e)=0$
$Cov(x,y+z)=Cov(x,y)+Cov(x,z)$
$Var(x) = Cov(x,x)$
$Var(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$r_{y,\hat{y}}=\frac{Cov(y,\hat{y})}{\sqrt[2]{Var(y)Var(\hat{y}) }}$
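
The first two facts in the list are easy to check numerically. The following is a minimal sketch, assuming a one-regressor OLS fit with an intercept; the toy data and the helper names `mean` and `cov` are illustrative and not part of the original post.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance (divides by n), matching the Var definition above."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Hypothetical toy data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.7]

# Simple OLS with an intercept
slope = cov(x, y) / cov(x, x)
intercept = mean(y) - slope * mean(x)

y_hat = [intercept + slope * xi for xi in x]   # fitted values
e = [yi - fi for yi, fi in zip(y, y_hat)]      # residuals, so y = y_hat + e

# Cov(y_hat, e) should be zero up to floating-point error
print("Cov(y_hat, e) =", cov(y_hat, e))
# Consequently Cov(y, y_hat) should equal Var(y_hat)
print("Cov(y, y_hat) =", cov(y, y_hat), " Var(y_hat) =", cov(y_hat, y_hat))
```

The same check fails in general if the intercept is dropped from the fit, which is why the intercept is listed as an assumption here.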

In the following we are going to see how to derive the coefficient of determination $R^{2}$ from the Squared Pearson Correlation Coefficient between the observed values $y_{i}$ and the fitted values $\hat{y}_{i}$ .

$r^{2}_{y,\hat{y}}=\left(\frac{Cov(y,\hat{y})}{\sqrt[2]{Var(y)Var(\hat{y}) }}\right)^{2}$

$r^{2}_{y,\hat{y}}=\frac{Cov(y,\hat{y})}{\sqrt[2]{Var(y)Var(\hat{y}) }} \frac{Cov(y,\hat{y})}{\sqrt[2]{Var(y)Var(\hat{y}) }}$

$r^{2}_{y,\hat{y}}=\frac{Cov(y,\hat{y}) Cov(y,\hat{y})}{Var(y)Var(\hat{y}) }$

$r^{2}_{y,\hat{y}}=\frac{Cov(\hat{y}+e,\hat{y}) Cov(\hat{y}+e,\hat{y})}{Var(y)Var(\hat{y}) }$

$r^{2}_{y,\hat{y}}=\frac{\left(Cov(\hat{y},\hat{y})+ Cov(\hat{y},e) \right) \left(Cov(\hat{y},\hat{y})+ Cov(\hat{y},e) \right) }{Var(y)Var(\hat{y}) }$

$r^{2}_{y,\hat{y}}=\frac{Cov(\hat{y},\hat{y})Cov(\hat{y},\hat{y})}{Var(y)Var(\hat{y}) }$

$r^{2}_{y,\hat{y}}=\frac{Var(\hat{y}) Var(\hat{y})}{Var(y)Var(\hat{y}) }$

$r^{2}_{y,\hat{y}}=\frac{Var(\hat{y}) }{Var(y) }= \frac{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{ESS}{TSS} = R^{2}$
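
The last step identifies the sum of squared deviations of $\hat{y}_i$ around $\bar{\hat{y}}$ with $ESS$, which is usually defined around $\bar{y}$. A short sketch of why the two agree, assuming the model contains an intercept so that the OLS residuals average to zero:

$\bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = 0$

$\bar{y} = \bar{\hat{y}} + \bar{e} = \bar{\hat{y}}$

$\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = ESS$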

$r^{2}_{y,\hat{y}}= R^{2}$
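
The result can also be checked numerically. The sketch below computes both quantities for a one-regressor OLS fit with an intercept; the data and the function names (`fit_ols`, `r_squared_two_ways`) are hypothetical and serve only as an illustration.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance (divides by n)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def fit_ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return intercept, slope

def r_squared_two_ways(x, y):
    """Return (squared Pearson correlation of y and y_hat, ESS / TSS)."""
    intercept, slope = fit_ols(x, y)
    y_hat = [intercept + slope * xi for xi in x]

    # Squared Pearson correlation between observed and fitted values
    r = cov(y, y_hat) / math.sqrt(cov(y, y) * cov(y_hat, y_hat))

    # Coefficient of determination as explained over total sum of squares
    y_bar = mean(y)
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return r ** 2, ess / tss

# Hypothetical toy data
x = [0.5, 1.5, 2.0, 3.5, 4.0, 5.5, 6.0, 7.5]
y = [1.1, 2.4, 2.2, 4.1, 3.9, 6.2, 5.8, 8.0]

r_sq, R2 = r_squared_two_ways(x, y)
print("r^2 =", r_sq)
print("R^2 =", R2)
```

Both printed values should agree up to floating-point error, as the derivation predicts.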

20 thoughts on “Relationship between Coefficient of Determination & Squared Pearson Correlation Coefficient”

Pingback: The Coefficient Of Determination or R2 | Economic Theory Blog

Thank you for this!

isidorebeautrelet says:

February 14, 2015 at 8:41 am

You are very welcome!

Hi Isidore: Do you know if the relation between the correlation coefficient R and r holds for the regression model with MA(1) errors? Empirically I seem to find that it doesn’t hold, but I wanted to make sure that my code didn’t have a bug. Thanks for any wisdom and your blog.

isidorebeautrelet says:

May 2, 2015 at 9:24 am

Hi Mark! That is actually a very good question which unfortunately I cannot answer out of the box. However, once I have some time I will look into it. So far my feeling is that the second of the five bullet points (listed in the post), i.e. the covariance between the fitted values and the error term being equal to zero, is most likely violated. Generally I think if you are able to show that all five bullet points hold for an MA(1) process, the relationship between $R^{2}$ and the correlation coefficient should hold as well.

Let me know if you find an answer to the question.
Cheers!

Thanks Isidore: What you pointed out is equivalent to the sums of squares decomposition relation, SSTOT = SSREG + SSE, being true. So I think I should look for info on when that decomposition holds in general. cov(y hat, e) not being zero makes it not true, so your point is a good one. Thanks and I’ll let you know if I find anything out about it.
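
The question in this thread can be probed numerically. The following is a minimal sketch, assuming the regression is fitted by plain OLS with an intercept (the thread does not say which estimator was used). Under that assumption the residuals are uncorrelated with the fitted values in-sample by construction of the normal equations, whatever the serial correlation of the errors, so the two quantities should still coincide. The MA(1) coefficient `theta`, the sample size, and the true coefficients are hypothetical choices.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance (divides by n)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

random.seed(0)
n = 200
theta = 0.6  # hypothetical MA(1) coefficient

# MA(1) errors: u_t = eps_t + theta * eps_{t-1}
eps = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
u = [eps[t + 1] + theta * eps[t] for t in range(n)]

x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# Plain OLS with an intercept
slope = cov(x, y) / cov(x, x)
intercept = mean(y) - slope * mean(x)
y_hat = [intercept + slope * xi for xi in x]
e = [yi - fi for yi, fi in zip(y, y_hat)]

# Squared Pearson correlation between y and y_hat
r_sq = cov(y, y_hat) ** 2 / (cov(y, y) * cov(y_hat, y_hat))

# R^2 as ESS / TSS
y_bar = mean(y)
R2 = sum((fi - y_bar) ** 2 for fi in y_hat) / sum((yi - y_bar) ** 2 for yi in y)

print("Cov(y_hat, e) =", cov(y_hat, e))
print("r^2 =", r_sq, " R^2 =", R2)
```

If a discrepancy shows up in practice, a likely place to look is the estimator: a fit that is not plain OLS with an intercept (for example a GLS or maximum likelihood fit of the MA(1) model) need not satisfy $Cov(\hat{y},e)=0$ in-sample.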

How did you get from covar(x,y) to covar(y’,y)?

ad says:

September 19, 2016 at 10:34 am

Thank you for your comment. I am sorry, but I cannot really help you as I do not understand which equation you are referring to. If you could be more specific I might be able to help.
Regards

Hello, can you tell me what you do between the pictures, i don’t quite understand it

ad says:

December 9, 2016 at 3:38 pm

Hello, what do you mean by pictures?

Thanks a lot for this. Maybe, one possible small typo is: ESS/TSS should be RSS/TSS? This is true as the mean of y^hat is equal to the mean of y, as the mean of e_i is zero.

ad says:

March 18, 2018 at 7:52 am

Thank you for your comment. ESS stands for “Explained Sum of Squares”, whereas RSS stands for the “Residual Sum of Squares”. Hence ESS/TSS is correct.

Best, ad

Oh sorry, I guess I was wrong, as ESS stands for “Explained SS”.

ad says:

March 18, 2018 at 7:51 am

Exactly, ESS stands for “Explained Sum of Squares”. Best ad

Very interesting content – thanks!

For some reason I fail to see the intuition behind the second last line. Why can var(y^) be said to be equal to ESS?
Does this have to do with the assumptions behind the y^ line? Couldn’t there be variance in this regression line which doesn’t contribute to explaining the variance in y?

I hope my question makes sense

ad says:

August 26, 2018 at 5:36 am

Hi, this is a very good question. I was too short on this point. I will adjust the post such that it becomes more clear. The short answer is: plug in the variance equation twice and $1/n$ cancels out. What remains are the sums of squared deviations from the mean, i.e. $ESS$ in the numerator and $TSS$ in the denominator.

Thanks for this comment.
ad

Thank you. I think there is some missing squares on the second last line, no? var(y) = sum((yi-mean(y))^2)/n, and the same thing for the estimates.

ad says:

December 17, 2018 at 10:41 am

Thank you for your comment. You are right, of course! I adjusted the post. Cheers, ad

Pingback: Linear Regression, $\mathrm{Cov}(\hat{y},e)=0$, correct Argument? – GrindSkills

Pingback: Playing with stats – Keep it real

Economic Theory Blog

“In God we trust; all others must bring data.” W. Edwards Deming
