The Gauss Markov Theorem

When studying the classical linear regression model, one necessarily comes across the Gauss-Markov Theorem. The Gauss-Markov Theorem is a central result for linear regression models. It states conditions under which the least squares estimator has the lowest variance among all unbiased linear estimators. More formally, the Gauss-Markov Theorem tells us that in a regression model where the expected value of the error terms is zero, i.e. E(\epsilon_{i}) = 0, the variance of the error terms is constant and finite, i.e. \sigma^{2}(\epsilon_{i}) = \sigma^{2} < \infty, and \epsilon_{i} and \epsilon_{j} are uncorrelated for all i \neq j, the least squares estimators b_{0} and b_{1} are unbiased and have minimum variance among all unbiased linear estimators. Note, however, that there might exist biased estimators with a lower variance.
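Written out compactly, the conditions on the error terms are:

E(\epsilon_{i}) = 0, \quad \sigma^{2}(\epsilon_{i}) = \sigma^{2} < \infty, \quad Cov(\epsilon_{i}, \epsilon_{j}) = 0 \text{ for all } i \neq j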

The remainder of the post summarizes the Gauss-Markov Theorem in a short and hopefully intuitive way. If you are interested in a formal proof of the Gauss-Markov Theorem, you should check out this post here.

Suppose we have the following regression model

\textbf{Y}_{i} = \beta_{0} + \beta_{1}\textbf{X}_{i} + \epsilon_{i}

Let’s start by briefly recalling how the point estimators b_{0} and b_{1} are defined and how we can obtain the variance of the two coefficients. The point estimates, that is, the coefficients b_{0} and b_{1}, can be obtained in the following way:

(1) b_{1}=\frac{\sum(\textbf{X}_{i}-\bar{\textbf{X}})(\textbf{Y}_{i}-\bar{\textbf{Y}})}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}} = \sum \textbf{k}_{i}\textbf{Y}_{i}, \textbf{k}_{i}=\frac{(\textbf{X}_{i}-\bar{\textbf{X}})}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}}

(2) b_{0}=\bar{\textbf{Y}}-b_{1}\bar{\textbf{X}}
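To make this concrete, here is a minimal NumPy sketch (the data are simulated with made-up, illustrative parameter values): it computes b_{1} as the weighted sum \sum \textbf{k}_{i}\textbf{Y}_{i} from equation (1) and b_{0} from equation (2), and cross-checks the result against NumPy's built-in least squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from an assumed linear model (illustrative parameter values).
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=x.size)

# Equation (1): the slope as a weighted sum of the Y_i with weights k_i.
k = (x - x.mean()) / np.sum((x - x.mean()) ** 2)
b1 = np.sum(k * y)

# Equation (2): the intercept from the sample means.
b0 = y.mean() - b1 * x.mean()

# Cross-check against NumPy's least squares polynomial fit.
b1_np, b0_np = np.polyfit(x, y, deg=1)
print(b0, b1)        # coefficients from the formulas above
print(b0_np, b1_np)  # should agree up to floating point error
```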

If you are not familiar with the linear regression model, you should first check out the following post. The exact derivation of the least squares estimator in matrix notation can be found here.

The variance of b_{1} can be obtained in the following way:

(3) \sigma^{2}(b_{1})=\sigma^{2}(\sum \textbf{k}_{i}\textbf{Y}_{i})=\sum \textbf{k}_{i}^{2} \sigma^{2}(\textbf{Y}_{i})=\frac{\sigma^{2}}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}}

where the last step uses \sigma^{2}(\textbf{Y}_{i}) = \sigma^{2} and \sum \textbf{k}_{i}^{2} = \frac{1}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}}.
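The following small simulation sketch (again with made-up parameter values) illustrates equation (3): it repeatedly draws new error terms for a fixed design, re-estimates b_{1}, and compares the empirical variance of the estimates with \sigma^{2}/\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design points and assumed true parameters (illustrative values).
x = np.linspace(0, 10, 50)
beta0, beta1, sigma = 2.0, 0.5, 1.5

# Theoretical variance of b1 according to equation (3).
var_theory = sigma ** 2 / np.sum((x - x.mean()) ** 2)

# Monte Carlo: re-estimate b1 over many simulated samples with fresh errors.
k = (x - x.mean()) / np.sum((x - x.mean()) ** 2)
b1_draws = np.empty(20_000)
for r in range(b1_draws.size):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)
    b1_draws[r] = np.sum(k * y)

print(var_theory)        # variance implied by equation (3)
print(np.var(b1_draws))  # empirical variance; should be close
```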

Now, the Gauss-Markov Theorem tells us that if its conditions are met, then b_{1} has the lowest variance among all unbiased linear estimators, that is, among all unbiased estimators of the following form

(4) \hat{\beta}_{1} = \sum c_{i}Y_{i}

Furthermore, remember that the Gauss-Markov Theorem only compares unbiased estimators. Hence, any candidate estimator \hat{\beta}_{1} must be unbiased, which means E(\hat{\beta}_{1}) = \beta_{1}. Let’s have a closer look at what this implies:

(5) E(\hat{\beta}_{1}) = \sum c_{i}E(Y_{i})

(6) E(\hat{\beta}_{1}) = \sum c_{i}E(\beta_{0}+\beta_{1}\textbf{X}_{i})

(7) E(\hat{\beta}_{1}) = \beta_{0} \sum c_{i} + \beta_{1} \sum c_{i} \textbf{X}_{i}

For \hat{\beta}_{1} to be unbiased, this expression has to equal \beta_{1} for all possible values of \beta_{0} and \beta_{1}:

(8) E(\hat{\beta}_{1}) = \beta_{1}

Note that the requirement E(\hat{\beta}_{1}) = \beta_{1} imposes restrictions on the weights c_{i}: since equation (7) has to equal \beta_{1} for any values of \beta_{0} and \beta_{1}, the weights must satisfy \sum c_{i} = 0 and \sum c_{i}\textbf{X}_{i} = 1. The least squares weights \textbf{k}_{i} from equation (1) fulfill exactly these restrictions, and the Gauss-Markov Theorem states that among all weights c_{i} that do so, the \textbf{k}_{i} yield the smallest variance.
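As a quick numerical check, here is a minimal NumPy sketch (the design points and error variance are made-up, illustrative values). It verifies that the least squares weights \textbf{k}_{i} satisfy \sum \textbf{k}_{i} = 0 and \sum \textbf{k}_{i}\textbf{X}_{i} = 1, and compares their implied variance \sigma^{2}\sum \textbf{k}_{i}^{2} with that of another linear unbiased estimator, the slope through the first and last observation.

```python
import numpy as np

# Fixed design points and an assumed error variance (illustrative values).
x = np.linspace(0, 10, 50)
sigma2 = 1.0

# Least squares weights k_i from equation (1).
k = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

# An alternative linear unbiased estimator: the slope through the first and
# last observation, (Y_n - Y_1) / (X_n - X_1), written as sum(c_i * Y_i).
c = np.zeros_like(x)
c[0] = -1.0 / (x[-1] - x[0])
c[-1] = 1.0 / (x[-1] - x[0])

# Both weight vectors satisfy the unbiasedness restrictions.
for w in (k, c):
    print(np.sum(w), np.sum(w * x))  # approximately 0 and 1

# The variance of a linear estimator sum(c_i * Y_i) is sigma^2 * sum(c_i^2);
# the least squares weights give the smaller value, as the theorem asserts.
print(sigma2 * np.sum(k ** 2))  # variance of the OLS slope
print(sigma2 * np.sum(c ** 2))  # variance of the endpoint slope (larger)
```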

In case you have not understood everything, no worries. You can find the exact proof of the Gauss-Markov Theorem here.
