# The Gauss Markov Theorem

When studying the classical linear regression model, one necessarily comes across the Gauss-Markov Theorem. The Gauss-Markov Theorem is a central result for linear regression models. It states conditions that, when met, ensure that the least squares estimator has the lowest variance among all unbiased linear estimators. More formally, the Gauss-Markov Theorem tells us the following: consider a regression model in which the expected value of the error terms is zero, i.e. $E(\epsilon_{i}) = 0$, the variance of the error terms is constant and finite, i.e. $\sigma^{2}(\epsilon_{i}) = \sigma^{2} < \infty$, and $\epsilon_{i}$ and $\epsilon_{j}$ are uncorrelated for all $i \neq j$. Then the least squares estimators $b_{0}$ and $b_{1}$ are unbiased and have minimum variance among all unbiased linear estimators. However, note that there might exist biased estimators that have a lower variance.

The remainder of this post summarizes the Gauss-Markov Theorem in a short and hopefully intuitive way. However, if you are interested in a formal proof of the Gauss-Markov Theorem, you should check out this post here.

Suppose we have the following regression model

$Y_{i} = \beta_{0} + \beta_{1} X_{i} + \epsilon_{i}$

Let’s start by briefly recalling what the point estimators for $\beta_{0}$ and $\beta_{1}$ look like and how we can obtain the variance of the two coefficients. The point estimates, that is, the coefficients $b_{0}$ and $b_{1}$, can be obtained the following way:

(1) $b_{1}=\frac{\sum(\textbf{X}_{i}-\bar{\textbf{X}})(\textbf{Y}_{i}-\bar{\textbf{Y}})}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}} = \sum \textbf{k}_{i}\textbf{Y}_{i}, \textbf{k}_{i}=\frac{(\textbf{X}_{i}-\bar{\textbf{X}})}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}}$

(2) $b_{0}=\bar{Y}-b_{1}\bar{\textbf{X}}$

If you are not familiar with the linear regression model, you should first check out the following post. The exact derivation of the least squares estimator in matrix notation can be found here.
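Equations (1) and (2) can be checked numerically. The following sketch (with made-up data, purely for illustration) computes the slope $b_{1}$ as a weighted sum of the $Y_{i}$ with weights $k_{i}$, and compares the result against NumPy's built-in least squares fit:

```python
# Numerical sketch of equations (1) and (2): the slope b1 as a
# weighted sum of the Y_i with weights k_i, checked against
# NumPy's least squares fit. The data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)  # weights k_i
b1 = np.sum(k * Y)                                 # equation (1)
b0 = Y.mean() - b1 * X.mean()                      # equation (2)

# Compare with NumPy's least squares solution (polyfit of degree 1)
b1_np, b0_np = np.polyfit(X, Y, deg=1)
print(b0, b1)  # should match (b0_np, b1_np)
```

The weighted-sum form $b_{1} = \sum k_{i} Y_{i}$ is exactly what makes $b_{1}$ a *linear* estimator, which is the class of estimators the theorem is concerned with.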

The variance of $b_{1}$ can be obtained the following way:

(3) $\sigma^{2}(b_{1})=\sigma^{2}(\sum \textbf{k}_{i}\textbf{Y}_{i})=\sum \textbf{k}_{i}^{2} \sigma^{2}(\textbf{Y}_{i})=\frac{\sigma^{2}}{\sum(\textbf{X}_{i}-\bar{\textbf{X}})^{2}}$
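Equation (3) can also be verified by simulation: over many repeated samples with fixed $X$, the empirical variance of $b_{1}$ should approach $\sigma^{2} / \sum(X_{i}-\bar{X})^{2}$. A minimal sketch, with simulated data and an arbitrarily chosen $\sigma$:

```python
# Monte Carlo check of equation (3): the empirical variance of b1 over
# repeated samples approaches sigma^2 / sum((X_i - X_bar)^2).
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30)  # fixed design
sigma = 2.0
k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)  # weights k_i

b1_draws = []
for _ in range(20000):
    Y = 1.0 + 0.5 * X + rng.normal(0, sigma, size=X.size)
    b1_draws.append(np.sum(k * Y))  # equation (1) on each sample

var_empirical = np.var(b1_draws)
var_formula = sigma ** 2 / np.sum((X - X.mean()) ** 2)
print(var_empirical, var_formula)  # the two should be close
```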

Now, the Gauss-Markov Theorem tells us that if its conditions are met, then $b_{1}$ has the lowest variance among all unbiased linear estimators. That is, it has the lowest variance of all unbiased estimators of the following form

(4) $\hat{\beta}_{1} = \sum c_{i}Y_{i}$

Furthermore, remember that the Gauss-Markov Theorem only compares unbiased estimators, so any candidate estimator $\hat{\beta}_{1}$ must itself be unbiased, i.e. $E(\hat{\beta}_{1}) = \beta_{1}$. Let’s have a closer look:

(5) $E(\hat{\beta}_{1}) = \sum c_{i}E(Y_{i})$

(6) $E(\hat{\beta}_{1}) = \sum c_{i}E(\beta_{0}+\beta_{1}\textbf{X}_{i})$

(7) $E(\hat{\beta}_{1}) = \beta_{0} \sum c_{i} + \beta_{1} \sum c_{i} \textbf{X}_{i} = \beta_{1}$

(8) $E(\hat{\beta}_{1}) =\beta_{1}$

Note that unbiasedness, i.e. $E(\hat{\beta}_{1}) = \beta_{1}$, requires that $\beta_{0} \sum c_{i} + \beta_{1} \sum c_{i} \textbf{X}_{i} = \beta_{1}$ holds for any values of $\beta_{0}$ and $\beta_{1}$. This is only possible if $\sum c_{i} = 0$ and $\sum c_{i} \textbf{X}_{i} = 1$. So ultimately, the assumption of unbiasedness imposes these two restrictions on the $c_{i}$.
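These restrictions can be illustrated numerically. In the sketch below (data and perturbation chosen arbitrarily for illustration), we build alternative weights $c_{i} = k_{i} + d_{i}$, where $d$ is orthogonal to both the constant and $X$, so the unbiasedness restrictions still hold, yet the variance term $\sum c_{i}^{2}$ can never fall below $\sum k_{i}^{2}$:

```python
# Any weights c_i with sum(c_i) = 0 and sum(c_i * X_i) = 1 give an
# unbiased linear estimator, but deviating from the least squares
# weights k_i only increases the variance sigma^2 * sum(c_i^2).
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=40)
k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)  # least squares weights

# Build a perturbation d orthogonal to both the constant and X,
# so c = k + d still satisfies the unbiasedness restrictions.
v = rng.normal(size=X.size)                        # arbitrary vector
A = np.column_stack([np.ones_like(X), X])
d = v - A @ np.linalg.lstsq(A, v, rcond=None)[0]   # residual of v on (1, X)
c = k + d

print(np.sum(c), np.sum(c * X))          # ~0 and ~1: restrictions hold
print(np.sum(c ** 2) >= np.sum(k ** 2))  # variance is never smaller
```

Since $k$ lies in the span of the constant and $X$ while $d$ is orthogonal to it, $\sum c_{i}^{2} = \sum k_{i}^{2} + \sum d_{i}^{2}$, which is minimized exactly when $d = 0$, i.e. at the least squares weights.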

In case you have not understood everything, no worries. You can find the exact proof of the Gauss-Markov Theorem here.