In the post that derives the least squares estimator, we make use of the following statement:

$$\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}} = 2X'X\hat{\beta}$$

This post shows how one can prove this statement. Let's start from the statement that we want to prove:

$$\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}} = 2X'X\hat{\beta}$$

Note that $X'X$ is symmetric, since $(X'X)' = X'(X')' = X'X$. Hence, to simplify the math, we label $X'X$ as $A$, i.e. $A = X'X$.
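As a quick sanity check of the symmetry claim, here is a minimal sketch in plain Python (standard library only, no NumPy). The design matrix `X` is a made-up example with $n = 4$ observations and $k = 3$ regressors, and the helpers `transpose` and `matmul` are illustrative names, not part of the original post.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


# Hypothetical design matrix X: 4 observations, 3 regressors (first column is a constant).
X = [[1, 2, 0],
     [1, -1, 3],
     [1, 4, 1],
     [1, 0, -2]]

A = matmul(transpose(X), X)       # A = X'X, a 3 x 3 matrix
is_symmetric = A == transpose(A)  # True because (X'X)' = X'X

print(A)             # [[4, 5, 2], [5, 21, 1], [2, 1, 14]]
print(is_symmetric)  # True
```

Any real-valued `X` would do here; the symmetry follows from the algebra, not from the particular numbers.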

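Let's compute the partial derivative of $\hat{\beta}'A\hat{\beta}$ with respect to $\hat{\beta}_1$, where $\hat{\beta}$ is a $k \times 1$ vector and $a_{ij}$ denotes the element in row $i$ and column $j$ of $A$. Writing the quadratic form out as a number, i.e. as a sum of sums,

$$\hat{\beta}'A\hat{\beta} = \sum_{i=1}^{k}\sum_{j=1}^{k} \hat{\beta}_i a_{ij} \hat{\beta}_j,$$

the only terms that contain $\hat{\beta}_1$ are $a_{11}\hat{\beta}_1^2$, the terms $\hat{\beta}_1 a_{1j}\hat{\beta}_j$ and the terms $\hat{\beta}_j a_{j1}\hat{\beta}_1$ for $j \neq 1$. Hence

$$\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_1} = 2a_{11}\hat{\beta}_1 + \sum_{j=2}^{k}\left(a_{1j} + a_{j1}\right)\hat{\beta}_j = 2\left(a_{11}\hat{\beta}_1 + a_{12}\hat{\beta}_2 + \dots + a_{1k}\hat{\beta}_k\right) = 2A_1\hat{\beta},$$

where the second equality uses the symmetry $a_{j1} = a_{1j}$ and $A_1$ denotes the first row of $A$. The same argument gives $\partial \hat{\beta}'A\hat{\beta} / \partial \hat{\beta}_i = 2A_i\hat{\beta}$ for every $i = 1, \dots, k$, where $A_i$ is the $i$-th row of $A$.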
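The single partial derivative with respect to $\hat{\beta}_1$ can be checked numerically. The sketch below assumes a small symmetric $A$ (the $X'X$ of a made-up $X$) and an arbitrary $\hat{\beta}$; the helpers `quad_form` and `partial` are hypothetical. It uses exact `Fraction` arithmetic and a central difference, which for a quadratic function returns the exact partial derivative rather than an approximation, so the comparison needs no tolerance.

```python
from fractions import Fraction


def quad_form(beta, A):
    """beta' A beta, written out as the sum of sums over i and j."""
    k = len(beta)
    return sum(beta[i] * A[i][j] * beta[j] for i in range(k) for j in range(k))


def partial(f, beta, idx, h=Fraction(1)):
    """Central difference in coordinate idx; exact for a quadratic f with Fractions."""
    up = list(beta)
    up[idx] += h
    down = list(beta)
    down[idx] -= h
    return (f(up) - f(down)) / (2 * h)


# Hypothetical symmetric A (X'X of a small made-up X) and an arbitrary beta.
A = [[4, 5, 2], [5, 21, 1], [2, 1, 14]]
beta = [Fraction(1), Fraction(2), Fraction(-1)]

lhs = partial(lambda b: quad_form(b, A), beta, 0)            # d(beta'A beta)/d beta_1
rhs = 2 * sum(A[0][j] * beta[j] for j in range(len(beta)))  # 2 * A_1 * beta

print(lhs, rhs)  # 24 24
```

The right-hand side uses only the first row of `A`, matching $2A_1\hat{\beta}$.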

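Instead of stating every single equation, one can state the same result in more compact matrix notation by stacking the $k$ partial derivatives into a vector, where $A_i$ is the $i$-th row of $A$:

$$\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}} = \begin{pmatrix} \frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_1} \\ \vdots \\ \frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_k} \end{pmatrix} = \begin{pmatrix} 2A_1\hat{\beta} \\ \vdots \\ 2A_k\hat{\beta} \end{pmatrix} = 2A\hat{\beta}$$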

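Plugging in $X'X$ for $A$ gives the statement we set out to prove:

$$\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}} = 2X'X\hat{\beta}$$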
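The full result can also be checked numerically. This is a minimal sketch assuming a made-up design matrix `X` and an arbitrary $\hat{\beta}$; the helper names are illustrative. It compares a numerical gradient of $\hat{\beta}'X'X\hat{\beta}$ against the formula $2X'X\hat{\beta}$, again with exact `Fraction` arithmetic and central differences.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def objective(beta, X):
    """beta' X'X beta, computed as the squared length of X beta."""
    xb = matvec(X, beta)
    return sum(v * v for v in xb)


def numerical_gradient(f, beta, h=Fraction(1)):
    """Vector of central differences; exact for a quadratic f with Fractions."""
    grad = []
    for i in range(len(beta)):
        up = list(beta)
        up[i] += h
        down = list(beta)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


# Hypothetical design matrix X and an arbitrary beta.
X = [[1, 2, 0],
     [1, -1, 3],
     [1, 4, 1],
     [1, 0, -2]]
beta = [Fraction(1), Fraction(2), Fraction(-1)]

Xt = transpose(X)
XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in Xt] for ci in Xt]  # X'X

grad_numeric = numerical_gradient(lambda b: objective(b, X), beta)
grad_formula = [2 * v for v in matvec(XtX, beta)]  # 2 X'X beta

print(grad_numeric == grad_formula)  # True
```

Computing the objective as $(X\hat{\beta})'(X\hat{\beta})$ keeps the numerical side independent of the formula being checked.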

Now let’s return to the derivation of the least squares estimator.


I think there is a tiny error in the second-to-last line. The right-hand side should be 2A_1 times B, as we use only the first row of the matrix A. This is then generalized to a vector of partial derivatives on the left and the matrix A times B on the right. Nevertheless, the proof was very helpful, thank you for posting it!

Hi, thanks for the proof, I appreciate it. I just want to point out a typo. When writing $\hat{\beta}' A \hat{\beta}$ as a number (i.e. a sum of sums), two errors occur in the last row:

1) The index of the very first beta in the row should be k, not 1.

2) The plus sign at the end of the row is redundant.

Thanks Adam, you are right! I corrected the mistakes. Thanks a lot! Cheers, ad.

In the line after "Let's compute the partial derivative", at the end you have B_2 a_12 + … + B_2 a_1k, but I think it should be B_2 a_12 + … + B_k a_1k.

Thanks!