Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1

In the post that derives the least squares estimator, we make use of the following statement:

\frac{\partial b'X'Xb}{\partial b} =2X'Xb

This post shows how one can prove this statement. Let’s start from the statement that we want to prove:

\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}}=2 X'X \hat{\beta}

Note that X'X is symmetric. To simplify the notation, we label X'X as A, i.e. A := X'X, where a_{ij} denotes the element in row i and column j of A, and a_{ij}=a_{ji} by symmetry.
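
As a brief aside, the symmetry of X'X follows from the transpose rule (PQ)'=Q'P', where P and Q are placeholder names for any two conformable matrices:

(X'X)'=X'(X')'=X'X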

\hat{\beta}'A\hat{\beta}= \begin{bmatrix} \hat{\beta}_{1} & \hat{\beta}_{2} & \hdots & \hat{\beta}_{k}\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \hdots & a_{1k}\\ a_{21} & a_{22} & \hdots & a_{2k}\\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \hdots & a_{kk} \end{bmatrix} \begin{bmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2} \\ \vdots \\ \hat{\beta}_{k} \end{bmatrix}

\hat{\beta}'A\hat{\beta}= \begin{bmatrix} \sum\limits_{i=1}^k \hat{\beta}_{i}a_{i1} & \sum\limits_{i=1}^k \hat{\beta}_{i}a_{i2} & \hdots & \sum\limits_{i=1}^k \hat{\beta}_{i}a_{ik}\end{bmatrix} \begin{bmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2} \\ \vdots \\ \hat{\beta}_{k} \end{bmatrix}

\hat{\beta}'A\hat{\beta}= \begin{matrix} \hat{\beta}_{1}^{2}a_{11}+\hat{\beta}_{2}\hat{\beta}_{1}a_{21}+\hdots+\hat{\beta}_{k}\hat{\beta}_{1}a_{k1}+\\ \hat{\beta}_{1}\hat{\beta}_{2}a_{12}+\hat{\beta}_{2}^{2}a_{22}+\hdots+\hat{\beta}_{k}\hat{\beta}_{2}a_{k2}+\\ \vdots \\ \hat{\beta}_{1}\hat{\beta}_{k}a_{1k}+\hat{\beta}_{2}\hat{\beta}_{k}a_{2k}+\hdots+\hat{\beta}_{k}^{2}a_{kk} \end{matrix}
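
As a quick sanity check, here is the special case k=2 (an illustration only, not a step of the general proof). The sum then has just four terms:

\hat{\beta}'A\hat{\beta}=a_{11}\hat{\beta}_{1}^{2}+(a_{12}+a_{21})\hat{\beta}_{1}\hat{\beta}_{2}+a_{22}\hat{\beta}_{2}^{2}=a_{11}\hat{\beta}_{1}^{2}+2a_{12}\hat{\beta}_{1}\hat{\beta}_{2}+a_{22}\hat{\beta}_{2}^{2}

The last step uses a_{12}=a_{21}.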

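Let’s compute the partial derivative of \hat{\beta}'A\hat{\beta} with respect to each element of \hat{\beta}, starting with \hat{\beta}_{1}. Note that \hat{\beta}_{1} appears in every term of the first row and in the first term of each remaining row.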

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{1}}=2\hat{\beta}_{1}a_{11}+\hat{\beta}_{2}a_{21}+\hdots+\hat{\beta}_{k}a_{k1}+\hat{\beta}_{2}a_{12}+\hdots+\hat{\beta}_{k}a_{1k}

Because A is symmetric, a_{i1}=a_{1i} for every i, so the two groups of terms can be combined:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{1}}= 2(\hat{\beta}_{1}a_{11}+\hat{\beta}_{2}a_{12}+\hdots+\hat{\beta}_{k}a_{1k})

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{2}}= 2(\hat{\beta}_{1}a_{21}+\hat{\beta}_{2}a_{22}+\hdots+\hat{\beta}_{k}a_{2k})

\vdots

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{k}}= 2(\hat{\beta}_{1}a_{k1}+\hat{\beta}_{2}a_{k2}+\hdots+\hat{\beta}_{k}a_{kk})
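
The same pattern holds for any index j. Written with sums (a compact restatement of the equations above, again using a_{ij}=a_{ji}):

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{j}}=\sum\limits_{i=1}^k a_{ij}\hat{\beta}_{i}+\sum\limits_{i=1}^k a_{ji}\hat{\beta}_{i}=2\sum\limits_{i=1}^k a_{ji}\hat{\beta}_{i}

The right-hand side is twice the j-th row of A multiplied by \hat{\beta}, i.e. twice the j-th element of the vector A\hat{\beta}.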

Instead of stating every single equation, one can state the same using the more compact matrix notation:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}}=2A\hat{\beta}

Plugging X'X back in for A gives:

\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}}=2X'X\hat{\beta}
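
For readers who like to check such results numerically, the identity can be tested with finite differences. The following is a minimal sketch in plain Python, not part of the original proof. The matrix X, the vector b, the step size h and all helper function names are made up for illustration.

```python
# Numerical check of d(b'X'Xb)/db = 2X'Xb using central finite differences.
# X and b are arbitrary illustrative values, not data from the post.

def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

def matmul(P, Q):
    """Return the matrix product PQ."""
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def quad_form(A, b):
    """Return the scalar b'Ab."""
    k = len(b)
    return sum(b[i] * A[i][j] * b[j] for i in range(k) for j in range(k))

def analytic_gradient(A, b):
    """Return 2Ab, the gradient derived in the post (valid for symmetric A)."""
    k = len(b)
    return [2 * sum(A[i][j] * b[j] for j in range(k)) for i in range(k)]

def numeric_gradient(A, b, h=1e-6):
    """Approximate each partial derivative of b'Ab by a central difference."""
    grad = []
    for i in range(len(b)):
        up = list(b)
        down = list(b)
        up[i] += h
        down[i] -= h
        grad.append((quad_form(A, up) - quad_form(A, down)) / (2 * h))
    return grad

X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]  # hypothetical 3x2 design matrix
b = [0.5, -1.5]                           # hypothetical coefficient vector
A = matmul(transpose(X), X)               # A := X'X, symmetric by construction

print(analytic_gradient(A, b))
print(numeric_gradient(A, b))
```

With these values, X'X has rows (3, 10) and (10, 38), and the analytic gradient is [-27.0, -104.0]. The finite-difference result should agree with it up to floating-point rounding.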

Now let’s return to the derivation of the least squares estimator.

4 thoughts on “Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1”

  1. I think there is a tiny error in the pre-last line. The right hand side should be 2A_1 times B as we use only the first row of the matrix A. Then it is generalized to a vector of partial derivatives on the left and matrix A times B on the right. Nevertheless the proof was very helpful, thank you for posting it!

  2. Hi, thanks for the proof, I appreciate it. I just want to point out a typo. When writing \hat{\beta}^\prime A \hat{\beta} as a number (i.e. sum of sums), two errors occur in the last row:

    1) The index of very first beta in the row should be k, not 1.
    2) The plus sign at the end of the row is redundant
