Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1

In the post that derives the least squares estimator, we make use of the following statement:

\frac{\partial b'X'Xb}{\partial b} =2X'Xb

This post shows how to prove it. Written in terms of \hat{\beta} instead of b, the statement we want to prove is:

\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}}=2 X'X \hat{\beta}

Note that X'X is symmetric, a fact we will need further below. To simplify the notation, we label X'X as A, i.e. A := X'X, and write out the quadratic form element by element:

\hat{\beta}'A\hat{\beta}= \begin{bmatrix} \hat{\beta}_{1} & \hat{\beta}_{2} & \hdots & \hat{\beta}_{k}\end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \hdots & a_{1k}\\ a_{21} & a_{22} & \hdots & a_{2k}\\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \hdots & a_{kk} \end{bmatrix} \begin{bmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2} \\ \vdots \\ \hat{\beta}_{k} \end{bmatrix}

\hat{\beta}'A\hat{\beta}= \begin{bmatrix} \sum\limits_{i=1}^k \hat{\beta}_{i}a_{i1} & \sum\limits_{i=1}^k \hat{\beta}_{i}a_{i2} & \hdots & \sum\limits_{i=1}^k \hat{\beta}_{i}a_{ik}\end{bmatrix} \begin{bmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2} \\ \vdots \\ \hat{\beta}_{k} \end{bmatrix}

\hat{\beta}'A\hat{\beta}= \begin{matrix} \hat{\beta}_{1}^{2}a_{11}+\hat{\beta}_{2}\hat{\beta}_{1}a_{21}+\hdots+\hat{\beta}_{k}\hat{\beta}_{1}a_{k1}+\\ \hat{\beta}_{1}\hat{\beta}_{2}a_{12}+\hat{\beta}_{2}^{2}a_{22}+\hdots+\hat{\beta}_{k}\hat{\beta}_{2}a_{k2}+\\ \vdots \\ \hat{\beta}_{1}\hat{\beta}_{k}a_{1k}+\hat{\beta}_{2}\hat{\beta}_{k}a_{2k}+\hdots+\hat{\beta}_{k}^{2}a_{kk} \end{matrix}
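To make the expansion concrete, here is a small numerical sketch in Python with NumPy. It adds up \hat{\beta}_{i}a_{ij}\hat{\beta}_{j} over all i and j, as in the sum above, and compares the result with the matrix product. The function name quadratic_form_double_sum, the random X and b, and the seed are assumptions made purely for illustration; they are not part of the original derivation.

```python
import numpy as np

def quadratic_form_double_sum(A, b):
    """Add up b[i] * A[i, j] * b[j] over all i and j (the expanded scalar form)."""
    k = len(b)
    total = 0.0
    for j in range(k):        # one row of the expanded sum per j
        for i in range(k):    # one term of that row per i
            total += b[i] * A[i, j] * b[j]
    return total

rng = np.random.default_rng(0)   # arbitrary seed, illustration only
X = rng.normal(size=(20, 3))     # hypothetical data matrix with k = 3 columns
A = X.T @ X                      # A := X'X
b = rng.normal(size=3)           # hypothetical coefficient vector

# The explicit double sum and the matrix expression b'Ab should agree.
print(np.isclose(quadratic_form_double_sum(A, b), b @ A @ b))
```

The double loop is deliberately naive so that it mirrors the written-out sum term by term; in practice one would simply compute b @ A @ b.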

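Let’s compute the partial derivative of \hat{\beta}'A\hat{\beta} with respect to each element of \hat{\beta}, starting with \hat{\beta}_{1}. The term \hat{\beta}_{1}^{2}a_{11} contributes 2\hat{\beta}_{1}a_{11}, and every other term that contains \hat{\beta}_{1} is linear in it:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{1}}=2\hat{\beta}_{1}a_{11}+\hat{\beta}_{2}a_{21}+\hdots+\hat{\beta}_{k}a_{k1}+\hat{\beta}_{2}a_{12}+\hdots+\hat{\beta}_{k}a_{1k}

Because A is symmetric, a_{i1}=a_{1i} for every i, so the two groups of cross terms are equal and can be combined:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{1}}= 2(\hat{\beta}_{1}a_{11}+\hat{\beta}_{2}a_{12}+\hdots+\hat{\beta}_{k}a_{1k})

The same steps apply to the remaining elements of \hat{\beta}: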

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{2}}= 2(\hat{\beta}_{1}a_{21}+\hat{\beta}_{2}a_{22}+\hdots+\hat{\beta}_{k}a_{2k})

\vdots

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}_{k}}= 2(\hat{\beta}_{1}a_{k1}+\hat{\beta}_{2}a_{k2}+\hdots+\hat{\beta}_{k}a_{kk})
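The role of symmetry can be checked numerically. The following sketch (Python with NumPy; gradient_elementwise and the example matrices are hypothetical names and values chosen for illustration) builds each partial derivative as \sum_{j}(a_{ij}+a_{ji})\hat{\beta}_{j}, which is what differentiating the double sum gives before symmetry is used. For a symmetric A this collapses to twice the i-th row of A times \hat{\beta}; for a non-symmetric matrix it generally does not.

```python
import numpy as np

def gradient_elementwise(A, b):
    """i-th partial derivative of b'Ab: sum over j of (A[i, j] + A[j, i]) * b[j]."""
    k = len(b)
    grad = np.zeros(k)
    for i in range(k):
        for j in range(k):
            grad[i] += (A[i, j] + A[j, i]) * b[j]
    return grad

rng = np.random.default_rng(0)        # arbitrary seed, illustration only
X = rng.normal(size=(20, 3))          # hypothetical data matrix
A = X.T @ X                           # symmetric by construction
b = rng.normal(size=3)

# Symmetric case: the element-wise formula matches 2Ab.
print(np.allclose(gradient_elementwise(A, b), 2 * A @ b))

# Non-symmetric case (hypothetical matrix M): the gradient is (M + M')v, not 2Mv.
M = np.array([[1.0, 2.0],
              [0.0, 1.0]])
v = np.array([1.0, 1.0])
print(gradient_elementwise(M, v), 2 * M @ v)
```

This is why the symmetry of X'X matters for the factor of 2 in the result.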

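Instead of stating every single equation, one can stack the k partial derivatives into a column vector. The i-th entry of that vector is twice the i-th row of A times \hat{\beta}, so the whole system can be written in the more compact matrix notation:

\frac{\partial \hat{\beta}'A\hat{\beta}}{\partial \hat{\beta}}=2A\hat{\beta}

Plugging in X'X for A gives the statement we set out to prove:

\frac{\partial \hat{\beta}'X'X\hat{\beta}}{\partial \hat{\beta}}=2X'X\hat{\beta}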
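As a sanity check on the final result, one can compare 2X'X\hat{\beta} with a numerical gradient. The sketch below uses central finite differences in Python with NumPy; the helper numerical_gradient, the step size h, and the random X and b are all assumptions made for illustration and do not come from the original post.

```python
import numpy as np

def numerical_gradient(f, b, h=1e-6):
    """Approximate the gradient of a scalar function f at b by central differences."""
    grad = np.zeros(len(b))
    for i in range(len(b)):
        step = np.zeros(len(b))
        step[i] = h                      # perturb only the i-th element
        grad[i] = (f(b + step) - f(b - step)) / (2 * h)
    return grad

rng = np.random.default_rng(1)           # arbitrary seed, illustration only
X = rng.normal(size=(50, 4))             # hypothetical data matrix
b = rng.normal(size=4)                   # hypothetical point at which to differentiate

def quadratic_form(v):
    """The scalar v'X'Xv."""
    return v @ X.T @ X @ v

analytic = 2 * X.T @ X @ b               # the formula derived above
numeric = numerical_gradient(quadratic_form, b)

# The two gradients should agree up to floating-point error.
print(np.allclose(analytic, numeric, rtol=1e-5, atol=1e-6))
```

A numerical check like this does not replace the proof, but it is a quick way to catch sign or transpose mistakes when working with matrix derivatives.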

Now let’s return to the derivation of the least squares estimator.

6 thoughts on “Derivation of the Least Squares Estimator for Beta in Matrix Notation – Proof Nr. 1”

  1. I think there is a tiny error in the pre-last line. The right hand side should be 2A_1 times B as we use only the first row of the matrix A. Then it is generalized to a vector of partial derivatives on the left and matrix A times B on the right. Nevertheless the proof was very helpful, thank you for posting it!

  2. Hi, thanks for the proof, I appreciate it. I just want to point out a typo. When writing \hat{\beta}^\prime A \hat{\beta} as a number (i.e. a sum of sums), two errors occur in the last row:

    1) The index of very first beta in the row should be k, not 1.
    2) The plus sign at the end of the row is redundant

  3. In the line after “Let’s compute the partial derivative” at the end you have B_2 a_12 + …. + B_2 a_1k, but I think it should be B_2 a_12 + …. + B_k a_1k.
