Proof of Unbiasedness of Sample Variance Estimator


(As I received some remarks about the unnecessary length of this proof, I provide a shorter version here.)

In many applications of statistics and econometrics, and in plenty of other settings, it is necessary to estimate the variance from a sample. The estimator of the variance, see equation (1), is common knowledge and most people simply apply it without any further concern. The question that arose for me was: why do we actually divide by n-1 and not simply by n? In the following lines we are going to see the proof that the sample variance estimator is indeed unbiased.
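Before the proof, a quick numerical illustration of equation (1). This is a hypothetical sketch: the helper `sample_variance` and the example data are my own choices, not part of the proof. Setting `ddof=1` divides by n-1 as in (1); `ddof=0` divides by n, which (as the proof below shows) systematically underestimates the population variance.

```python
# Hypothetical helper illustrating equation (1); data chosen arbitrarily.
def sample_variance(xs, ddof=1):
    """Return sum((x - mean)^2) / (len(xs) - ddof)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # sample mean is 5.0
print(sample_variance(xs))           # divide by n-1, as in (1): 32/7
print(sample_variance(xs, ddof=0))   # divide by n instead: 32/8 = 4.0
```

The two results differ by the factor (n-1)/n, which is exactly the bias the rest of this post is about.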

s^2 = variance of the sample

x_i = observations of the random variable X, with i from 1 to n

\bar x = sample average

\mu = mean of the population

\delta^2 = variance of the population

(1) s^2=\frac{1}{n-1}\sum\limits_{i=1}^n(x_i-\bar x)^2

First step of the proof

(2) x_i-\bar x = x_i - \mu + \mu - \bar x

(3) x_i-\bar x =( x_i - \mu) + (\mu - \bar x)

(4) (x_i-\bar x)^2 = [(x_i - \mu) + (\mu - \bar x)]^2

(5) (x_i-\bar x)^2 = (x_i - \mu)(x_i - \mu)+(x_i - \mu)(\mu - \bar x)+(\mu - \bar x)(x_i - \mu)+(\mu - \bar x)(\mu - \bar x)

(6) (x_i-\bar x)^2 = (x_i - \mu)^2+2*(x_i - \mu)(\mu - \bar x)+(\mu - \bar x)^2

(7) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+2*\sum\limits_{i=1}^n(x_i - \mu)(\mu - \bar x)+\sum\limits_{i=1}^n(\mu - \bar x)^2

(8) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+2(\mu - \bar x)\sum\limits_{i=1}^n(x_i - \mu)+n(\mu - \bar x)^2

(9) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2+2(\mu - \bar x)\sum\limits_{i=1}^n(x_i - \mu)

(10) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2+2(\mu - \bar x)(\sum\limits_{i=1}^n x_i - \sum\limits_{i=1}^n \mu)

(11) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2+2(\mu - \bar x)(\sum\limits_{i=1}^n x_i - n\mu)

(12) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2+2(\mu - \bar x)(n\bar x - n\mu) as n\bar x=\sum\limits_{i=1}^n x_i which derives from \bar x=\frac{\sum\limits_{i=1}^n x_i}{n}

(13) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2+2n(\mu - \bar x)(\bar x - \mu)

(14) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2+2n(-1)(-\mu + \bar x)(\bar x - \mu)

(15) \sum\limits_{i=1}^n (x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2-2n(\bar x-\mu )(\bar x - \mu)

(16) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\mu - \bar x)^2-2n(\bar x-\mu )^2

(17) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2+n(\bar x-\mu)^2-2n(\bar x-\mu )^2

which we may do since (\mu - \bar x)^2=(\bar x-\mu)^2, so nothing changes in the result

(18) \sum\limits_{i=1}^n(x_i-\bar x)^2 = \sum\limits_{i=1}^n(x_i - \mu)^2-n(\bar x-\mu )^2
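The identity (18) holds for any value of \mu, and it is easy to check numerically. The following sketch uses arbitrary data and an arbitrary value \mu = 3.0 purely for illustration:

```python
# Numerical check of identity (18):
# sum (x_i - xbar)^2  ==  sum (x_i - mu)^2 - n * (xbar - mu)^2
xs = [1.0, 2.0, 6.0, 7.0]  # arbitrary sample, xbar = 4.0
n = len(xs)
xbar = sum(xs) / n
mu = 3.0                   # arbitrary mu; the identity holds for any mu

lhs = sum((x - xbar) ** 2 for x in xs)
rhs = sum((x - mu) ** 2 for x in xs) - n * (xbar - mu) ** 2
print(lhs, rhs)            # both equal 26.0
```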

Second step of the proof

(19)E(X)=\sum\limits_{i=1}^n x_i f(x_i) if X is i.u.d. (identically, uniformly distributed) and if f(x_i)=\frac{1}{n} then

(21)E(X)=\frac{1}{n}\sum\limits_{i=1}^n x_i

(22)E(S^2)=E(\frac{1}{n-1}[\sum\limits_{i=1}^n(x_i - \mu)^2-n(\bar x-\mu )^2])

(23)E(S^2)=\frac{1}{n-1}[E(\sum\limits_{i=1}^n (x_i - \mu)^2)-n E(\bar x-\mu )^2]

Third step of the proof

(24)E(X+Y)=\sum\limits_{i=1}^n\sum\limits_{j=1}^n (x_i+y_j)f(x_iy_j)

(25)E(X+Y)=\sum\limits_{i=1}^n\sum\limits_{j=1}^n x_if(x_iy_j)+\sum\limits_{i=1}^n\sum\limits_{j=1}^n y_jf(x_iy_j)

(26)E(X+Y)=\sum\limits_{i=1}^n x_if(x_i)\sum\limits_{j=1}^n f(y_j)+\sum\limits_{j=1}^n y_jf(y_j)\sum\limits_{i=1}^n f(x_i) using the independence of X and Y, i.e. f(x_iy_j)=f(x_i)f(y_j)

(27)E(X+Y)=\sum\limits_{i=1}^n x_if(x_i)+\sum\limits_{j=1}^n y_jf(y_j) as \sum\limits_{j=1}^n f(y_j)=1 and \sum\limits_{i=1}^n f(x_i)=1

(28)E(X+Y)=E(X)+E(Y)
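The chain (24)-(28) can be checked on a small example. In the sketch below, X and Y are independent discrete uniform variables (the values are arbitrary choices of mine), so each pair (x_i, y_j) in the joint grid has probability f(x_i)f(y_j):

```python
# Check of (28): E(X + Y) over the joint grid equals E(X) + E(Y).
xs = [1.0, 3.0, 5.0]   # arbitrary values of X, each with probability 1/3
ys = [2.0, 4.0]        # arbitrary values of Y, each with probability 1/2

fx, fy = 1 / len(xs), 1 / len(ys)
lhs = sum((x + y) * fx * fy for x in xs for y in ys)   # E(X + Y), as in (24)
rhs = sum(x * fx for x in xs) + sum(y * fy for y in ys)  # E(X) + E(Y)
print(lhs, rhs)  # both equal 6.0
```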

Applying this on our original function:

(29)E(g(x_i))=\sum\limits_{i=1}^n g(x_i)f(x_i)

(30)E(g(x_i))=\sum\limits_{i=1}^n (x_i - \mu)^2f(x_i)

(31)E(g(x_i))=\sum\limits_{i=1}^n (x_i - \mu)^2\frac{1}{n} as f(x_i)=\frac{1}{n}

(32)E(g(x_i))=\frac{1}{n}\sum\limits_{i=1}^n (x_i - \mu)^2=Var(x_i)

(33)E(\sum\limits_{i=1}^n(x_i - \mu)^2)=\sum\limits_{i=1}^n Var(x_i)

(34)E(\sum\limits_{i=1}^n(x_i - \mu)^2)=nVar(x_i)

Plugging (34) into (23) gives us:

(35)E(S^2)=\frac{1}{n-1}[nVar(x_i)-nE(\bar x-\mu )^2]

Notice also that:

(36)Var(\bar x)=Var(\frac{\sum\limits_{i=1}^n x_i}{n})=Var(\frac{1}{n}\sum\limits_{i=1}^n x_i)

and that:

(37)Var(a+bx_i)

(38)\mu_y=a+b\mu_x

(39)y_i=a+bx_i

using (38) and (39) in (37) we get:

(40)Var(a+bx_i)=\frac{1}{n}\sum\limits_{i=1}^n (y_i-\mu_y)^2

(41)Var(a+bx_i)=\frac{1}{n}\sum\limits_{i=1}^n (a+bx_i-(a+b\mu_x))^2

(42)Var(a+bx_i)=\frac{1}{n}\sum\limits_{i=1}^n(a+bx_i-a-b\mu_x)^2

(43)Var(a+bx_i)=\frac{1}{n}\sum\limits_{i=1}^n(bx_i-b\mu_x)^2

(44)Var(a+bx_i)=\frac{1}{n}\sum\limits_{i=1}^n(b(x_i-\mu_x))^2

(45)Var(a+bx_i)=\frac{1}{n}\sum\limits_{i=1}^n(b^2(x_i-\mu_x)^2)

(46)Var(a+bx_i)=b^2\frac{1}{n}\sum\limits_{i=1}^n(x_i-\mu_x)^2

(47)Var(a+bx_i)=b^2Var(x_i)
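The rule (47) is easy to verify numerically. The sketch below uses the population-style variance (division by n, as in (40)) and assumed constants a = 2, b = 3 chosen purely for illustration:

```python
# Check of (47): Var(a + b*x) == b^2 * Var(x).
def pop_var(xs):
    """Population-style variance, dividing by n as in (40)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

xs = [1.0, 2.0, 3.0, 4.0]    # arbitrary data, pop_var(xs) = 1.25
a, b = 2.0, 3.0              # assumed constants for the illustration
ys = [a + b * x for x in xs]
print(pop_var(ys), b ** 2 * pop_var(xs))  # both equal 11.25
```

Shifting by a drops out, and scaling by b comes out squared, exactly as the algebra in (41)-(47) shows.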

knowing (40)-(47) let us return to (36) and we see that:

(48)Var(\bar x)=Var(\frac{\sum\limits_{i=1}^n x_i}{n})=Var(\frac{1}{n}\sum\limits_{i=1}^n x_i)

(50)Var(\bar x)=\frac{1}{n^2}Var(\sum\limits_{i=1}^n x_i)

(51)Var(\bar x)=\frac{1}{n^2}\sum\limits_{i=1}^n Var(x_i)

(52)Var(\bar x)=\frac{1}{n^2}\sum\limits_{i=1}^n(\frac{1}{n}\sum\limits_{j=1}^n(x_j-\mu_x)^2)

just looking at the last part of (51) where we have Var(x_i)=\frac{1}{n}\sum\limits_{j=1}^n (x_j-\mu_x)^2 we can apply simple computation rules of variance calculation:

(53)Var(x_i)=\frac{1}{n}\sum\limits_{j=1}^n(x_j-\mu_x)^2

(54)Var(X+Y)=\frac{1}{n}\sum\limits_{i=1}^n((X+Y)-(\mu_x+\mu_y))^2

now the x_i on the lhs of (53) corresponds to the (X+Y) on the rhs of (54), and the \mu_x on the rhs of (53) corresponds to \mu_x+\mu_y on the rhs of (54). What exactly do we mean by that? Well,

(55)Var(X+Y)=E[(X+Y)-(\mu_x+\mu_y)]^2

(56)Var(X+Y)=E[(X-\mu_x)+(Y-\mu_y)]^2

(57)Var(X+Y)=E[(X-\mu_x)^2+(Y-\mu_y)^2+2(X-\mu_x)(Y-\mu_y)]

(58)Var(X+Y)=Var(X)+Var(Y)+2E[(X-\mu_x)(Y-\mu_y)]

the term E[(X-\mu_x)(Y-\mu_y)] is the covariance of X and Y, and it is zero because X is independent of Y. This leaves us with the variance of X and the variance of Y. From (52) we know that

(59)Var(\bar x)=\frac{1}{n^2}\sum\limits_{i=1}^n (\frac{1}{n}\sum\limits_{j=1}^n (x_j-\mu_x)^2)

and simplifying it brings us to the following:

(60)Var(\bar x)=\frac{1}{n^2}\sum\limits_{i=1}^n Var(X)

(61)Var(\bar x)=\frac{1}{n^2}n Var(X)

(62)Var(\bar x)=\frac{1}{n}Var(X)

(63)Var(\bar x)=\frac{1}{n}\delta^2
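The relation (63) can be seen in a small Monte Carlo sketch. The choices below (a standard normal population, n = 25, 20000 replications, a fixed seed) are my own assumptions for illustration; any population with finite variance would do:

```python
import random

# Monte Carlo sketch of (63): Var(xbar) is close to delta^2 / n.
random.seed(42)
n, reps = 25, 20000
delta2 = 1.0  # variance of the standard normal population (assumed)

means = []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)

grand = sum(means) / reps
var_of_mean = sum((m - grand) ** 2 for m in means) / reps
print(var_of_mean, delta2 / n)  # both close to 0.04
```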

now we have everything to finalize the proof. Return to equation (23)

(64)E(S^2)=\frac{1}{n-1}[E(\sum\limits_{i=1}^n(x_i - \mu)^2)-nE(\bar x-\mu )^2]

and we see that we have:

(65)E(S^2)=\frac{1}{n-1}[\sum\limits_{i=1}^n Var(x_i)-n\frac{1}{n}Var(X)]

(66)E(S^2)=\frac{1}{n-1}[\sum\limits_{i=1}^n \delta^2-n\frac{1}{n}\delta^2]

(67)E(S^2)=\frac{1}{n-1}[n\delta^2-n\frac{1}{n}\delta^2]

(68)E(S^2)=\frac{1}{n-1}[n\delta^2-\delta^2]

so we are able to factorize and we end up with:

(69)E(S^2)=\frac{1}{n-1}[\delta^2(n-1)]

which cancels out and it follows that

(70)E(S^2)=\delta^2
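As a final sanity check, the result (70) can be confirmed by simulation: averaging s^2 over many independent samples recovers the population variance, while the n-denominator version comes out too small by the factor (n-1)/n. The population (normal with variance 4), the sample size, and the seed are assumptions made only for this sketch:

```python
import random

# Monte Carlo check of (70): E(S^2) = delta^2 for the n-1 estimator,
# while dividing by n gives roughly delta^2 * (n-1)/n instead.
random.seed(0)
n, reps = 5, 40000
pop_variance = 4.0  # normal population with standard deviation 2 (assumed)

s2_sum, biased_sum = 0.0, 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_sum += ss / (n - 1)   # estimator (1)
    biased_sum += ss / n     # dividing by n instead

print(s2_sum / reps)      # close to 4.0
print(biased_sum / reps)  # close to 4.0 * (n-1)/n = 3.2
```

With n as small as 5 the gap between the two averages is clearly visible, which is exactly why the n-1 correction matters most for small samples.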

Sometimes I may have jumped over some steps, and it could be that they are not as clear to everyone as they are to me. In case it is not possible to follow my reasoning, just leave a comment and I will try to describe it better.

As most comments and remarks are not about missing steps, but demand a more compact version of the proof, I felt obliged to provide one here.

41 thoughts on “Proof of Unbiasedness of Sample Variance Estimator”

  1. In your step (1) you use n as if it is both a constant (the size of the sample) and also the variable used in the sum (ranging from 1 to N, which is undefined but I guess is the population size). Shouldn’t the variable in the sum be i, and shouldn’t you be summing from i=1 to i=n? This makes it difficult to follow the rest of your argument, as I cannot tell in some steps whether you are referring to the sample or to the population.

    1. You are right, I’ve never noticed the mistake. It should clearly be i=1 and not n=1. And you are also right when saying that N is not defined, but as you said it is the sample size. I will add it to the definition of variables.

      However, you should still be able to follow the argument, if there any further misunderstandings, please let me know.

  2. please how do we show the proving of V( y bar subscript st) = summation W square subscript K x S square x ( 1- f subscript n) / n subscript k …..please I need ur assistant

    1. Unfortunately I do not really understand your question. Is your formula taken from the proof outlined above? Do you want to prove that the estimator for the sample variance is unbiased? Or do you want to prove something else and are asking me to help you with that proof? In any case, I need some more information 🙂

  3. I am very glad with this proven .how can we calculate for estimate of average size
    and whats the formula. including some example thank you

  4. please how can I prove …v(Y bar ) = S square /n(1-f)
    and

    S subscript = S /root n x square root of N-n /N-1
    and

    S square = summation (y subscript – Y bar )square / N-1

  5. I like it….

    please help me to check this sampling techniques

    an investigator want to know the adequacy of working condition of the employees of a plastic production factory whose total working population is 5000. if the junior staff is 4 times the intermediate staff working population and the senior staff constitute 15% of the working population .if further ,male constitute 75% ,50% and 80% of junior , intermediate and senior staff respectively of the working population .draw a stratified sample sizes in a table ( taking cognizance of the sex and cadres ).

    I am confused about it please help me out thanx

  6. Gud day sir, thanks alot for the write-up because it clears some of my confusion but i am stil having problem with 2(x-u_x)+(y-u_y), how it becomes zero. Pls explan to me more.

  7. Please I ‘d like an orientation about the proof of the estimate of sample mean variance for cluster design with subsampling (two stages) with probability proportional to the size in the first step and without replacement, and simple random sample in the second step also without replacement. .
    Thanks a lot for your help.

    1. Thank you for you comment. The proof I provided in this post is very general. However, your question refers to a very specific case to which I do not know the answer. Nevertheless, I saw that Peter Egger and Filip Tarlea recently published an article in Economic Letters called “Multi-way clustering estimation of standard errors in gravity models”, this might be a good place to start.

  8. Thanks a lot for this proof. All the other ones I found skipped a bunch of steps and I had no idea what was going on. Econometrics is very difficult for me–more so when teachers skip a bunch of steps. This post saved me some serious frustration. Thanks!

    1. What do exactly do you mean by prove the biased estimator of the sample variance? Do you mean the bias that occurs in case you divide by n instead of n-1?

  9. it would be better if you break it into several Lemmas

    for example, first proving the identities for Linear Combinations of Expected Value, and Variance, and then using the result of the Lemma, in the main proof

    you made it more cumbersome that it needed to be

    1. Hi, thanks again for your comments. I really appreciate your in-depth remarks. While it is certainly true that one can re-write the proof differently and less cumbersome, I wonder if the benefit of brining in lemmas outweighs its costs. In my eyes, lemmas would probably hamper the quick comprehension of the proof. This way the proof seems simple. I like things simple. Cheers, ad.

    1. Hey Abbas, welcome back! What do you mean by solving real statistics? About excel, I think Excel has a data analysis extension. If I were to use Excel that is probably the place I would start looking. However, use R! It free and a very good statistical software. I could write a tutorial, if you tell me what exactly it is that you need.

  10. I think it should be clarified that over which population is E(S^2) being calculated. Is x_i (for each i=0,…,n) being regarded as a separate random variable? If so, the population would be all permutations of size n from the population on which X is defined. I am confused here. Are N and n separate values?

    1. Hey! Thank you for your comment! Indeed, it was not very clean the way I specified X, n and N. I revised the post and tried to improve the notation. Now, X is a random variables, x_i is one observation of variable X. Overall, we have 1 to n observations. I hope this makes is clearer.

      Best, ad

  11. I have a problem understanding what is meant by 1/i=1 in equation (22) and how it disappears when plugging (34) into (23) [equation 35]. I feel like that’s an essential part of the proof that I just can’t get my head around. I’ve never seen that notation used in fractions.

    1. Hi Rui, thanks for your comment. Clearly, this i a typo. It should be 1/n-1 rather than 1/i=1. I corrected post. Thanks for pointing it out, I hope that the proof is much clearer now. Best, ad
