# CLRM – Assumption 4: Independent and Identically Distributed Error Terms

Assumption 4 of the four assumptions required by the Gauss-Markov theorem states that the error terms of the population, $\epsilon_{i}$, are independent and identically distributed (iid) with an expected value of zero and a constant variance $\sigma^{2}$. Formally,

$\epsilon_{i} \sim iid(0,\sigma^{2})$

Note that the error term $\epsilon_{i}$ refers to the error of a single observation; the subscript $i$ attributes the error term to observation $i$. Behind this assumption stands the concept of repeated sampling.

Overall, assumption 4, i.e. independent and identically distributed error terms $\epsilon_{i} \sim iid(0,\sigma^{2})$, implies three characteristics. One could say that assumption 4 itself consists of three assumptions:

1. The expected value of the error term in the population is zero

Assumption 4 requires the expected value of the error term in the population to be zero. Formally,

$E(\epsilon_i) = 0$

Note that the assumption $E(\epsilon_i) = 0$ is weaker than $E(\epsilon_i|X) = 0$. One can show that if $E(\epsilon_i|X) = 0$ is fulfilled, then $E(\epsilon_i) = 0$ is fulfilled as well. The converse is not true: $E(\epsilon_i) = 0$ does not imply that $E(\epsilon_i|X) = 0$ is fulfilled.

Violating the assumption of $E(\epsilon_i|X) = 0$ affects our estimates in different ways; the validity of our estimates depends on how $E(\epsilon_i|X) = 0$ is violated. You can find an extensive discussion of the implications of violating OLS assumption 4.1 here.
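To see the difference between the two conditions, consider a short simulation (hypothetical numbers, purely for illustration): the error term below is constructed to depend on $x$, so $E(\epsilon_i|X) = 0$ fails, yet its unconditional mean is still zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical regressor x ~ N(1, 1) and an error term built to depend on x:
# epsilon = 0.5 * (x - 1) + noise, so E(epsilon | x) = 0.5 * (x - 1) != 0,
# while the unconditional mean E(epsilon) is still zero.
x = rng.normal(loc=1.0, scale=1.0, size=n)
epsilon = 0.5 * (x - 1.0) + rng.normal(size=n)

print(epsilon.mean())            # close to 0: E(epsilon) = 0 holds
print(epsilon[x > 2.0].mean())   # clearly positive: E(epsilon | x) = 0 fails
```

The unconditional mean averages out the dependence on $x$, but conditioning on a subset such as $x > 2$ exposes it.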

2. Homoscedasticity

The assumption of independent and identically distributed error terms with an expected value of zero and a constant variance $\sigma^{2}$ also implies a constant variance in the cross-section. That means, each error term $\epsilon_i$ has the same finite variance $\sigma^{2}$. Formally,

$Var(\epsilon_i) = \sigma^{2} \text{ for all i}$

Violating this assumption, i.e. $\sigma_{i}^{2} \neq \sigma_{j}^{2} \text{ for } i \neq j$ leads to heteroscedasticity.

You can find possible consequences of heteroscedasticity and various techniques to solve these complications here.
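A minimal sketch of a violation (hypothetical numbers): if the standard deviation of the error grows with $x$, the variance is no longer the same constant $\sigma^{2}$ for every observation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical heteroscedastic errors: the standard deviation equals x_i,
# so Var(epsilon_i) = x_i^2 differs across observations.
x = rng.uniform(0.5, 2.0, size=n)
epsilon = rng.normal(scale=x)

low = epsilon[x < 1.0]    # observations with small error variance
high = epsilon[x > 1.5]   # observations with large error variance
print(low.var(), high.var())  # the two sample variances differ clearly
```

Under homoscedasticity the two printed sample variances would be roughly equal; here the second is several times the first.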

3. No autocorrelation

The assumption of independent and identically distributed error terms with an expected value of zero and a constant variance $\sigma^{2}$ not only implies a constant variance in the cross-section, but also requires the error terms to be stochastically independent over time. In other words, assumption 4 requires no autocorrelation. Formally,

$E(\epsilon_i \epsilon_j) = 0 \text{ for } i \neq j$

In case of stochastic $X$, the conditional covariance must be zero, i.e. $Cov(\epsilon_i, \epsilon_j |X) = 0 \text{ for } i \neq j$.

Violating this assumption leads to autocorrelation.

It is possible to summarize the last two assumptions (homoscedasticity and no autocorrelation) and rewrite them in a more compact mathematical form. In case of deterministic $X$ we can summarize the two assumptions as

$Var(\epsilon) = \sigma^{2}I$

and in case of stochastic $X$ as

$Var(\epsilon|X) = \sigma^{2}I$
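Written out, the compact form contains both conditions at once: the diagonal of the variance-covariance matrix carries the homoscedasticity condition $Var(\epsilon_i) = \sigma^{2}$, while the zeros off the diagonal carry the no-autocorrelation condition $E(\epsilon_i \epsilon_j) = 0$ for $i \neq j$:

$Var(\epsilon) = \begin{pmatrix} \sigma^{2} & 0 & \cdots & 0 \\ 0 & \sigma^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^{2} \end{pmatrix} = \sigma^{2}I$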

Finally, one can remember assumption 4 as the assumption that requires error terms with an expected value of zero and a constant variance, both in the cross-section and over time.