# CLRM – Assumption 3: Explanatory Variables must be exogenous

Assumption 3, exogeneity of the explanatory variables, requires that the explanatory variables $X$ in the model do not explain variation in the error terms. Formally, we express assumption 3 as $E(\epsilon_i|X)=0$.

This condition holds whenever all data in the $X$ matrix are stochastically independent of $\epsilon_i$ for all $i$ and the error terms have mean zero. Note that the law of iterated expectations ensures that $E(\epsilon_i)=0$, since $E(\epsilon_i) = E[E(\epsilon_i|X)]=E(0) = 0$.

Textbooks sometimes mention deterministic $X$ as a sufficient requirement for assumption 3. I should mention that we do fulfill assumption 3 by definition when applying OLS to deterministic data. However, we do not need the strong requirement of deterministic data to meet assumption 3. Stochastic independence of $X$ and the error terms is sufficient to secure an unbiased and consistent estimator. You will recognize soon enough that even stochastically independent data are not easy to find.
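To see why exogeneity delivers unbiasedness, a short derivation may help. It is only a sketch and assumes the usual matrix form of the model, $y = X\beta + \epsilon$, with OLS estimator $b = (X'X)^{-1}X'y$; the symbols $y$, $\beta$ and $b$ are notation introduced here for illustration.

$$
\begin{aligned}
b &= (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \epsilon) = \beta + (X'X)^{-1}X'\epsilon \\
E(b|X) &= \beta + (X'X)^{-1}X'\,E(\epsilon|X) = \beta \\
E(b) &= E[E(b|X)] = \beta
\end{aligned}
$$

The second line is the only place where assumption 3 enters: with $E(\epsilon|X)=0$ the second term vanishes, and the last line follows from the law of iterated expectations.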

To summarize assumption 3 of the classical linear regression model (CLRM) in slightly different words: to fulfill assumption 3, the data generating process of $X$ has to be independent of the data generating process of the error terms.
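A small Monte Carlo experiment can illustrate what independent data generating processes look like in practice. The sketch below is not part of the original argument: the model $y = 1 + 2x + \epsilon$, the sample sizes, the seed and the function names are all assumptions chosen for illustration. The regressor and the error are drawn separately, so assumption 3 holds by construction, and the slope estimates should average out close to the true value of 2.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def mean_slope_exogenous(n_obs=100, n_reps=1000, beta=2.0, seed=0):
    """Average slope estimate over repeated samples with x independent of the error."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]    # regressor, drawn on its own
        eps = [rng.gauss(0, 1) for _ in range(n_obs)]  # error, drawn independently of x
        y = [1.0 + beta * xi + ei for xi, ei in zip(x, eps)]
        slopes.append(ols_slope(x, y))
    return sum(slopes) / n_reps


mean_slope = mean_slope_exogenous()
print(f"average slope estimate: {mean_slope:.3f} (true value: 2.0)")
```

Single estimates still vary from sample to sample; unbiasedness only says that their average across samples is centered on the true coefficient.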

## Impact of assumption 3

Assumption 3 requires the error term of observation $i$ to be stochastically independent not only of that observation's own characteristics $X_{i,k}$ but also of the characteristics $X_{j,k}$ of every other observation $j$. For cross-sectional data, this means that the error term of observation $i$ has to be stochastically independent of all explanatory variables of observation $i$. Furthermore, the error term of observation $i$ has to be independent of the explanatory variables of all other observations $j$. For time series data, the assumption requires intertemporal independence: the error term of period $t$ must be stochastically independent of all explanatory variables $X$ of the past, the present, and the future.
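Written out, the requirement can be sketched as follows. The row notation $x_1, \dots, x_n$ for the observations in $X$ (and $x_t$ for the regressors of period $t$) is an assumption introduced here for illustration.

$$
\begin{aligned}
E(\epsilon_i \,|\, x_1, \dots, x_n) &= 0 \quad \text{for all } i \\
E(X_{j,k}\,\epsilon_i) &= E\big[X_{j,k}\,E(\epsilon_i|X)\big] = 0 \quad \text{for all } i, j, k \\
E(\epsilon_t \,|\, \dots, x_{t-1}, x_t, x_{t+1}, \dots) &= 0 \quad \text{for all } t
\end{aligned}
$$

The second line shows that the error term of any observation is uncorrelated with every regressor of every observation, including $j \neq i$. The third line is the time series version, conditioning on past, present, and future regressors.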

## Violating assumption 3

If assumption 3 is violated, the OLS estimator is, in general, neither unbiased nor consistent. Unfortunately, assumption 3 is violated very easily. Common cases that violate assumption 3 include omitted variables, measurement error, and simultaneity.
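The omitted variable case can be made concrete with a small simulation. The sketch below rests on an assumed data generating process that is not from the text: $y = 1 + 2x + 3z + u$ with $x = 0.5z + v$, where $z$, $v$ and $u$ are independent standard normal draws. Leaving $z$ out of the regression pushes $3z$ into the error term, which is then correlated with $x$. Under these assumptions the slope converges to $2 + 3\cdot\mathrm{Cov}(x,z)/\mathrm{Var}(x) = 2 + 3 \cdot 0.5/1.25 = 3.2$ rather than to the true value of 2.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def slope_with_omitted_variable(n_obs=20000, seed=1):
    """Regress y on x alone although y also depends on z, which is correlated with x."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n_obs)]         # the variable we will omit
    x = [0.5 * zi + rng.gauss(0, 1) for zi in z]         # x is correlated with z
    y = [1.0 + 2.0 * xi + 3.0 * zi + rng.gauss(0, 1)     # true slope on x is 2.0
         for xi, zi in zip(x, z)]
    return ols_slope(x, y)                               # z is left out of the regression


biased_slope = slope_with_omitted_variable()
print(f"slope with z omitted: {biased_slope:.3f} (true value: 2.0)")
```

Because the sample is large, the gap between the estimate and 2 is not sampling noise: more data does not remove it, which is what inconsistency means here.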