# Linear Regression in R

R offers several ways to run a linear regression. The most natural is the lm() function, R's built-in OLS estimator. In this post I show how to use lm() to run OLS on the following model: $y = \alpha + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3}$

The lm() function requires you to specify the model and to indicate the object containing the data. The model is passed to lm() as a formula of the form $y \sim x_{1} + x_{2} + x_{3}$

where $y, x_{1}, x_{2}$ and $x_{3}$ are replaced with the variable names. The intercept $\alpha$ is included automatically and does not need to be written in the formula.

In R, the model is specified as follows. I assume that the data is stored in a data frame named df.

    ## use R's built-in OLS estimator (lm())
    reg <- lm(y ~ x1 + x2 + x3, data = df)
    summary(reg)
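To try the call without a data set at hand, one option is to simulate a small data frame first. The sketch below is only an illustration: the sample size, the seed, and the "true" coefficients (1, 2, -0.5, 0.3) are assumptions chosen for the example, not values from any real data.

```r
# Simulated data for illustration only; all numbers below are assumptions.
set.seed(42)                       # make the example reproducible
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# y = alpha + beta1*x1 + beta2*x2 + beta3*x3 + noise
y  <- 1 + 2 * x1 - 0.5 * x2 + 0.3 * x3 + rnorm(n)

# lm() looks up the variables named in the formula inside this data frame
df <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

# Formula first, then the object containing the data
reg <- lm(y ~ x1 + x2 + x3, data = df)
summary(reg)
```

With simulated data like this, the estimates reported by summary(reg) should land close to the coefficients used to generate y, which is a quick sanity check that the formula was specified as intended.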


Furthermore, R offers several additional functions for evaluating the regression output. Some of these post-regression functions are listed below.


    # several other useful functions
    coefficients(reg)         # show coefficients
    anova(reg)                # show anova table
    vcov(reg)                 # show covariance matrix for model parameters
    confint(reg, level=0.95)  # CIs for model parameters
    fitted(reg)               # show fitted values
    residuals(reg)            # show residuals
    influence(reg)            # show regression diagnostics
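A short sketch of how some of these functions relate to each other may help. It refits the model on simulated data (the seed, sample size, and coefficients are assumptions made for the example) and checks two standard identities: the standard errors in summary() are the square roots of the diagonal of vcov(), and fitted values plus residuals give back the observed response.

```r
# Simulated data for illustration only; all numbers below are assumptions.
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df$y <- 1 + 2 * df$x1 - 0.5 * df$x2 + 0.3 * df$x3 + rnorm(100)

reg <- lm(y ~ x1 + x2 + x3, data = df)

# Standard errors are the square roots of the diagonal of vcov()
se <- sqrt(diag(vcov(reg)))
all.equal(se, summary(reg)$coefficients[, "Std. Error"])   # should be TRUE

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fitted(reg) + residuals(reg)), df$y)      # should be TRUE

# confint() returns one row per coefficient with lower and upper bounds
ci <- confint(reg, level = 0.95)
ci
```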



Note that the lm() function is a complete wrapper around the OLS estimator in R. It provides little insight into the calculations carried out in the background. In the following post I rebuild the OLS estimator from scratch using R. I go through every single step of the calculations and provide estimates of the coefficients, standard errors and p-values. Finally, I incorporate the presented code into a function and show that the function returns the same results as lm(). The manually constructed function can be found here.
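As a rough preview of what such a from-scratch calculation involves, here is a minimal sketch using the textbook formula $\hat{\beta} = (X'X)^{-1} X'y$. This is not the code from the follow-up post; the simulated data and all variable names (X, XtX_inv, beta_hat, manual, and so on) are assumptions made for illustration.

```r
# Simulated data for illustration only; all numbers below are assumptions.
set.seed(7)
n  <- 150
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 1 + 2 * df$x1 - 0.5 * df$x2 + 0.3 * df$x3 + rnorm(n)

# Design matrix with a column of ones for the intercept
X <- cbind(1, df$x1, df$x2, df$x3)
colnames(X) <- c("(Intercept)", "x1", "x2", "x3")
y <- df$y
k <- ncol(X)

# Coefficients: (X'X)^{-1} X'y
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y

# Residual variance, standard errors, t statistics, two-sided p-values
res    <- y - X %*% beta_hat
sigma2 <- sum(res^2) / (n - k)
se     <- sqrt(diag(sigma2 * XtX_inv))
t_stat <- drop(beta_hat) / se
p_val  <- 2 * pt(abs(t_stat), n - k, lower.tail = FALSE)

manual <- cbind(Estimate     = drop(beta_hat),
                `Std. Error` = se,
                `t value`    = t_stat,
                `Pr(>|t|)`   = p_val)

# Compare with the coefficient table produced by lm()
reg <- lm(y ~ x1 + x2 + x3, data = df)
all.equal(manual, summary(reg)$coefficients)   # should be TRUE
```

Inverting $X'X$ directly keeps the sketch close to the formula; lm() itself relies on a QR decomposition, which is numerically more stable, so tiny floating-point differences between the two are expected.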