Construct the OLS estimator as a function in R

This post shows how to manually construct the OLS estimator in R (see this post for the exact mathematical derivation of the OLS estimator). In contrary to a previous post, this post focuses on setting up the OLS estimator as a R function. While the aim of the former post was much more on the construction of the OLS estimator in general, is this post all about constructing a functional form around the estimator.

Overall, we will build an R function and test whether this function returns the same results as lm(), the R build-in OLS estimator. However, before we come to the OLS function we will simulate a fake data set. We will use this data set later on to test the equivalence of our OLS function and lm(). The following code constructs our fake data set with simulated height and weight data.

## start with clean work-space
rm(list=ls())

## create artificial data set
#  set seed for reproducibility
set.seed(501)

#  height in cm
height <- rnorm(200,175,sd = 15)

#  weight in kilogram (the relationship between height
#  and weight is completely random)
weight <- height-80+1.02*(height)^0.01*
  rnorm(200,0,sd = 15)*
  rnorm(200,1.1,sd = 0.2)

#  join height and weight in data frame
df <- data.frame(height,weight)

We will define the function which incorporates the OLS estimator and return the estimated coefficients, the standard errors around the coefficients and the p-values indicating the probability that the estimated coefficients are different from zero. The following code defines the function and saves it under the object OLS.

## build OLS estimator as a function
# create function names OLS
OLS <- function(X,y){
  y <- as.matrix(y)
  X <- as.matrix(cbind(1,X))
  beta <- solve(t(X)%*%X)%*%t(X)%*%y
  res <- as.matrix(y-beta[1]-beta[2]*X[,2])
  n <- nrow(df)
  k <- ncol(X)
  VCV <- 1/(n-k)*as.numeric(t(res)%*%res)*solve(t(X)%*%X)
  se <- sqrt(diag(VCV))
  p_value <- rbind(2*pt(abs(beta[1]/se[1]),df=n-k,
                        lower.tail= FALSE),
                   2*pt(abs(beta[2]/se[2]),df=n-k,
                        lower.tail= FALSE))
  output <- as.data.frame(cbind(c("(Intercept)","height"),
                                beta,se,p_value))
names(output) <- c("Coefficients:","Estimate",
                   "Std. Error","Pr(>|t|)")
return(output)
}

This OLS function requires two input variables. First, we have to specify our dependent variable y, which in our case will be weight. Second, we have to hand the function the independent variable, which in our case will be height.

Finally, we will compare the output of lm() and our manual constructed estimator and show that they are equivalent. The following code regresses height on weight and presents the regression output of our own OLS estimator and the output of the R build-in OLS function lm(). Note, the results are equivalent.

## use R build-in OLS estimaor (lm())
reg <- lm(weight ~ height, data=df)
summary(reg)

## use our own function
OLS(y=df$weight,X=df$height)
Advertisement

2 thoughts on “Construct the OLS estimator as a function in R”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.