This post shows how to manually construct the OLS estimator in R (see this post for the exact mathematical derivation of the OLS estimator). In contrary to a previous post, this post focuses on setting up the OLS estimator as a R function. While the aim of the former post was much more on the construction of the OLS estimator in general, is this post all about constructing a functional form around the estimator.
Overall, we will build an R function and test whether this function returns the same results as , the R build-in OLS estimator. However, before we come to the OLS function we will simulate a fake data set. We will use this data set later on to test the equivalence of our OLS function and
. The following code constructs our fake data set with simulated height and weight data.
## start with clean work-space rm(list=ls()) ## create artificial data set # set seed for reproducibility set.seed(501) # height in cm height <- rnorm(200,175,sd = 15) # weight in kilogram (the relationship between height # and weight is completely random) weight <- height-80+1.02*(height)^0.01* rnorm(200,0,sd = 15)* rnorm(200,1.1,sd = 0.2) # join height and weight in data frame df <- data.frame(height,weight)
We will define the function which incorporates the OLS estimator and return the estimated coefficients, the standard errors around the coefficients and the p-values indicating the probability that the estimated coefficients are different from zero. The following code defines the function and saves it under the object OLS.
## build OLS estimator as a function # create function names OLS OLS <- function(X,y){ y <- as.matrix(y) X <- as.matrix(cbind(1,X)) beta <- solve(t(X)%*%X)%*%t(X)%*%y res <- as.matrix(y-beta[1]-beta[2]*X[,2]) n <- nrow(df) k <- ncol(X) VCV <- 1/(n-k)*as.numeric(t(res)%*%res)*solve(t(X)%*%X) se <- sqrt(diag(VCV)) p_value <- rbind(2*pt(abs(beta[1]/se[1]),df=n-k, lower.tail= FALSE), 2*pt(abs(beta[2]/se[2]),df=n-k, lower.tail= FALSE)) output <- as.data.frame(cbind(c("(Intercept)","height"), beta,se,p_value)) names(output) <- c("Coefficients:","Estimate", "Std. Error","Pr(>|t|)") return(output) }
This OLS function requires two input variables. First, we have to specify our dependent variable y, which in our case will be weight. Second, we have to hand the function the independent variable, which in our case will be height.
Finally, we will compare the output of and our manual constructed estimator and show that they are equivalent. The following code regresses height on weight and presents the regression output of our own OLS estimator and the output of the R build-in OLS function
. Note, the results are equivalent.
## use R build-in OLS estimaor (lm()) reg <- lm(weight ~ height, data=df) summary(reg) ## use our own function OLS(y=df$weight,X=df$height)
2 thoughts on “Construct the OLS estimator as a function in R”