The easiest way to compute clustered standard errors in R is the modified ` summary()`

. I added an additional parameter, called `cluster,`

to the conventional ` summary() `

function. This parameter allows to specify a variable that defines the group / cluster in your data. The summary output will return clustered standard errors. Here is the syntax:

` summary(lm.object, cluster=c("variable")) `

Furthermore, I uploaded the function to a github.com repository. This makes it easy to load the function into your R session. The following lines of code import the function into your R session. You can also download the function directly from this post yourself.

# load necessary packages for importing the function
library(RCurl)
# import the function from repository
url_robust <- "https://raw.githubusercontent.com/IsidoreBeautrelet/economictheoryblog/master/robust_summary.R"
eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)),
envir=.GlobalEnv)

Below you will find a tutorial that demonstrates how to import the modified ` summary() `

function into you R session. It also explains the application of the function in greater detail. The tutorial is based on an fake data that I generate here and which you can download here.

# start with an empty workspace
rm(list=ls())
# load necessary packages for importing the function
library(RCurl)
# load necessary packages for the example
library(gdata)
library(zoo)
# import the function
url_robust <- "https://raw.githubusercontent.com/IsidoreBeautrelet/economictheoryblog/master/robust_summary.R"
eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)),
envir=.GlobalEnv)
# download data set for example
url_data <- "https://economictheoryblog.files.wordpress.com/2016/12/data.xls"
data <- read.xls(gsub("s:",":",url_data))
# estimate simple linear model
reg <- lm(id_score ~ class_size,
data=data)
# use new summary function
summary(reg)
summary(reg,cluster = c("class_id"))
# Note, the standard errors are much
# larger with clustered standard errors

### Like this:

Like Loading...

*Related*

Pingback: Clustered Standard Errors | Economic Theory Blog

Pingback: Example data – Clustered Standard Errors | Economic Theory Blog

Will this function work with two clustering variables? Something like: summary(lm.object, cluster=c(“variable1”, “variable2”))?

Yes. This function allows two clustering variables. Cheers.

Thank you. One more question: is the function specific to linear models? Or can it work for generalized linear model like logistic regression or other non-linear models?

Currently, the function only works with the lm class in R. I am working on generalizing the function. However, it will still take some time until a general version of the function is available, I suppose.

Thank you so much. I tried the function and it worked well with a single clustering variable. But it gives an error with two clustering variables. Any clues? Here is what I have done:

> SITE URLdata VarNames test fm url_robust eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)), envir=.GlobalEnv)

# one clustering variable “firmid”

> summary(fm, cluster=c(“firmid”))

Call:

lm(formula = y ~ x, data = test)

Residuals:

Min 1Q Median 3Q Max

-6.7611 -1.3680 -0.0166 1.3387 8.6779

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.02968 0.06701 0.443 0.658

x 1.03483 0.05060 20.453 <2e-16 ***

—

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.005 on 4998 degrees of freedom

Multiple R-squared: 0.2078, Adjusted R-squared: 0.2076

F-statistic: 418.3 on 1 and 499 DF, p-value: summary(fm, cluster=c(“year”))

Call:

lm(formula = y ~ x, data = test)

Residuals:

Min 1Q Median 3Q Max

-6.7611 -1.3680 -0.0166 1.3387 8.6779

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.02968 0.02339 1.269 0.204

x 1.03483 0.03339 30.993 summary(fm, cluster=c(“firmid”, “year”))

Error in summary.lm(fm, cluster = c(“firmid”, “year”)) :

object ‘M’ not found

You are right. There was a bug in the code. I fixed it. I guess it should work now. However, you should be careful now with interpreting the F-Statistic. I am not sure if I took the right amount of degrees of freedom. The rest of the output should be fine.

Besides the coding, from you code I see that you are working with non-nested clusters. I cannot remember from the top of my head. But should you not be careful with such a structure?

Hi,

Thanks a lot first of all for putting in so much effort to write this function.

I am getting an error for twoway clustering. This is the error I get:

Error in if (nrow(dat)

Is this error specific to me?

Thanks,

Pankaj

Hey. Thank you for comment. There was a bug in the code. Basically, not all of your observations have a cluster, i.e. one or both of your cluster variables contain NA’s. The problem was that I did not set-up the warning properly. That is, the warning only worked for the single clustering case, but did not work for twoway clustering. I fixed it and now it should work. Let me know if you encounter any other problems. Cheers.

The error didn’t paste properly in the previous comment.

Error in if (nrow(dat)

I am sorry my comment above is a bit of a mess. It changed when I posted it. But basically when I use two clustering variables [e.g., summary(fm, cluster=c(“firmid”, “year”))], I get the error message: “Error in summary.lm(fm, cluster = c(“firmid”, “year”)) :

object ‘M’ not found”

No worries, in my browser it appears quite clear.

Thank you for your response and your great function. It really helps.

I’m glad it helps.

Hi! Is there any way to use this code when using weights in your lm model? I get an error telling me that my weights are not recognized : “Error in get(all.vars(object$call)[length(all.vars(object$call))]) : objet ‘yeardif’ introuvable”

Is there anything I can do?

Many thanks for your help!

Hi! Thank you so much for you comment. There was a problem when extracting the data object from the formula when weights were specified. I was able to fix the problem and now it should work fine. Thank you again for your help.

Hi again!

Thank you very much for your reply! I tried again, and now I only get NAs in the Standard error, t-value, and p value column, even though I have no missing values in my data… I don’t get it! Do you know what might be going on?

Thanks again!

Hi! I conducted some additional robustness tests and everything works fine for me. Can you provide a reproducible example?

Cheers!