The easiest way to compute clustered standard errors in R is the modified ` summary()`

. I added an additional parameter, called `cluster,`

to the conventional ` summary() `

function. This parameter allows to specify a variable that defines the group / cluster in your data. The summary output will return clustered standard errors. Here is the syntax:

` summary(lm.object, cluster=c("variable")) `

Furthermore, I uploaded the function to a github.com repository. This makes it easy to load the function into your R session. The following lines of code import the function into your R session. You can also download the function directly from this post yourself.

# load necessary packages for importing the function
library(RCurl)
# import the function from repository
url_robust <- "https://raw.githubusercontent.com/IsidoreBeautrelet/economictheoryblog/master/robust_summary.R"
eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)),
envir=.GlobalEnv)

Below you will find a tutorial that demonstrates how to import the modified ` summary() `

function into you R session. It also explains the application of the function in greater detail. The tutorial carries out an OLS estimation in R that is based on an fake data that I generate here and which you can download here.

# start with an empty workspace
rm(list=ls())
# load necessary packages for importing the function
library(RCurl)
# load necessary packages for the example
library(gdata)
library(zoo)
# import the function
url_robust <- "https://raw.githubusercontent.com/IsidoreBeautrelet/economictheoryblog/master/robust_summary.R"
eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)),
envir=.GlobalEnv)
# download data set for example
url_data <- "https://economictheoryblog.files.wordpress.com/2016/12/data.xls"
data <- read.xls(gsub("s:",":",url_data))
# estimate simple linear model
reg <- lm(id_score ~ class_size,
data=data)
# use new summary function
summary(reg)
summary(reg,cluster = c("class_id"))
# Note, the standard errors are much
# larger with clustered standard errors

### Like this:

Like Loading...

*Related*

Pingback: Clustered Standard Errors | Economic Theory Blog

Pingback: Example data – Clustered Standard Errors | Economic Theory Blog

Will this function work with two clustering variables? Something like: summary(lm.object, cluster=c(“variable1”, “variable2”))?

Yes. This function allows two clustering variables. Cheers.

Thank you. One more question: is the function specific to linear models? Or can it work for generalized linear model like logistic regression or other non-linear models?

Currently, the function only works with the lm class in R. I am working on generalizing the function. However, it will still take some time until a general version of the function is available, I suppose.

Thank you so much. I tried the function and it worked well with a single clustering variable. But it gives an error with two clustering variables. Any clues? Here is what I have done:

> SITE URLdata VarNames test fm url_robust eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)), envir=.GlobalEnv)

# one clustering variable “firmid”

> summary(fm, cluster=c(“firmid”))

Call:

lm(formula = y ~ x, data = test)

Residuals:

Min 1Q Median 3Q Max

-6.7611 -1.3680 -0.0166 1.3387 8.6779

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.02968 0.06701 0.443 0.658

x 1.03483 0.05060 20.453 <2e-16 ***

—

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.005 on 4998 degrees of freedom

Multiple R-squared: 0.2078, Adjusted R-squared: 0.2076

F-statistic: 418.3 on 1 and 499 DF, p-value: summary(fm, cluster=c(“year”))

Call:

lm(formula = y ~ x, data = test)

Residuals:

Min 1Q Median 3Q Max

-6.7611 -1.3680 -0.0166 1.3387 8.6779

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.02968 0.02339 1.269 0.204

x 1.03483 0.03339 30.993 summary(fm, cluster=c(“firmid”, “year”))

Error in summary.lm(fm, cluster = c(“firmid”, “year”)) :

object ‘M’ not found

You are right. There was a bug in the code. I fixed it. I guess it should work now. However, you should be careful now with interpreting the F-Statistic. I am not sure if I took the right amount of degrees of freedom. The rest of the output should be fine.

Besides the coding, from you code I see that you are working with non-nested clusters. I cannot remember from the top of my head. But should you not be careful with such a structure?

Hi,

Thanks a lot first of all for putting in so much effort to write this function.

I am getting an error for twoway clustering. This is the error I get:

Error in if (nrow(dat)

Is this error specific to me?

Thanks,

Pankaj

Hey. Thank you for comment. There was a bug in the code. Basically, not all of your observations have a cluster, i.e. one or both of your cluster variables contain NA’s. The problem was that I did not set-up the warning properly. That is, the warning only worked for the single clustering case, but did not work for twoway clustering. I fixed it and now it should work. Let me know if you encounter any other problems. Cheers.

The error didn’t paste properly in the previous comment.

Error in if (nrow(dat)

I am sorry my comment above is a bit of a mess. It changed when I posted it. But basically when I use two clustering variables [e.g., summary(fm, cluster=c(“firmid”, “year”))], I get the error message: “Error in summary.lm(fm, cluster = c(“firmid”, “year”)) :

object ‘M’ not found”

No worries, in my browser it appears quite clear.

Thank you for your response and your great function. It really helps.

I’m glad it helps.

Hi! Is there any way to use this code when using weights in your lm model? I get an error telling me that my weights are not recognized : “Error in get(all.vars(object$call)[length(all.vars(object$call))]) : objet ‘yeardif’ introuvable”

Is there anything I can do?

Many thanks for your help!

Hi! Thank you so much for you comment. There was a problem when extracting the data object from the formula when weights were specified. I was able to fix the problem and now it should work fine. Thank you again for your help.

Hi again!

Thank you very much for your reply! I tried again, and now I only get NAs in the Standard error, t-value, and p value column, even though I have no missing values in my data… I don’t get it! Do you know what might be going on?

Thanks again!

Hi! I conducted some additional robustness tests and everything works fine for me. Can you provide a reproducible example?

Cheers!

Hey! Sorry to come back to you after all this time. But I wonder, were you ever able to solve your problem with the function?

Best, ad.

Hi!

Thanks for the function.

I think I am getting the same problem as ct. Even reproducing the example you provide I get a bunch of NAs.

Is it only me?

Many thanks!

Hey!

Thank you for you remark. I tried the example and it works fine for me. Could you provide some more details? Do you have the package “sandwich” installed?

Cheers!

Below a printout of my console.

Maybe I am missing some packages.

Btw, sorry for taking up so much space.

And apologies for I am new to R and probably this is why I am not seeing the obvious.

Thank you for the printout. I now removed it from your comment. Unfortunately, I still cannot find the error. I tried the example with the newest R Version (3.4.3) and went to a completely different PC, in both cases the example worked fine. Can you check if you have the sandwich package installed? Furthermore, I noticed that you download the data differently – not that this should matter – but did the gdata package not work for you? Finally, you might have some packages loaded in your memory that mask other functions. Could you restart R and only run my example? Best, ad.

Hi! Thanks so much for making this available. I am a newbie to R, and I am having some trouble making the modified summary() function work. Although the example you provide in the short tutorial above worked smoothly, I tried to use it with a toy example of mine and I got the error message

“Error in summary.lm(mod, cluster = c(i)) :

The function only allows max. 2 clusters. You provided more.”

Here is a reproducible example (I realize that since each cluster is a singleton, clustering should be irrelevant for the calculation of standard errors; but I don’t see why that should make the function return an error message):

rm(list=ls())

library(RCurl)

url_robust <- "https://raw.githubusercontent.com/IsidoreBeautrelet/economictheoryblog/master/robust_summary.R"

eval(parse(text = getURL(url_robust, ssl.verifypeer = FALSE)), envir=.GlobalEnv)

i <- seq(1,100,1)

x <- rnorm(100)

y <- 1 + 2*x + rnorm(100)

simpledata <- as.data.frame(cbind(i,x,y))

mod <- lm(y~x, data = simpledata)

summary(mod, cluster = c(i))

Thanks a lot!

Hi! Thank you for you remark. I am glad to hear that you are using my function. The reason that your example does not work properly has actually nothing to do with the cluster function, but is caused by a small syntax error. Try to put the variable

iin last line of you code, i.e. summary(mod, cluster = c(i)), in parentheses such that it looks like this“i”. Your example should work fine then. Let me know if it works. Also, just get in touch in case you encounter any other problems. CheersThanks a lot for the quick reply! It worked perfectly.