Learning a new programming language is costly. It usually takes a considerable amount of time to become acquainted with a new language, and especially the first phase can be painful and frustrating. The good news is that, with enough time and effort, most of us will eventually learn to master a programming language. However, once we are comfortable with one language, we hardly want to change again. It turns out that the cost of abandoning one programming language and switching to another is even higher than the initial investment. Knowing this, we really want to make sure not to invest in the wrong language. There might be nothing worse than finally mastering a programming language, only to recognize that there is no use for it anymore. While in a former post I highlighted reasons to use R, in this post I concentrate on the pros and cons of R.
Why R?
Why should you use R?
There are several reasons why one should start using R. During the last decade R has become the leading tool for statistics, data analysis, and machine learning. By now, R represents a viable alternative to traditional statistical programs such as Stata, SPSS, SAS, and Matlab. The reasons for R’s success are manifold. Continue reading Why R?
How to Enable GUI Root Login in Debian 9
In this post I am going to explain how to enable GUI root access on Debian 9. Instructions for Debian 10 are similar and can be found here. At this point I should warn you that using the root account is dangerous, as you can ruin your whole system, so try to follow this guide exactly.
Seasonal adjustment
What is seasonal adjustment?
Seasonal adjustment refers to a statistical technique that tries to quantify and remove the influences of predictable seasonal patterns to reveal nonseasonal changes in data that would otherwise be overshadowed by the seasonal differences. Seasonal adjustments provide a Continue reading Seasonal adjustment
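As a rough, self-contained sketch of the idea (my own illustration, not the procedure described in the full post), a monthly series can be decomposed in R with the built-in stl() function and the seasonal component subtracted; the AirPassengers data and the settings below are assumptions chosen only for demonstration:
# Illustrative sketch: decompose a monthly series and remove its seasonal component
data(AirPassengers)
fit <- stl(log(AirPassengers), s.window = "periodic")    # seasonal-trend decomposition
adjusted <- log(AirPassengers) - fit$time.series[, "seasonal"]
plot(adjusted)                                           # the seasonally adjusted (log) series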
Clustered Standard Errors in R
The easiest way to compute clustered standard errors in R is to use the modified summary() function. I added an additional parameter, called cluster, to the conventional summary() function. This parameter allows you to specify the variable that defines the group / cluster in your data. The summary output will then return clustered standard errors. Here is the syntax:
summary(lm.object, cluster=c("variable")) Continue reading Clustered Standard Errors in R
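A commonly used alternative to the modified summary() function relies on the sandwich and lmtest packages. The sketch below is not the approach from this post; lm.object and the cluster variable group are placeholders for your own model and data:
library(sandwich)   # provides vcovCL() for cluster-robust covariance matrices
library(lmtest)     # provides coeftest() for coefficient tests with a custom vcov
coeftest(lm.object, vcov = vcovCL(lm.object, cluster = ~ group))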
Example data – Clustered Standard Errors
The following R script creates an example dataset to illustrate the application of clustered standard errors. You can download the dataset here.
The script creates a dataset with a specific number of student test results. Individual students are identified via the variable student_id. The variable id_score contains a student’s test score. In the test, students can score from 1 to 10, with 10 being the highest possible score. Continue reading Example data – Clustered Standard Errors
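A minimal sketch of what such a dataset could look like (this is not the original script; the number of students and tests per student are arbitrary assumptions):
set.seed(123)
n_students <- 50
tests_per_student <- 4
example_data <- data.frame(
  student_id = rep(1:n_students, each = tests_per_student),
  id_score   = sample(1:10, n_students * tests_per_student, replace = TRUE)
)
head(example_data)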
Clustered Standard Errors
Clustered standard errors are a way to obtain unbiased standard error estimates for OLS coefficients under a specific kind of heteroscedasticity. Recall that the presence of heteroscedasticity violates the Gauss-Markov assumptions that are necessary to render OLS the best linear unbiased estimator (BLUE).
Estimating clustered standard errors is justified if there are several different covariance structures within your data sample that vary by a certain characteristic – a “cluster”. Furthermore, the covariance structure must be homoskedastic within each cluster. In this case clustered standard errors provide unbiased standard error estimates. Continue reading Clustered Standard Errors
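To illustrate the kind of error structure that calls for clustering, here is a small simulated sketch (my own example, not taken from the post): observations in the same cluster share an error component and are therefore correlated within clusters, while the OLS point estimates remain fine.
set.seed(42)
n_clusters <- 30; per_cluster <- 20
cluster <- rep(1:n_clusters, each = per_cluster)
x <- rnorm(n_clusters * per_cluster)
u <- rnorm(n_clusters)[cluster]           # cluster-level error component
e <- rnorm(n_clusters * per_cluster)      # idiosyncratic error
y <- 1 + 2 * x + u + e
fit <- lm(y ~ x)                          # coefficients are unbiased, but naive standard errors are unreliable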
The Derivative of the Natural Logarithm
The derivative of the natural logarithm is defined the following way:
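d/dx ln(x) = 1/x, for x > 0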
The formal proof of the derivative is provided at the bottom of this post.
The following example further explains the derivative of the natural logarithm. Remember that Continue reading The Derivative of the Natural Logarithm
Robust Standard Errors in STATA
”Robust” standard errors are a technique to obtain unbiased standard errors of OLS coefficients under heteroscedasticity. In contrast to other statistical software, such as R for instance, it is rather simple to calculate robust standard errors in STATA. All you need to do is add the option robust to your regression command. That is:
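For example, a typical regression call could look like this (y and x are placeholder variable names):
regress y x, robust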
Robust Standard Errors in R
One can calculate robust standard errors in R in various ways. However, it is easy to reach your limits when calculating robust standard errors in R, especially when you are new to R. It always bothered me that robust standard errors are so easy to calculate in STATA, while you needed ten lines of code to compute them in R. I decided to solve the problem myself and modified the summary() function in R so that it replicates STATA’s simple approach. I added the parameter robust to the summary() function, which calculates robust standard errors if one sets the parameter to true. With the new summary() function you can get robust standard errors in your usual summary() output. All you need to do is set the robust parameter to true:
summary(lm.object, robust=T) Continue reading Robust Standard Errors in R
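As with clustered standard errors, a common alternative to the modified summary() function uses the sandwich and lmtest packages. In this sketch lm.object is a placeholder for your fitted model, and the HC1 type is chosen because it mirrors STATA’s default robust standard errors:
library(sandwich)   # provides vcovHC() for heteroscedasticity-consistent covariance matrices
library(lmtest)     # provides coeftest() for coefficient tests with a custom vcov
coeftest(lm.object, vcov = vcovHC(lm.object, type = "HC1"))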