How are the standard errors of coefficients calculated in a regression?
For my own understanding, I am interested in manually replicating the calculation of the standard errors of the estimated coefficients as they, for example, come with the output of the `lm()` function in R, but I haven't been able to pin it down. What is the formula / implementation used?

The linear model is written as $$ \left| \begin{array}{l} \mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon} \\ \mathbf{\epsilon} \sim N(0, \sigma^2 \mathbf{I}), \end{array} \right.$$ where $\mathbf{y}$ denotes the vector of responses, $\mathbf{\beta}$ is the vector of fixed-effects parameters, $\mathbf{X}$ is the corresponding design matrix whose columns are the values of the explanatory variables, and $\mathbf{\epsilon}$ is the vector of random errors.
It is well known that an estimate of $\mathbf{\beta}$ is given by (refer, e.g., to the Wikipedia article) $$\hat{\mathbf{\beta}} = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \mathbf{y}.$$ Hence, since $\textrm{Var}(\mathbf{y}) = \sigma^2 \mathbf{I}$, $$ \textrm{Var}(\hat{\mathbf{\beta}}) = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \;\sigma^2 \mathbf{I} \; \mathbf{X} (\mathbf{X}^{\prime} \mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1} (\mathbf{X}^{\prime} \mathbf{X}) (\mathbf{X}^{\prime} \mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1} $$ [reminder: $\textrm{Var}(A\mathbf{X})=A \, \textrm{Var}(\mathbf{X}) \, A^{\prime}$ for a random vector $\mathbf{X}$ and a non-random matrix $A$],
so that $$ \widehat{\textrm{Var}}(\hat{\mathbf{\beta}}) = \hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}, $$ where $\hat{\sigma}^2$ is the residual mean square, i.e. the Mean Square Error (MSE) of the Residuals line of the ANOVA table: $$\hat{\sigma}^2 = \frac{1}{n-p}\sum_{i=1}^{n} \hat{\epsilon}_i^2,$$ with $\hat{\epsilon}_i = y_i - \hat{y}_i$ the residuals, $n$ the number of observations, and $p$ the number of estimated coefficients (the number of columns of $\mathbf{X}$).
Example with a simple linear regression in R
#------generate one data set with epsilon ~ N(0, 0.25)------
seed <- 1152  #seed
n <- 100      #nb of observations
a <- 5        #intercept
b <- 2.7      #slope

set.seed(seed)
epsilon <- rnorm(n, mean=0, sd=sqrt(0.25))
x <- sample(x=c(0, 1), size=n, replace=TRUE)
y <- a + b * x + epsilon
#-----------------------------------------------------------

#------using lm------
mod <- lm(y ~ x)
#--------------------

#------using the explicit formulas------
X <- cbind(1, x)
betaHat <- solve(t(X) %*% X) %*% t(X) %*% y
var_betaHat <- anova(mod)[[3]][2] * solve(t(X) %*% X)
#---------------------------------------

#------comparison------
#estimate
> mod$coef
(Intercept)           x
   5.020261    2.755577

> c(betaHat[1], betaHat[2])
[1] 5.020261 2.755577

#standard error
> summary(mod)$coefficients[, 2]
(Intercept)           x
 0.06596021  0.09725302

> sqrt(diag(var_betaHat))
                      x
 0.06596021  0.09725302
#----------------------
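As a side note on where $\hat{\sigma}^2$ comes from in the code above: `anova(mod)[[3]][2]` picks the "Mean Sq" column and the "Residuals" row of the ANOVA table, i.e. the residual mean square. A minimal check, reusing `mod`, `X` and `n` from the example:

# residual mean square computed by hand: RSS divided by the residual
# degrees of freedom n - p, where p = ncol(X) coefficients are estimated
p <- ncol(X)
RSS <- sum(residuals(mod)^2)
sigma2Hat <- RSS / (n - p)
c(sigma2Hat, anova(mod)[[3]][2], summary(mod)$sigma^2)  # all three coincide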
When there is a single explanatory variable, the model reduces to $$y_i = a + bx_i + \epsilon_i, \qquad i = 1, \dotsc, n$$ and $$\mathbf{X} = \left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right), \qquad \mathbf{\beta} = \left( \begin{array}{c} a\\b \end{array} \right)$$ so that $$(\mathbf{X}^{\prime} \mathbf{X})^{-1} = \frac{1}{n\sum x_i^2 - (\sum x_i)^2} \left( \begin{array}{cc} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{array} \right)$$ and the formulas become more transparent. For example, the standard error of the estimated slope is $$\sqrt{\widehat{\textrm{Var}}(\hat{b})} = \sqrt{[\hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}]_{22}} = \sqrt{\frac{n \hat{\sigma}^2}{n\sum x_i^2 - (\sum x_i)^2}}.$$
> num <- n * anova(mod)[[3]][2]
> denom <- n * sum(x^2) - sum(x)^2
> sqrt(num / denom)
[1] 0.09725302
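The intercept works the same way: its variance is the $[1,1]$ element of $\hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}$, i.e. $\hat{\sigma}^2 \sum x_i^2 / (n\sum x_i^2 - (\sum x_i)^2)$. A quick check along the same lines (continuing with the objects defined above):

num_a <- anova(mod)[[3]][2] * sum(x^2)
denom <- n * sum(x^2) - sum(x)^2
sqrt(num_a / denom)   # should reproduce the intercept standard error, 0.06596021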
Thanks for the thorough answer. So, I take it the last formula doesn't hold in the multivariate case?
No, the very last formula only works for the specific X matrix of the simple linear model. In the multivariate case, you have to use the general formula given above.
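For instance, here is a minimal sketch with made-up data (the names `x1`, `x2`, `y2`, `mod2` are purely illustrative, and `n` is reused from the example above) checking that the general formula reproduces `lm()`'s standard errors with two explanatory variables:

set.seed(1)
x1 <- rnorm(n); x2 <- rnorm(n)
y2 <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

mod2 <- lm(y2 ~ x1 + x2)
X2 <- cbind(1, x1, x2)                                 # design matrix with intercept
sigma2Hat2 <- sum(residuals(mod2)^2) / (n - ncol(X2))  # RSS / (n - p)
sqrt(diag(sigma2Hat2 * solve(t(X2) %*% X2)))           # manual standard errors
summary(mod2)$coefficients[, 2]                        # agree with lm's output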
+1, a quick question: how does the expression for $Var(\hat\beta)$ come about?
@loganecolss: It comes from the fact that $\text{Var}(AX)=A\,\text{Var}(X)\,A^{\prime}$, for some random vector $X$ and some non-random matrix $A$.
note that these are the right answers for hand calculation, but the actual implementation used within `lm.fit`/`summary.lm` is a bit different (it works through a QR decomposition of $X$ rather than inverting $X'X$ directly), for stability and efficiency.
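For the curious, a rough sketch of that QR-based route (this illustrates the idea behind `lm.fit`/`summary.lm`, not a verbatim copy of their internals), reusing `X`, `y`, `n` and `mod` from the example above:

QR <- qr(X)                              # X = QR, with R upper triangular
betaHat_qr <- qr.coef(QR, y)             # least-squares coefficients without forming X'X
res <- y - X %*% betaHat_qr
sigma2Hat <- sum(res^2) / (n - ncol(X))
XtX_inv <- chol2inv(qr.R(QR))            # (X'X)^{-1} = (R'R)^{-1}, from the triangular factor
sqrt(diag(sigma2Hat * XtX_inv))          # matches summary(mod)$coefficients[, 2]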
This video derives the mean and variance too: https://www.youtube.com/watch?v=jyBtfhQsf44
To be clear, `anova(mod)[[3]][2]` only works for a model with one factor, right? The "3" picks out the "Mean Sq", and the "2" picks out the "Residuals" line? Also, is that the variance of the residuals? It looks a bit different. What exactly is it?
Is another way to think of that sigma^2 as "sum-squared error of the model divided by degrees of freedom"?
Haitao Du: Good question. Many people know regression from the linear algebra point of view, where you solve the normal equations $X'X\beta=X'y$ and get the answer for $\beta$; from that view it is not clear why we have a standard error, or what assumptions lie behind it.