### How are the standard errors of coefficients calculated in a regression?

• For my own understanding, I am interested in manually replicating the calculation of the standard errors of estimated coefficients as they come, for example, with the output of the lm() function in R, but I haven't been able to pin it down. What is the formula / implementation used?

Good question. Many people know regression from the linear algebra point of view, where you solve the normal equations $X'X\beta=X'y$ and get the answer for $\beta$. It is not clear why we have a standard error, or what assumptions lie behind it.


The linear model is written as $$\left| \begin{array}{l} \mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon} \\ \mathbf{\epsilon} \sim N(0, \sigma^2 \mathbf{I}), \end{array} \right.$$ where $$\mathbf{y}$$ denotes the vector of responses, $$\mathbf{\beta}$$ is the vector of fixed effects parameters, $$\mathbf{X}$$ is the corresponding design matrix whose columns are the values of the explanatory variables, and $$\mathbf{\epsilon}$$ is the vector of random errors.

It is well known that an estimate of $$\mathbf{\beta}$$ is given by (refer, e.g., to the wikipedia article) $$\hat{\mathbf{\beta}} = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \mathbf{y}.$$ Hence $$\textrm{Var}(\hat{\mathbf{\beta}}) = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \;\sigma^2 \mathbf{I} \; \mathbf{X} (\mathbf{X}^{\prime} \mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1} (\mathbf{X}^{\prime} \mathbf{X}) (\mathbf{X}^{\prime} \mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1},$$ [reminder: $$\textrm{Var}(AX)=A\times \textrm{Var}(X) \times A′$$, for some random vector $$X$$ and some non-random matrix $$A$$]

so that $$\widehat{\textrm{Var}}(\hat{\mathbf{\beta}}) = \hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1},$$ where $$\hat{\sigma}^2 = \frac{1}{n - p} (\mathbf{y} - \mathbf{X}\hat{\mathbf{\beta}})^{\prime} (\mathbf{y} - \mathbf{X}\hat{\mathbf{\beta}})$$ is the residual Mean Square Error (MSE) from the ANOVA table, with $$p$$ the number of columns of $$\mathbf{X}$$.

Example with a simple linear regression in R

#------generate one data set with epsilon ~ N(0, 0.25)------
seed <- 1152 #seed
n <- 100     #nb of observations
a <- 5       #intercept
b <- 2.7     #slope

set.seed(seed)
epsilon <- rnorm(n, mean=0, sd=sqrt(0.25))
x <- sample(x=c(0, 1), size=n, replace=TRUE)
y <- a + b * x + epsilon
#-----------------------------------------------------------

#------using lm------
mod <- lm(y ~ x)
#--------------------

#------using the explicit formulas------
X <- cbind(1, x)                                       # design matrix with intercept column
betaHat <- solve(t(X) %*% X) %*% t(X) %*% y            # (X'X)^{-1} X'y
var_betaHat <- anova(mod)[[3]][2] * solve(t(X) %*% X)  # MSE * (X'X)^{-1}
#---------------------------------------

#------comparison------
#estimate
> mod$coef
(Intercept)           x
   5.020261    2.755577

> c(betaHat[1], betaHat[2])
[1] 5.020261 2.755577

#standard error
> summary(mod)$coefficients[, 2]
(Intercept)           x
 0.06596021  0.09725302

> sqrt(diag(var_betaHat))
                      x
 0.06596021  0.09725302
#----------------------
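For anyone replicating this outside R, here is a minimal Python/numpy sketch of the same computation. The data-generating numbers mirror the R example above, but the random draws differ between R and numpy, so the estimates will not match the R output digit for digit:

```python
import numpy as np

rng = np.random.default_rng(1152)
n = 100
x = rng.integers(0, 2, size=n).astype(float)   # 0/1 regressor, as in the R example
y = 5 + 2.7 * x + rng.normal(0, 0.5, size=n)   # epsilon ~ N(0, 0.25)

X = np.column_stack([np.ones(n), x])           # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])  # MSE: RSS / (n - p)
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(var_beta_hat))            # standard errors of (intercept, slope)
```

The only moving parts are the normal-equations solve and the residual mean square; everything else is bookkeeping.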


When there is a single explanatory variable, the model reduces to $$y_i = a + bx_i + \epsilon_i, \qquad i = 1, \dotsc, n$$ and $$\mathbf{X} = \left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right), \qquad \mathbf{\beta} = \left( \begin{array}{c} a\\b \end{array} \right)$$ so that $$(\mathbf{X}^{\prime} \mathbf{X})^{-1} = \frac{1}{n\sum x_i^2 - (\sum x_i)^2} \left( \begin{array}{cc} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{array} \right)$$ and the formulas become more transparent. For example, the standard error of the estimated slope is $$\sqrt{\widehat{\textrm{Var}}(\hat{b})} = \sqrt{[\hat{\sigma}^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}]_{22}} = \sqrt{\frac{n \hat{\sigma}^2}{n\sum x_i^2 - (\sum x_i)^2}}.$$

> num <- n * anova(mod)[[3]][2]
> denom <- n * sum(x^2) - sum(x)^2
> sqrt(num / denom)
[1] 0.09725302


Thanks for the thorough answer. So, I take it the last formula doesn't hold in the multivariate case?

Correct, the very last formula only works for the specific $\mathbf{X}$ matrix of the simple linear model. In the multivariate case, you have to use the general formula given above.
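In the multivariate case the general formula $\hat{\sigma}^2 (\mathbf{X}'\mathbf{X})^{-1}$ applies unchanged; here is a numpy sketch with two made-up predictors x1 and x2 (the coefficient values 1.0, 2.0, -0.5 are arbitrary illustration choices, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # n x 3 design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y, now 3 coefficients
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])  # RSS / (n - p), here p = 3
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))  # one SE per coefficient
```

Nothing in the code depends on the number of predictors: the same two lines give the coefficients and their standard errors for any full-rank design matrix.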

+1, a quick question: where does $Var(\hat\beta)$ come from?

@loganecolss: It comes from the fact that $\text{Var}(AX)=A\,\text{Var}(X)\,A'$, for some random vector $X$ and some non-random matrix $A$.

Note that these are the right formulas for hand calculation, but the actual implementation used within lm.fit/summary.lm is a bit different, for stability and efficiency.
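A stable implementation avoids forming $\mathbf{X}'\mathbf{X}$ explicitly and works through a QR decomposition instead. This numpy sketch mimics that idea (it is not R's actual code, just the same linear algebra: with $\mathbf{X} = QR$, one has $(\mathbf{X}'\mathbf{X})^{-1} = R^{-1}R^{-\top}$):

```python
import numpy as np

def ols_via_qr(X, y):
    """Coefficients and standard errors via a thin QR decomposition,
    avoiding the less stable explicit inversion of X'X."""
    Q, R = np.linalg.qr(X)                  # X = QR, R upper triangular
    beta_hat = np.linalg.solve(R, Q.T @ y)  # solve R beta = Q'y by back-substitution
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (X.shape[0] - X.shape[1])
    R_inv = np.linalg.solve(R, np.eye(R.shape[0]))
    cov = sigma2_hat * R_inv @ R_inv.T      # sigma^2 * (X'X)^{-1} = sigma^2 * R^{-1} R^{-T}
    return beta_hat, np.sqrt(np.diag(cov))
```

Because $\mathbf{X}'\mathbf{X} = R'Q'QR = R'R$, this returns exactly the same $\hat\beta$ and standard errors as the normal-equations route, but with a better condition number.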

This video derives the mean and variance too: https://www.youtube.com/watch?v=jyBtfhQsf44

To be clear, anova(mod)[[3]][2] only works for a model with one factor, right? The "3" picks out the "Mean Sq", and the "2" picks out the "Residuals" line? Also, is that the variance of the residuals? It looks a bit different. What exactly is it?

Is another way to think of that $\hat{\sigma}^2$ as the sum of squared errors of the model divided by its degrees of freedom?