Wald test for logistic regression

  • As far as I understand, the Wald test in the context of logistic regression is used to determine whether a certain predictor variable $X$ is significant or not. It tests the null hypothesis that the corresponding coefficient is zero.

    The test consists of dividing the value of the coefficient by its standard error $\sigma$.

    What I am confused about is that $X/\sigma$ is also known as the $z$-score, which indicates how likely it is that a given observation comes from the normal distribution (with mean zero).


  • COOLSerdash (correct answer, 7 years ago)

    The estimates of the coefficients and the intercepts in logistic regression (and any GLM) are found via maximum-likelihood estimation (MLE). These estimates are denoted with a hat over the parameters, something like $\hat{\theta}$. Our parameter of interest is denoted $\theta_{0}$, and it is usually 0, as we want to test whether the coefficient differs from 0 or not. From the asymptotic theory of MLE, we know that the difference between $\hat{\theta}$ and $\theta_{0}$ will be approximately normally distributed with mean 0 (details can be found in any mathematical statistics book, such as Larry Wasserman's All of Statistics). Recall that standard errors are nothing else than standard deviations of statistics (Sokal and Rohlf write in their book Biometry: "a statistic is any one of many computed or estimated statistical quantities", e.g. the mean, median, standard deviation, correlation coefficient, regression coefficient, ...). Dividing a normally distributed quantity with mean 0 and standard deviation $\sigma$ by its standard deviation yields the standard normal distribution with mean 0 and standard deviation 1.

    The Wald statistic is defined as (e.g. Wasserman (2006): All of Statistics, pages 153, 214-215): $$ W=\frac{(\hat{\beta}-\beta_{0})}{\widehat{\operatorname{se}}(\hat{\beta})}\sim \mathcal{N}(0,1) $$ or $$ W^{2}=\frac{(\hat{\beta}-\beta_{0})^{2}}{\widehat{\operatorname{Var}}(\hat{\beta})}\sim \chi^{2}_{1} $$ The second form arises from the fact that the square of a standard normal distribution is the $\chi^{2}_{1}$-distribution with 1 degree of freedom (the sum of two squared standard normal distributions would be a $\chi^{2}_{2}$-distribution with 2 degrees of freedom, and so on).

    Because the parameter of interest is usually 0 (i.e. $\beta_{0}=0$), the Wald statistic simplifies to $$ W=\frac{\hat{\beta}}{\widehat{\operatorname{se}}(\hat{\beta})}\sim \mathcal{N}(0,1), $$ which is what you described: the estimate of the coefficient divided by its standard error.
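
    As a small illustration, here is a minimal sketch of this test computed by hand in R; beta.hat and se.hat are made-up placeholder values, not output from a real fit:

    beta.hat <- 0.8                          # hypothetical coefficient estimate
    se.hat   <- 0.33                         # hypothetical standard error
    W <- beta.hat / se.hat                   # Wald statistic, approx. N(0, 1) under H0
    2 * pnorm(-abs(W))                       # two-sided p-value from the standard normal
    pchisq(W^2, df = 1, lower.tail = FALSE)  # the same p-value via the chi-square form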


    When is a $z$-value and when a $t$-value used?

    The choice between a $z$-value or a $t$-value depends on how the standard error of the coefficients has been calculated. Because the Wald statistic is asymptotically distributed as a standard normal distribution, we can use the $z$-score to calculate the $p$-value. When we, in addition to the coefficients, also have to estimate the residual variance, a $t$-value is used instead of the $z$-value.

    In ordinary least squares (OLS, normal linear regression), the variance-covariance matrix of the coefficients is $\operatorname{Var}[\hat{\beta}\mid X]=\sigma^2(X'X)^{-1}$, where $\sigma^2$ is the variance of the residuals (which is unknown and has to be estimated from the data) and $X$ is the design matrix. In OLS, the standard errors of the coefficients are the square roots of the diagonal elements of the variance-covariance matrix. Because we don't know $\sigma^2$, we have to replace it by its estimate $\hat{\sigma}^{2}=s^2$, so $\widehat{\operatorname{se}}(\hat{\beta}_{j})=\sqrt{s^2(X'X)_{jj}^{-1}}$. And that's the point: because we have to estimate the variance of the residuals to calculate the standard error of the coefficients, we need to use a $t$-value and the $t$-distribution.
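
    As a quick sketch of this computation (using the built-in swiss data that also appears in the OLS example further below), the standard errors reported by summary(lm(...)) can be reproduced by hand:

    X <- model.matrix(Fertility ~ ., data = swiss)  # design matrix
    y <- swiss$Fertility
    beta.hat <- solve(t(X) %*% X, t(X) %*% y)       # OLS coefficient estimates
    res <- y - X %*% beta.hat                       # residuals
    s2 <- sum(res^2) / (nrow(X) - ncol(X))          # estimated residual variance s^2
    sqrt(diag(s2 * solve(t(X) %*% X)))              # matches the Std. Error column of summary(lm(...))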

    In logistic (and Poisson) regression, the variance of the residuals is related to the mean. If $Y\sim \text{Bin}(n, p)$, the mean is $E(Y)=np$ and the variance is $\operatorname{Var}(Y)=np(1-p)$, so the variance and the mean are related. In logistic and Poisson regression, but not in regression with Gaussian errors, we know the expected variance and don't have to estimate it separately. The dispersion parameter $\phi$ indicates whether we have more or less than the expected variance: $\phi=1$ means we observe exactly the expected amount of variance, whereas $\phi<1$ means that we have less than the expected variance (called underdispersion) and $\phi>1$ means that we have extra variance beyond the expected (called overdispersion). The dispersion parameter in logistic and Poisson regression is fixed at 1, which means that we can use the $z$-score. In other regression types, such as normal linear regression, we have to estimate the residual variance and thus a $t$-value is used for calculating the $p$-values. In R, look at these two examples:

    Logistic regression

    mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
    
    mydata$rank <- factor(mydata$rank)
    
    my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
    
    summary(my.mod)
    
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept) -3.989979   1.139951  -3.500 0.000465 ***
    gre          0.002264   0.001094   2.070 0.038465 *  
    gpa          0.804038   0.331819   2.423 0.015388 *  
    rank2       -0.675443   0.316490  -2.134 0.032829 *  
    rank3       -1.340204   0.345306  -3.881 0.000104 ***
    rank4       -1.551464   0.417832  -3.713 0.000205 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    
    (Dispersion parameter for binomial family taken to be 1)
    

    Note that the dispersion parameter is fixed at 1 and thus, we get $z$-values.
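
    For example, using the gpa row from the output above, the reported p-value follows directly from the standard normal:

    z <- 0.804038 / 0.331819   # estimate / standard error for gpa
    2 * pnorm(-abs(z))         # ~ 0.0154, matching Pr(>|z|) above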


    Normal linear regression (OLS)

    summary(lm(Fertility~., data=swiss))
    
    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
    (Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
    Agriculture      -0.17211    0.07030  -2.448  0.01873 *  
    Examination      -0.25801    0.25388  -1.016  0.31546    
    Education        -0.87094    0.18303  -4.758 2.43e-05 ***
    Catholic          0.10412    0.03526   2.953  0.00519 ** 
    Infant.Mortality  1.07705    0.38172   2.822  0.00734 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 7.165 on 41 degrees of freedom
    

    Here, we have to estimate the residual variance (denoted as "Residual standard error") and hence, we use $t$-values instead of $z$-values. Of course, in large samples, the $t$-distribution approximates the normal distribution and the difference doesn't matter.
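
    For example, using the Agriculture row from the output above, the p-value now comes from the $t$-distribution with 41 degrees of freedom:

    t.val <- -0.17211 / 0.07030   # estimate / standard error for Agriculture
    2 * pt(-abs(t.val), df = 41)  # ~ 0.0187, matching Pr(>|t|) above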


    Thank you very much for this nice post which answers all my questions.

    So, practically, regarding the first part of your excellent answer: if for some reason I'd have as output the odds ratio and the Wald statistic, I could then calculate the standard error from these as $\text{SE} = (1/W)\cdot\ln(\text{OR})$. Is this correct? Thanks!

    @SanderW.vanderLaan Thanks for your comment. Yes, I believe that's correct. If you perform a logistic regression, the Wald statistic will be the $z$-value.
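
    For example, a small sketch of this back-calculation using the gpa coefficient from the output above (where $\text{OR} = \exp(0.804038)$):

    OR <- exp(0.804038)   # odds ratio for gpa
    z  <- 2.423           # its Wald z-value from the summary output
    log(OR) / z           # ~ 0.3318, recovering the standard error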

    Such a great answer! I do have some revision suggestions: I personally feel this answer mixes up details with the main points. I would put the details of how linear regression uses the variance of the residuals in a separate paragraph.

    Also, for the dispersion parameter and its connection to the R code, maybe we could open another section or add a separating line to discuss it.

    Just a side note about this answer: the specific formula given for the variance-covariance matrix is from ordinary least squares regression, *not* from logistic regression, which does not use the residual standard error but instead involves a diagonal matrix with the individual Bernoulli variances from the predicted probability for each observation along the diagonal.
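
    For illustration, here is a sketch of the computation that comment describes, reusing my.mod from the answer above; the standard errors of summary(my.mod) are recovered from $(X'VX)^{-1}$ with the individual Bernoulli variances on the diagonal of $V$:

    X <- model.matrix(my.mod)            # design matrix of the logistic model
    p.hat <- fitted(my.mod)              # predicted probability for each observation
    V <- diag(p.hat * (1 - p.hat))       # diagonal matrix of Bernoulli variances
    sqrt(diag(solve(t(X) %*% V %*% X)))  # matches the Std. Error column of summary(my.mod)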

    @ely Thanks for the heads-up. Actually, I mentioned in the paragraph that the presented result is for OLS, although it could be made more prominent, I admit. I added a short note to emphasize the fact.

    @COOLSerdash "In logistic and Poisson regression, but not in regression with Gaussian errors, we know the expected variance and don't have to estimate it separately." How do we calculate the expected variance (and how come we know this exactly)? I see that the variance of $Y$ is related to the mean, but 1) don't we need the variance of the coefficient we are testing? And 2) if the variance depends on $p$, as does $\operatorname{Var}(Y\mid X)$, we do not know the true value of $p$, do we?

Licensed under CC-BY-SA with attribution

