### Interpreting Residual and Null Deviance in a GLM in R

How do I interpret the null and residual deviance of a GLM in R? For example, we say that a smaller AIC is better. Is there a similar quick interpretation for the deviances as well?

```
Null deviance:     1146.1  on 1077  degrees of freedom
Residual deviance: 4589.4  on 1099  degrees of freedom
AIC: 11089
```

Let LL = loglikelihood

Here is a quick summary of what you see from the `summary(glm.fit)` output:

**Null Deviance** = 2(LL(Saturated Model) - LL(Null Model)), on df = df_Sat - df_Null

**Residual Deviance** = 2(LL(Saturated Model) - LL(Proposed Model)), on df = df_Sat - df_Proposed

The **Saturated Model** is a model that assumes each data point has its own parameters (which means you have n parameters to estimate).

The **Null Model** assumes the exact "opposite": it assumes one parameter for all of the data points, which means you only estimate 1 parameter.

The **Proposed Model** assumes you can explain your data points with p parameters plus an intercept term, so you have p + 1 parameters.

If your **Null Deviance** is really small, it means that the Null Model explains the data pretty well. Likewise with your **Residual Deviance**.

What does "really small" mean? If your model is "good", then your **Deviance** is approximately Chi^2 distributed with (df_Sat - df_Model) degrees of freedom.

If you want to compare your Null model with your Proposed model, then you can look at

**(Null Deviance - Residual Deviance)**, which is approximately Chi^2 with **df_Null - df_Proposed** = (n - 1) - (n - (p + 1)) = p degrees of freedom.

Are the results you gave directly from R? They seem a little bit odd, because generally you should see that the degrees of freedom reported for the Null deviance are always higher than the degrees of freedom reported for the Residual deviance. That is because, again:

Null Deviance df = Saturated df - Null df = n - 1
Residual Deviance df = Saturated df - Proposed df = n - (p + 1)
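To make the degrees-of-freedom bookkeeping concrete, here is a quick numeric sketch (the thread's language is R, but plain Python is used here just for the arithmetic; the `n = 100`, `p = 5` values are hypothetical, not from the question):

```python
# Hypothetical example: n observations, p predictors.
n = 100   # number of observations
p = 5     # number of predictors in the proposed model

df_null = n - 1             # null model estimates only 1 parameter (intercept)
df_proposed = n - (p + 1)   # proposed model estimates p slopes + 1 intercept

# The chi-squared comparison of null vs. proposed uses their df difference,
# which collapses to exactly p:
df_test = df_null - df_proposed
print(df_null, df_proposed, df_test)  # 99 94 5
```

Note how the reported null df is always the larger of the two, which is why the output in the question looks odd.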

@Teresa: Yes, these results are from R. Why would this happen? Any problem with the model here?

@Hack-R: Sorry for such a late response, I'm new to Stack Exchange. For multinomial models you don't use the `glm` function in R, and the output is different. You will need to look at either a proportional odds model or ordinal regression, e.g. the `mlogit` function. It is worth doing a bit of reading on multinomial GLMs; they have slightly different assumptions. If I can get to it during the break, I'll update this with some more information.

@Anjali, I'm not quite sure why you would get results like that in R. It's hard to know without seeing your data/results. In general, I don't see why the residual degrees of freedom would be higher than the null df. How many parameters were you estimating?

@TeresaStat Thanks much. I was trying glmnet and mlogit for multinomial. It turns out `glmnet` is very different from `glm`, so my question was naive.

@TeresaStat Could you please clarify this sentence: "What does really small mean? If your model is "good" then your Deviance is approx Chi^2 with (df_sat - df_model) degrees of freedom."? I am not sure what "Chi^2 with (df_sat - df_model) degrees of freedom" means. Chi^2 is calculated using two numbers, no?

@user4050 The goal of modeling in general can be seen as using the smallest number of parameters to explain the most about your response. To figure out how many parameters to use, you need to look at the benefit of adding one more parameter. If an extra parameter explains a lot (produces a large drop in deviance from your smaller model), then you need the extra parameter. In order to quantify what "a lot" is, you need statistical theory. The theory tells us that the deviance difference is approximately chi-squared with degrees of freedom equal to the difference in the number of parameters between your two models. Is it any clearer?

The null deviance shows how well the response is predicted by the model with nothing but an intercept.

The residual deviance shows how well the response is predicted by the model when the predictors are included. From your example, it can be seen that the deviance goes up by 3443.3 when 22 predictor variables are added (note: degrees of freedom = no. of observations - no. of parameters estimated). This increase in deviance is evidence of a significant lack of fit.
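As a quick check of the arithmetic in the paragraph above (a sketch in Python; the numbers are read directly from the question's summary output):

```python
# Values reported in the question's summary(glm.fit) output.
null_dev, null_df = 1146.1, 1077
resid_dev, resid_df = 4589.4, 1099

print(round(resid_dev - null_dev, 1))  # 3443.3 (change in deviance)
print(resid_df - null_df)              # 22 (change in reported df)
```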

We can also use the residual deviance to test whether the null hypothesis is true (i.e. that the logistic regression model provides an adequate fit for the data). This is possible because the deviance is approximately chi-squared distributed with the residual degrees of freedom. In order to test for significance, we can find the associated p-value using the formula below in R:

`p-value = 1 - pchisq(deviance, degrees of freedom)`

Using the above values of residual deviance and df, you get a p-value of approximately zero, showing that there is strong evidence against the null hypothesis, i.e. a significant lack of fit.

```
> 1 - pchisq(4589.4, 1099)
[1] 0
```
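R's `pchisq` gives this tail probability directly. The Python standard library has no chi-squared CDF, but the same near-zero p-value can be reproduced with the Wilson-Hilferty normal approximation (a sketch, accurate for large degrees of freedom, not a substitute for `pchisq`):

```python
import math
from statistics import NormalDist

def chi2_sf_approx(x, k):
    """Approximate P(X > x) for X ~ Chi^2(k) using the Wilson-Hilferty
    cube-root normal approximation (good for large k)."""
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 1 - NormalDist().cdf(z)

# Residual deviance and df from the question: the p-value is effectively
# zero, mirroring R's `1 - pchisq(4589.4, 1099)`.
print(chi2_sf_approx(4589.4, 1099))
```

A sanity check: evaluating the tail at `x = k` returns a value just under 0.5, as expected, since the mean of a Chi^2(k) distribution is k.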

How do you know what the cut off is for good/bad fit based on the deviance and number of predictor variables (without the pchisq)? Is it just if Residual Deviance > NULL Deviance or is there some range/ratio?

Your answer isn't wrong, but is subject to misunderstanding. In fact, it has been misunderstood (cf here). In light of that, can you clarify the differences that are implicit in your code?

While both answers given here are correct (and really useful resources), page 432 of Introduction to Linear Regression Analysis (Montgomery, Peck, Vining, 5th ed.) gives a general rule of thumb: if $$ \frac{D}{n-p} \gg 1, $$ where $p$ is the number of regressors, $n$ is the number of observations and $D$ is the residual deviance, then the fit can be considered inadequate.
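Plugging the question's numbers into that rule of thumb (a quick arithmetic sketch in Python):

```python
# Residual deviance D and residual degrees of freedom (n - p) from the question.
D = 4589.4
n_minus_p = 1099

ratio = D / n_minus_p
print(round(ratio, 2))  # 4.18, well above 1 -> the fit looks inadequate
```

This agrees with the near-zero p-value from the chi-squared test: both diagnostics point to a significant lack of fit.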

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM

Hack-R (6 years ago): Yes, that's a very useful write-up @TeresaStat, thanks. How robust is this? Do the definitions change if you're talking about a multinomial model instead of a `GLM`?