Logistic regression model does not converge
I've got some data about airline flights (in a data frame called flights) and I would like to see if the flight time has any effect on the probability of a significantly delayed arrival (meaning 10 or more minutes). I figured I'd use logistic regression, with the flight time as the predictor and whether or not each flight was significantly delayed (a bunch of Bernoullis) as the response. I used the following code...

    flights$BigDelay <- flights$ArrDelay >= 10
    delay.model <- glm(BigDelay ~ ArrDelay, data=flights, family=binomial(link="logit"))
    summary(delay.model)
...but got the following output.
    > flights$BigDelay <- flights$ArrDelay >= 10
    > delay.model <- glm(BigDelay ~ ArrDelay, data=flights, family=binomial(link="logit"))
    Warning messages:
    1: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,  :
      algorithm did not converge
    2: In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,  :
      fitted probabilities numerically 0 or 1 occurred
    > summary(delay.model)

    Call:
    glm(formula = BigDelay ~ ArrDelay, family = binomial(link = "logit"),
        data = flights)

    Deviance Residuals:
           Min          1Q      Median          3Q         Max
    -3.843e-04  -2.107e-08  -2.107e-08   2.107e-08   3.814e-04

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -312.14     170.26  -1.833   0.0668 .
    ArrDelay       32.86      17.92   1.833   0.0668 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 2.8375e+06  on 2291292  degrees of freedom
    Residual deviance: 9.1675e-03  on 2291291  degrees of freedom
    AIC: 4.0092

    Number of Fisher Scoring iterations: 25
What does it mean that the algorithm did not converge? I thought it might be because the BigDelay values were TRUE and FALSE instead of 0 and 1, but I got the same error after I converted everything. Any ideas?

Not sure I deserve the "accept". @Conjugate Prior's answer explained what was wrong with your model; I thought it worth explaining the warning you mentioned in terms of the algorithm.
If you have the actual delay times, you are likely to get better information by modeling them, rather than reducing them to a binary variable.
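For example, a minimal sketch of that idea, assuming the continuous delay is in ArrDelay and the flight time sits in a hypothetical column named AirTime (substitute whatever your data actually calls it):

    # Model the delay in minutes directly instead of dichotomizing it;
    # AirTime is a placeholder name for the flight-time column
    time.model <- lm(ArrDelay ~ AirTime, data = flights)
    summary(time.model)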
related question
You can try the glm1() function. It overcomes the convergence problem.
glm() uses an iteratively reweighted least squares (IRLS) algorithm. The algorithm hit the maximum number of allowed iterations before signalling convergence. The default, documented in ?glm.control, is 25. You pass control parameters as a list in the glm call:

    delay.model <- glm(BigDelay ~ ArrDelay, data=flights, family=binomial,
                       control = list(maxit = 50))
As @Conjugate Prior says, you seem to be predicting the response with the data used to generate it. You have complete separation, as any ArrDelay < 10 will predict FALSE and any ArrDelay >= 10 will predict TRUE. The other warning message tells you that the fitted probabilities for some observations were effectively 0 or 1, and that is a good indicator that something is wrong with the model.

The two warnings can go hand in hand. The likelihood function can be quite flat when some $\hat{\beta}_i$ get large, as in your example. If you allow more iterations, the model coefficients will diverge further if you have a separation issue.
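A toy illustration of complete separation, using simulated data rather than the flights data (so the specific numbers are illustrative only):

    # Simulated data where the response is a deterministic function of the
    # predictor -- the same structure as BigDelay ~ ArrDelay in the question
    set.seed(1)
    x <- rnorm(1000)
    y <- x > 0          # perfectly separated at x = 0

    # glm() warns that fitted probabilities numerically 0 or 1 occurred;
    # the slope estimate is huge because under complete separation the
    # maximum likelihood estimate does not exist (it diverges to infinity)
    sep.model <- glm(y ~ x, family = binomial)
    coef(sep.model)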
Could you explain what exactly you mean by model convergence here?
By convergence I mean that the parameters being estimated in the model don't change (or change by less than some small tolerance) between iterations. Here the parameters get increasingly large, and fitting stops because of the limit on iterations, but the parameter estimates changed a lot between the penultimate and final iterations, so they haven't converged.
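If you want to watch this happening, glm.control() has a trace argument that prints the deviance at each IRLS iteration; a sketch using the same model as in the question:

    # Print the deviance at each IRLS iteration to watch (non-)convergence;
    # with separation the deviance keeps heading towards zero while the
    # coefficients keep growing
    delay.model <- glm(BigDelay ~ ArrDelay, data = flights,
                       family = binomial,
                       control = list(trace = TRUE, maxit = 50))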
You could try to check whether Firth's bias reduction works with your dataset. It is a penalized likelihood approach that can be useful for datasets which produce divergences using the standard glm() function. Sometimes it can be used instead of eliminating the variable that produces complete/almost-complete separation.

For the formulation of the bias reduction (the $O(n^{-1})$ term in the asymptotic expansion of the bias of the maximum likelihood estimator is removed, motivated by a classical cumulant expansion), please see http://biomet.oxfordjournals.org/content/80/1/27.abstract
Firth's bias reduction is implemented in the R package logistf: http://cran.r-project.org/web/packages/logistf/logistf.pdf
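A minimal sketch of how that might look for the model in the question, assuming the logistf package is installed:

    # install.packages("logistf")  # once, if the package is not installed
    library(logistf)

    # Firth-penalized logistic regression with the same formula as the glm() call
    firth.model <- logistf(BigDelay ~ ArrDelay, data = flights)
    summary(firth.model)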
conjugateprior 10 years ago
First thought: *Perfect separation*, meaning the predictor is 'too good', the logits go to +/- infinity and everything falls over. Second thought: Does the code do what you think it does? Your variable names don't seem to quite match your description. You might elaborate what the data is more precisely, since it looks like you *might* be trying to predict something with itself.