What is the difference between fixed effect, random effect and mixed effect models?

  • In simple terms, how would you explain (perhaps with simple examples) the difference between fixed effect, random effect and mixed effect models?

    I also find that it is sometimes difficult to determine whether an effect should be treated as fixed or as random. Although there are some recommendations about this, it is not always easy to make the right decision.

    I think that this link may be helpful in clarifying the underlying principles of mixed models: Fixed, Random, and Mixed Models (SAS documentation).

  • Statistician Andrew Gelman says that the terms 'fixed effect' and 'random effect' have variable meanings depending on who uses them. Perhaps you can pick out which one of the 5 definitions applies to your case. In general it may be better to either look for equations which describe the probability model the authors are using (when reading) or write out the full probability model you want to use (when writing).

    Here we outline five definitions that we have seen:

    1. Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts $a_i$ and fixed slope $b$ corresponds to parallel lines for different individuals $i$, or the model $y_{it} = a_i + b t$. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.

    2. Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.

    3. “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random.” (Green and Tukey, 1960)

    4. “If an effect is assumed to be a realized value of a random variable, it is called a random effect.” (LaMotte, 1983)

    5. Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage (“linear unbiased prediction” in the terminology of Robinson, 1991). This definition is standard in the multilevel modeling literature (see, for example, Snijders and Bosker, 1999, Section 4.2) and in econometrics.

    [Gelman, 2004, Analysis of variance—why it is more important than ever. The Annals of Statistics.]
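    To make definitions 1 and 5 a bit more concrete, here is a minimal sketch (not taken from Gelman's paper) using the growth model $y_{it} = a_i + b t$ plus noise. It estimates a single slope $b$ shared by all individuals, then compares two estimates of each intercept $a_i$: the per-individual least-squares estimate and a shrinkage estimate that pulls each intercept toward the population mean. Everything here is an illustrative assumption: the data are simulated, the function names are made up, and the standard deviations `tau` (between individuals) and `sigma` (residual) are treated as known, whereas real mixed-model software estimates them from the data.

```python
import random
from statistics import fmean


def simulate(n_obs_per_person, mu=10.0, tau=2.0, sigma=3.0, b=0.5, seed=0):
    """Simulate y_it = a_i + b*t + noise, with a_i drawn from N(mu, tau^2)."""
    rng = random.Random(seed)
    data = []
    for n in n_obs_per_person:
        a_i = rng.gauss(mu, tau)          # individual-specific intercept
        times = list(range(n))            # observation times 0, 1, ..., n-1
        ys = [a_i + b * t + rng.gauss(0.0, sigma) for t in times]
        data.append((times, ys))
    return data


def within_slope(data):
    """Common slope b, estimated from deviations around each individual's means."""
    num = den = 0.0
    for times, ys in data:
        t_bar, y_bar = fmean(times), fmean(ys)
        num += sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys))
        den += sum((t - t_bar) ** 2 for t in times)
    return num / den


def intercept_estimates(data, tau, sigma):
    """Return (b_hat, mu_hat, least-squares intercepts, weights, shrunken intercepts).

    Assumes tau and sigma are known and treats b_hat as if it were exact.
    """
    b_hat = within_slope(data)
    # Least-squares intercept for each individual (no pooling across individuals).
    no_pool = [fmean(ys) - b_hat * fmean(times) for times, ys in data]
    # Shrinkage weight: close to 1 with many observations, close to 0 with few.
    weights = [tau**2 / (tau**2 + sigma**2 / len(ys)) for _, ys in data]
    # Precision-weighted estimate of the population mean intercept.
    mu_hat = sum(w * a for w, a in zip(weights, no_pool)) / sum(weights)
    # Shrinkage estimate: pull each intercept toward mu_hat.
    shrunk = [mu_hat + w * (a - mu_hat) for w, a in zip(weights, no_pool)]
    return b_hat, mu_hat, no_pool, weights, shrunk


if __name__ == "__main__":
    data = simulate([2, 5, 20, 50])
    b_hat, mu_hat, no_pool, weights, shrunk = intercept_estimates(
        data, tau=2.0, sigma=3.0
    )
    print(f"common slope b_hat = {b_hat:.3f}, population mean mu_hat = {mu_hat:.3f}")
    for (times, _), a, w, s in zip(data, no_pool, weights, shrunk):
        print(f"n={len(times):3d}  least squares={a:7.3f}  weight={w:.3f}  shrunk={s:7.3f}")
```

    Under these assumptions, an individual with only two observations gets a weight below 0.5 and is pulled strongly toward the population mean, while an individual with fifty observations keeps almost all of its own least-squares estimate. In the sense of definition 5, the slope is handled as a "fixed" effect and the intercepts as "random" effects.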

    +1: very nice link! I guess the definition also varies depending on the field (e.g. #4 is very mathematical/statistical, but #1 and #2 are more "understandable" from a life science point of view)

    It is also informative to read the Discussion and Rejoinder to this paper. In the discussion, Peter McCullagh wrote that he disagrees with a substantial portion of what Gelman wrote. My point is not to favor one or the other, but to note that there is substantial disagreement among experts and not to put too much weight on one paper.

    Cool, I haven't seen that. Do you have a link to the paper(s) you're talking about?

    The entire discussion is at link

    It is funny that Andrew Gelman is described as a "blogger" rather than as one of the foremost statisticians in the world today. Although he is, of course, a blogger, he probably should be called "Statistician Andrew Gelman" if any qualifier be used.

    But as a statistician, and not just a fancy blogger, he should at least have given subjective relative frequencies for how often each of the five definitions is used. When people talk about fixed effects vs. random effects, most of the time they mean (4): “If an effect is assumed to be a realized value of a random variable, it is called a random effect.” (LaMotte, 1983)

    My impression is that (1) and (5) are by far the most common uses in the social sciences and perhaps in some medical fields as well (though I only get the latter impression from reading Gelman's blog on occasion). (4) might be the most defensible use, but from Gelman's perspective in particular (he is very interested in "applied" statistics), I would imagine that case (4) does not come up often, and it is difficult to say which fields should count most when we weigh frequency of use.

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM