### Interpreting the residuals vs. fitted values plot for verifying the assumptions of a linear model

Consider the following figure from Faraway's Linear Models with R (2005, p. 59).

The first plot seems to indicate that the residuals and the fitted values are uncorrelated, as they should be in a homoscedastic linear model with normally distributed errors. Therefore, the second and third plots, which seem to indicate dependency between the residuals and the fitted values, suggest a different model.

But why does the second plot suggest, as Faraway notes, a heteroscedastic linear model, while the third plot suggests a non-linear model?

The second plot seems to indicate that the absolute value of the residuals is strongly positively correlated with the fitted values, whereas no such trend is evident in the third plot. So if it were the case that, theoretically speaking, in a heteroscedastic linear model with normally distributed errors

$$ \mbox{Cor}\left(\mathbf{e},\hat{\mathbf{y}}\right) = \left[\begin{array}{ccc}1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1\end{array}\right] $$

(where the expression on the left is the matrix of correlations between the residuals and the fitted values) this would explain why the second and third plots agree with Faraway's interpretations.

But is this the case? If not, how else can Faraway's interpretations of the second and third plots be justified? Also, why does the third plot necessarily indicate non-linearity? Isn't it possible that it is linear, but that the errors are either not normally distributed, or else that they are normally distributed, but do not center around zero?

@Glen_b: Thanks. I've corrected the paragraph you were referring to by substituting "dependence" for "correlation".

Glen_b -Reinstate Monica Correct answer

7 years ago

Below are those residual plots, with the approximate mean and spread of the points (limits that include most of the values) marked at each fitted value (and hence at each $x$). To a rough approximation, these indicate the conditional mean (red) and the conditional mean $\pm$ (roughly!) twice the conditional standard deviation (purple):

The second plot shows that the mean residual doesn't change with the fitted values (and so doesn't change with $x$), but the spread of the residuals (and hence of the $y$'s about the fitted line) increases as the fitted values (or $x$) increase. That is, the spread is not constant. Heteroskedasticity.

The third plot shows that the residuals are mostly negative when the fitted value is small, positive when the fitted value is in the middle and negative when the fitted value is large. That is, the spread is approximately constant, but the conditional mean is not: the fitted line doesn't describe how $y$ behaves as $x$ changes, since the relationship is curved.
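To make the "conditional mean versus conditional spread" reading concrete, here is a minimal simulation sketch in plain Python. The three data-generating processes, their coefficients, and the helper names `fit_line` and `residual_summary` are all made up for illustration; they are not Faraway's data. For each case it fits a straight line and summarises the residuals within the lower, middle and upper third of the fitted values.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residual_summary(x, y, n_bins=3):
    """(mean, sd) of the residuals in equal-sized bins of the fitted values,
    ordered from the smallest fitted values to the largest."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    order = sorted(range(len(x)), key=lambda i: fitted[i])
    size = len(order) // n_bins
    summary = []
    for k in range(n_bins):
        r = [resid[i] for i in order[k * size:(k + 1) * size]]
        summary.append((statistics.fmean(r), statistics.stdev(r)))
    return summary

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 3000
x = [rng.uniform(0, 10) for _ in range(n)]

# Hypothetical models (all numbers are assumptions chosen for illustration).
y_const = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]               # linear, constant spread
y_het = [1 + 2 * xi + rng.gauss(0, 0.1 + 0.3 * xi) for xi in x]    # linear, spread grows with x
y_curve = [1 + 2 * xi - 0.3 * (xi - 5) ** 2 + rng.gauss(0, 1) for xi in x]  # curved mean

s_const = residual_summary(x, y_const)
s_het = residual_summary(x, y_het)
s_curve = residual_summary(x, y_curve)

for name, s in [("constant", s_const), ("heteroskedastic", s_het), ("curved", s_curve)]:
    print(name, [(round(m, 2), round(sd, 2)) for m, sd in s])
```

In the heteroskedastic case the binned means should stay near zero while the binned standard deviations grow; in the curved case the binned means should run negative, positive, negative, which is the pattern described for the third plot.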

> Isn't it possible that it is linear, but that the errors are either not normally distributed, or else that they are normally distributed, but do not center around zero?

Not really*; in those situations the plots look different from the third plot.

(i) If the errors were normal but not centered at zero, but at $\theta$, say, then the intercept would pick up the mean error, and so the estimated intercept would be an estimate of $\beta_0+\theta$ (that would be its expected value, but it is estimated with error). Consequently, your residuals would still have conditional mean zero, and so the plot would look like the first plot above.

(ii) If the errors are not normally distributed the pattern of dots might be densest somewhere other than the center line (if the data were skewed), say, but the local mean residual would still be near 0.

Here the purple lines still represent a (very) roughly 95% interval, but it's no longer symmetric. (I'm glossing over a couple of issues to avoid obscuring the basic point here.)
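Both cases (i) and (ii) can be checked with a small simulation. The sketch below is only illustrative: the values of `beta0`, `beta1` and `theta`, the choice of a shifted Exponential(1) as the skewed error distribution, and the helper names are all assumptions, not anything from Faraway.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def by_thirds(x, resid, stat):
    """Apply `stat` to the residuals in the lower, middle and upper third of x.
    With a positive fitted slope this is the same as binning by fitted value."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) // 3
    return [stat([resid[i] for i in order[k * size:(k + 1) * size]]) for k in range(3)]

rng = random.Random(1)
n = 3000
beta0, beta1, theta = 1.0, 2.0, 5.0  # hypothetical true values
x = [rng.uniform(0, 10) for _ in range(n)]

# (i) Normal errors centered at theta rather than at 0.
y_shift = [beta0 + beta1 * xi + rng.gauss(theta, 1.0) for xi in x]
b0_shift, b1_shift = fit_line(x, y_shift)
resid_shift = [yi - (b0_shift + b1_shift * xi) for xi, yi in zip(x, y_shift)]
means_shift = by_thirds(x, resid_shift, statistics.fmean)

# (ii) Right-skewed errors with mean zero: Exponential(1) minus 1.
y_skew = [beta0 + beta1 * xi + rng.expovariate(1.0) - 1.0 for xi in x]
b0_skew, b1_skew = fit_line(x, y_skew)
resid_skew = [yi - (b0_skew + b1_skew * xi) for xi, yi in zip(x, y_skew)]
means_skew = by_thirds(x, resid_skew, statistics.fmean)
medians_skew = by_thirds(x, resid_skew, statistics.median)

print("intercept estimate:", round(b0_shift, 2), "vs beta0 + theta =", beta0 + theta)
print("local mean residuals, shifted errors:", [round(m, 2) for m in means_shift])
print("local mean residuals, skewed errors: ", [round(m, 2) for m in means_skew])
print("local median residuals, skewed errors:", [round(m, 2) for m in medians_skew])
```

In case (i) the estimated intercept should land near `beta0 + theta` and the local mean residuals near zero. In case (ii) the local means should also be near zero, while the local medians sit below zero, reflecting that the dots are densest somewhere other than the center line.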

* It's not necessarily *impossible*: if you have an "error" term that doesn't really behave like errors (say, where $x$ and $y$ are related to it in just the right way), you might be able to produce patterns something like these. However, we make assumptions about the error term, such as that it's not related to $x$ and that it has zero mean; we'd have to break at least some of those assumptions to do it. (In many cases you may have reason to conclude that such effects should be absent or at least relatively small.)

Let me see if I understand correctly. Does homoscedasticity mean that the spread of the errors does not depend on $x$ (and hence does not depend on $\hat{y}$ either, since $\hat{y}$ is a function of $x$)?

Homoskedasticity literally means "same spread". That is, the (population) variance of the response at every data point should be the same. One of the observable ways it might fail to be equal is if it changes with the mean (estimated by the fitted values); another way is if it changes with some independent variable (though for simple regression there's presumably only one independent variable available in most cases, so the two will be basically the same thing). You could imagine a situation where the mean changes with $x_1$ but the spread changes with $x_2$, which itself is not related to $x_1$.

(ctd) ... that would still be a violation of all observations having the same spread. [I was being a little loose with the distinction between $x$ and the fitted values; I'll try to clean that up.]
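The "mean changes with $x_1$, spread changes with $x_2$" situation can be sketched as a simulation. Everything here (the coefficients, the variance function, the helper names) is a hypothetical setup chosen for illustration: $y$ is regressed on $x_1$ alone, while the error standard deviation depends on an independent $x_2$.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sd_by_thirds(values, resid):
    """SD of the residuals within the lower, middle and upper third of `values`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    size = len(order) // 3
    return [statistics.stdev([resid[i] for i in order[k * size:(k + 1) * size]])
            for k in range(3)]

rng = random.Random(2)
n = 3000
x1 = [rng.uniform(0, 10) for _ in range(n)]  # drives the mean
x2 = [rng.uniform(0, 10) for _ in range(n)]  # drives the spread, independent of x1

# Assumed model: mean 1 + 2*x1, error SD 0.2 + 0.3*x2.
y = [1 + 2 * a + rng.gauss(0, 0.2 + 0.3 * b) for a, b in zip(x1, x2)]

b0, b1 = fit_line(x1, y)
fitted = [b0 + b1 * a for a in x1]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sd_vs_fitted = sd_by_thirds(fitted, resid)  # what a residuals-vs-fitted plot would show
sd_vs_x2 = sd_by_thirds(x2, resid)          # what a residuals-vs-x2 plot would show

print("residual SD by thirds of fitted:", [round(s, 2) for s in sd_vs_fitted])
print("residual SD by thirds of x2:    ", [round(s, 2) for s in sd_vs_x2])
```

Under these assumptions the spread looks roughly flat against the fitted values but clearly increases against $x_2$, so a clean residuals vs. fitted plot does not by itself rule out every form of heteroskedasticity.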

Thank you. The situation is much clearer now. I thought that homoscedasticity meant that the variance-covariance matrix of the error has the form $\sigma^2 I$, and so, in particular, that if the error vector were distributed as $\mbox{N}\left(\mathbf{0},V\right)$ for some arbitrary, symmetric matrix $V$, the model would be heteroscedastic. Now I realize this is not the case. But now that I understand the meaning of homoscedasticity, I have another question. Is it possible to tell from Faraway's first plot that the error's variance-covariance matrix has the form $\sigma^2 I$? Could it be some arbitrary $V$?

Homoskedasticity *does* mean that the variance-covariance matrix has the form $\sigma^2 I$. That implies all the things I said; the second plot in your question indicates a particular kind of heteroskedasticity (a common one). Heteroskedasticity *does* mean that the error vector is distributed as $N(0,V)$ for some arbitrary, symmetric matrix $V\neq\sigma^2 I$. (Well, it's literally about the diagonal of $V$; if $V$ isn't diagonal you have other issues than heteroskedasticity.) The first plot doesn't (and can't) prove that the errors are homoskedastic, ... (ctd)

(ctd) ... as you should be able to see from my first comment under my answer, in particular as a result of the sentence beginning "You could imagine..." -- but it pretty much rules out heteroskedasticity that's related to the mean.
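For reference, the cases distinguished in these comments can be written out side by side. This is only a notational sketch; the symbols $\boldsymbol{\varepsilon}$ for the error vector and $\sigma_i^2$ for the individual error variances are labels introduced here, not taken from the thread.

$$ \begin{array}{ll} \mbox{homoskedastic, uncorrelated errors:} & \mbox{Var}\left(\boldsymbol{\varepsilon}\right) = \sigma^2 I \\ \mbox{heteroskedastic, uncorrelated errors:} & \mbox{Var}\left(\boldsymbol{\varepsilon}\right) = \mbox{diag}\left(\sigma_1^2,\ldots,\sigma_n^2\right), \mbox{ with not all } \sigma_i^2 \mbox{ equal} \\ \mbox{correlated errors:} & \mbox{Var}\left(\boldsymbol{\varepsilon}\right) = V, \mbox{ with } V_{ij} \neq 0 \mbox{ for some } i \neq j \end{array} $$

Heteroskedasticity concerns only the diagonal entries; nonzero off-diagonal entries are the "other issues" mentioned above.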

@Glen_b, Sorry for the necropost: I am a bit confused about the quantification whereby we consider the error at each point (xi, yi): Do we consider several responses (xi, yi_1), (xi, yi_2), ..., (xi, yi_m) for the input xi; i=1,2,...,n (number of data points) and then find the mean and variance for the values yi_j? I am just confused as to why in a linear regression y=ax+b, x, y, a (or in a multilinear one y=a1x1+a2x2+...+anxn, the ai, xi) are random variables and not fixed values.

The only random variables in the regression relationship are the y's (equivalently, the errors -- and any functions of the y's, like parameter estimates); we condition on the x's and the parameters are fixed population constants.

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM

Glen_b -Reinstate Monica 7 years ago

None of the three plots show correlation (at least not linear correlation, which is the relevant meaning of 'correlation' in the sense in which it is being used in "*the residuals and the fitted values are uncorrelated*").
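This point can be checked numerically: for a least-squares fit with an intercept, the sample correlation between residuals and fitted values is zero up to rounding error, whatever the true model is. The sketch below uses three made-up data-generating processes (all coefficients and helper names are assumptions for illustration) and also computes the correlation between the *absolute* residuals and the fitted values in the heteroskedastic case, which is the kind of dependence the second plot shows.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / math.sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))

def fitted_and_residuals(x, y):
    """Fit a line and return (fitted values, residuals)."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]

rng = random.Random(3)
n = 2000
x = [rng.uniform(0, 10) for _ in range(n)]

# Hypothetical models, loosely mimicking the three plots (numbers are assumptions).
data = {
    "constant": [1 + 2 * xi + rng.gauss(0, 1) for xi in x],
    "heteroskedastic": [1 + 2 * xi + rng.gauss(0, 0.1 + 0.3 * xi) for xi in x],
    "curved": [1 + 2 * xi - 0.3 * (xi - 5) ** 2 + rng.gauss(0, 1) for xi in x],
}

corr = {}
for name, y in data.items():
    fitted, resid = fitted_and_residuals(x, y)
    corr[name] = pearson(resid, fitted)
    print(name, "cor(residuals, fitted) =", f"{corr[name]:.1e}")

# Dependence without linear correlation: |residual| does vary with the fitted value.
fitted_het, resid_het = fitted_and_residuals(x, data["heteroskedastic"])
abs_corr_het = pearson([abs(r) for r in resid_het], fitted_het)
print("heteroskedastic: cor(|residuals|, fitted) =", round(abs_corr_het, 2))
```

All three residual-fitted correlations should print as numerically zero, while the absolute residuals in the heteroskedastic case are clearly positively correlated with the fitted values. That is why "dependence" rather than "correlation" is the right word for what the second and third plots reveal.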