### How to find residuals and plot them

I have been given the data

x = c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,28,34,70,69,54,39,9,21,54,26)
y = c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,56,39,89,31,43,29,55,
81,82,82,85,59,74,80,88,29,58,71,60,86,91,72,89,80,84,54,71,75,84,79)


How can I obtain the residuals and plot them versus $x$? And how can I test if the residuals appear to be approximately normal?

I'm not sure whether I did the original linear fit correctly: I got the equation $y=6.9x-5.5$, but the lecture notes say that the linear regression line should be of the form $y_i=\beta_0+\beta_1x_i+\epsilon_i$.

Which package are you using? For example, Matlab's 'regress' function returns the residuals as one of its outputs, and you can plot them as a histogram.

I'm using Sagemath. I can also use R through it, but I have very little experience with R.

Concerning the two equations above: if the regression line (as a linear function) is of the form $y = a + kx$, then the linear model is $E[Y|X] = a + kX$, and with an error term this becomes $Y = a + kX + \epsilon$, where $\epsilon$ has zero expectation. This is the sense in which the two equations fit together.

The equation you got *is* of the form mentioned in your notes, with $\hat{\beta}_0 = -5.5$ and $\hat{\beta}_1 = 6.9$. The residuals are just $r_i = y_i-\hat{y}_i = y_i - (-5.5 + 6.9 x_i)$.
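As a rough illustration of the residual formula above, here is a minimal R sketch that computes $r_i = y_i - \hat{y}_i$ by hand and compares the result with R's built-in `resid()`. The names `b`, `y_hat` and `r_manual` are just illustrative; the coefficients are read from the fit with `coef()` rather than typed in by hand.

```r
x <- c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
       26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
       28,34,70,69,54,39,9,21,54,26)
y <- c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
       56,39,89,31,43,29,55,81,82,82,85,59,74,80,88,29,58,71,60,
       86,91,72,89,80,84,54,71,75,84,79)

m1 <- lm(y ~ x)            # least-squares fit of y on x
b <- coef(m1)              # b[1]: estimated intercept, b[2]: estimated slope
y_hat <- b[1] + b[2] * x   # fitted values from the estimated line
r_manual <- y - y_hat      # residuals r_i = y_i - y_hat_i

# The hand-computed residuals should agree with resid(m1)
# up to floating-point error
all.equal(unname(r_manual), unname(resid(m1)))
```

Printing `b` also gives a quick way to check a hand-derived equation against the fitted intercept and slope.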


EDIT: You have an R tag but then in a comment say you don't know much about it. This is R code. I know nothing about Sage. End edit

You can do this

x = c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
28,34,70,69,54,39,9,21,54,26)
y = c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
56,39,89,31,43,29,55, 81,82,82,85,59,74,80,88,29,58,71,60,
86,91,72,89,80,84,54,71,75,84,79)

m1 <- lm(y~x)  #Create a linear model
resid(m1) #List of residuals
plot(density(resid(m1))) #A density plot
qqnorm(resid(m1)) # A quantile normal plot - good for checking normality
qqline(resid(m1))


+1 @guest, the code above is for R, which is freely available

Okay. So I saw the picture with the caption density.default(x=resid(m1)). Should this code output two graphs? And should I judge from the graphs whether the residuals appear to be approximately normal?

The code will output two graphs - one is a density plot (does it look bell shaped?) the other is a quantile plot; if the residuals were perfectly normal, the points would all lie on the straight line.
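Neither of those two graphs shows the residuals against $x$, which the question also asks for. A minimal sketch of that plot in base R could look like the following; the variable name `res`, the axis labels and the dashed reference line are just illustrative choices.

```r
x <- c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
       26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
       28,34,70,69,54,39,9,21,54,26)
y <- c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
       56,39,89,31,43,29,55,81,82,82,85,59,74,80,88,29,58,71,60,
       86,91,72,89,80,84,54,71,75,84,79)

m1 <- lm(y ~ x)     # same linear model as in the answer
res <- resid(m1)    # residuals, in the same order as x

# Scatter plot of residuals against x
plot(x, res, xlab = "x", ylab = "Residual", main = "Residuals versus x")
abline(h = 0, lty = 2)   # dashed horizontal reference line at zero
```

If the linear model is adequate, the points should scatter around the zero line without an obvious trend or funnel shape.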

Right. The code works if you change the last lines to plot(qqnorm(resid(m1))) and plot(qqline(resid(m1))). So I think the residuals do not follow a normal distribution, as there are points farther below the line than above it. Is there a numerical criterion for checking normality?
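One common numerical check is the Shapiro-Wilk test, available in base R as `shapiro.test()`. The sketch below applies it to the residuals of the same model; the name `sw` is just illustrative, and this is one possible test rather than the only criterion. A small p-value (say, below 0.05) is evidence against normality, while a large one only means the test found no clear departure.

```r
x <- c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
       26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
       28,34,70,69,54,39,9,21,54,26)
y <- c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
       56,39,89,31,43,29,55,81,82,82,85,59,74,80,88,29,58,71,60,
       86,91,72,89,80,84,54,71,75,84,79)

m1 <- lm(y ~ x)                  # same linear model as in the answer
sw <- shapiro.test(resid(m1))    # Shapiro-Wilk test of normality

sw$statistic   # W statistic; values close to 1 are consistent with normality
sw$p.value     # p-value for the null hypothesis that the residuals are normal
```

It is usually worth reading the test result together with the Q-Q plot rather than relying on the p-value alone.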