How to find residuals and plot them

  • I have been given data

    x = c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,28,34,70,69,54,39,9,21,54,26)
    y = c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,56,39,89,31,43,29,55, 
         81,82,82,85,59,74,80,88,29,58,71,60,86,91,72,89,80,84,54,71,75,84,79)
    

    How can I obtain the residuals and plot them versus $x$? And how can I test if the residuals appear to be approximately normal?

    I'm not sure whether I did the original linear fit correctly: I got the equation $y=6.9x-5.5$, but the lecture notes say that the linear regression line should be of the form $y_i=\beta_0+\beta_1x_i+\epsilon_i$.

    Which package are you using? For example, Matlab's 'regress' function returns the residuals as an output, and you can plot them as a histogram.

    I'm using Sagemath. I can also use R through it, but I have very little experience with R.

    Concerning the 2 equations that you have up there. If the regression line (as a linear function) is of the form $y = a +k x$ then the linear model is $E[Y|X] = a+ k X$ and using error terms this is $Y = a + k X + \epsilon$ where $\epsilon$ is an error term with zero expectation. This is the sense in which the two equations fit together.

    The equation you got *is* of the form mentioned in your notes, with $\hat{\beta}_0 = -5.5$ and $\hat{\beta}_1 = 6.9$. The residuals are just $r_i = y_i-\hat{y}_i = y_i - (-5.5 + 6.9 x_i)$.
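
    As a rough sketch of that formula in R, the residuals can be computed by hand and compared with what `resid()` returns. The coefficients here are taken from `lm()` rather than typed in, and the names `fit`, `b0`, `b1` and `r_manual` are only illustrative. For a least-squares fit with an intercept the residuals sum to (numerically) zero, which is a quick way to check a set of coefficients.

    ```r
    # Data from the question
    x <- c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
           26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
           28,34,70,69,54,39,9,21,54,26)
    y <- c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
           56,39,89,31,43,29,55,81,82,82,85,59,74,80,88,29,58,71,60,
           86,91,72,89,80,84,54,71,75,84,79)

    fit <- lm(y ~ x)                    # least-squares fit of y on x
    b0  <- coef(fit)[["(Intercept)"]]   # estimated intercept
    b1  <- coef(fit)[["x"]]             # estimated slope

    # Residuals by hand: r_i = y_i - (b0 + b1 * x_i)
    r_manual <- y - (b0 + b1 * x)

    # Should agree with the residuals stored in the fitted model
    print(all.equal(unname(resid(fit)), r_manual))

    # Close to zero for a least-squares fit that includes an intercept
    print(sum(r_manual))
    ```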

  • Peter Flom

    Correct answer

    8 years ago

    EDIT: You have an R tag but then in a comment say you don't know much about it. This is R code. I know nothing about Sage. End edit

    You can do this

    x = c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
          26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
          28,34,70,69,54,39,9,21,54,26)
    y = c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
          56,39,89,31,43,29,55, 81,82,82,85,59,74,80,88,29,58,71,60,
          86,91,72,89,80,84,54,71,75,84,79)
    
    m1 <- lm(y~x)  #Create a linear model
    resid(m1) #List of residuals
    plot(density(resid(m1))) #A density plot
    qqnorm(resid(m1)) # A quantile normal plot - good for checking normality
    qqline(resid(m1))
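
    The question also asks for the residuals plotted versus $x$. A minimal sketch using base R graphics, assuming the same data and model as above (the axis labels and the dashed reference line are just one possible choice):

    ```r
    # Data from the question
    x <- c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
           26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
           28,34,70,69,54,39,9,21,54,26)
    y <- c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
           56,39,89,31,43,29,55,81,82,82,85,59,74,80,88,29,58,71,60,
           86,91,72,89,80,84,54,71,75,84,79)

    m1 <- lm(y ~ x)   # same linear model as above

    # Scatter plot of the residuals against x
    plot(x, resid(m1),
         xlab = "x", ylab = "Residual",
         main = "Residuals versus x")

    # Horizontal reference line at zero
    abline(h = 0, lty = 2)
    ```

    If the linear model is adequate, the points should scatter around the zero line with no obvious curve or funnel shape.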
    

    +1 @guest, the code above is for R, which is freely available

    Okay. So I saw the picture with the caption density.default(x=resid(m1)). Should this code output two graphs? And should I check from the graphs whether the residuals appear to be approximately normal?

    The code will output two graphs - one is a density plot (does it look bell shaped?) the other is a quantile plot; if the residuals were perfectly normal, the points would all lie on the straight line.

    Right. The code works if you change the last lines to plot(qqnorm(resid(m1))) and plot(qqline(resid(m1))). So I think the residuals do not follow a normal distribution, as there are points farther below the line than above it. Is there any numerical criterion to check normality?
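
    One common numerical check is the Shapiro-Wilk test, available in base R as `shapiro.test()`. The sketch below applies it to the residuals of the same model; the name `sw` is only illustrative. A small p-value is evidence against normality, while a large one does not prove the residuals are normal, so it is best read alongside the plots.

    ```r
    # Data from the question
    x <- c(21,34,6,47,10,49,23,32,12,16,29,49,28,8,57,9,31,10,21,
           26,31,52,21,8,18,5,18,26,27,26,32,2,59,58,19,14,16,9,23,
           28,34,70,69,54,39,9,21,54,26)
    y <- c(47,76,33,78,62,78,33,64,83,67,61,85,46,53,55,71,59,41,82,
           56,39,89,31,43,29,55,81,82,82,85,59,74,80,88,29,58,71,60,
           86,91,72,89,80,84,54,71,75,84,79)

    m1 <- lm(y ~ x)   # same linear model as in the answer

    # Shapiro-Wilk test; the null hypothesis is that the residuals are normal
    sw <- shapiro.test(resid(m1))
    print(sw)

    # The p-value on its own, e.g. to compare with a chosen level such as 0.05
    print(sw$p.value)
    ```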

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM