R - QQPlot: how to see whether data are normally distributed

  • I plotted this after running a Shapiro-Wilk normality test. The test showed that it is likely that the population is normally distributed. However, how can I see this "behaviour" in the plot? [Image: Q-Q plot]

    UPDATE

    A simple histogram of the data:

    [Image: histogram of the data]

    UPDATE

    The Shapiro-Wilk test says:

    [Image: Shapiro-Wilk test output]

    Re the edit: the SW test result **rejects** the hypothesis that these data were independently drawn from a common normal distribution: the p-value is very small. (This is apparent both in the qq plot, which exhibits a short left tail, and in the histogram, which exhibits positive skewness.) This suggests you misinterpreted the test. When you interpret the test correctly, do you still have a question to ask?

    So, to understand this clearly: I have two indications that point in different directions. The SW test says no and the plots say yes. So what is the real result of my test?

    On the contrary: the software and all the plots are consistent in what they say. The qq plot and the histogram show specific ways in which the data deviate from normality; the SW test says that such data are unlikely to have come from a normal distribution.

    Why do the plots say that it is not normally distributed? The Q-Q plot forms a straight line and the histogram also looks normally distributed. I do not get it ;(

    The qq plot clearly is *not* straight and the histogram clearly is *not* symmetric (which is perhaps the most basic of the many criteria a normally distributed histogram must satisfy). Sven Hohenstein's answer explains how to read the qq plot.
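    The two departures named in this thread (a short left tail and positive skewness) can be reproduced on made-up data. The following is a minimal sketch in base R, assuming a gamma-distributed sample as a stand-in for right-skewed data; it is not the asker's data, and `draw_qq` is a hypothetical helper name.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical right-skewed sample (gamma with shape 4); not the asker's data.
skewed <- rgamma(1000, shape = 4)

# Standardise so the sample quantiles are on the same scale as the
# theoretical standard-normal quantiles.
z <- (skewed - mean(skewed)) / sd(skewed)

# Q-Q coordinates without drawing: x = theoretical, y = sample quantiles.
qq <- qqnorm(z, plot.it = FALSE)

# Smallest point: with a short left tail, the sample minimum is less extreme
# (closer to the centre) than the normal quantile it is paired with.
lowest <- which.min(qq$x)
left_tail_gap <- qq$y[lowest] - qq$x[lowest]     # positive => short left tail

# Largest point: with a long right tail, the sample maximum is more extreme
# than the normal quantile it is paired with.
highest <- which.max(qq$x)
right_tail_gap <- qq$y[highest] - qq$x[highest]  # positive => long right tail

# Hypothetical helper: draw the Q-Q plot with a reference line.
draw_qq <- function(x) {
  qqnorm(x)
  qqline(x)
}
```

    Calling `draw_qq(z)` should show the points bending away from the reference line at both ends, which is the kind of curvature to look for in the original plot.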

    You might find it helpful to generate a normal vector of the same size and create a Q-Q plot of it, to see how the plot might look when the data do, in fact, come from a normal distribution.
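    That suggestion might look like the sketch below in base R. The sample size `n` is a placeholder (the thread does not state the real one), and `draw_reference_plots` is a hypothetical helper name.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Placeholder sample size; in practice use length(your_data).
n <- 500

# A reference sample that really does come from a normal distribution.
reference <- rnorm(n)

# Q-Q coordinates without drawing: x = theoretical, y = sample quantiles.
qq <- qqnorm(reference, plot.it = FALSE)

# Correlation of the Q-Q points; values very close to 1 mean the
# points fall almost on a straight line.
straightness <- cor(qq$x, qq$y)

# Hypothetical helper: draw four independent normal samples side by side
# to see how much a truly normal Q-Q plot varies by chance alone.
draw_reference_plots <- function(n) {
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))
  for (i in 1:4) {
    x <- rnorm(n)
    qqnorm(x, main = paste("Normal sample", i))
    qqline(x)
  }
}
```

    Comparing the output of `draw_reference_plots(n)` with the plot of the real data gives a feel for which wiggles are ordinary sampling noise and which are systematic.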

  • "The test showed that it is likely that the population is normally distributed."

    No; it didn't show that.

    Hypothesis tests don't tell you how likely the null is. In fact you can bet this null is false.

    The Q-Q plot doesn't give a strong indication of non-normality (the plot is fairly straight); there's perhaps a slightly shorter left tail than you'd expect but that really won't matter much.

    The histogram as-is probably doesn't say a lot either; it does also hint at a slightly shorter left tail. But see here

    The population distribution your data are from isn't going to be exactly normal. However, the Q-Q plot shows that normality is probably a reasonably good approximation.

    If the sample size was not too small, a lack of rejection of the Shapiro-Wilk would probably be saying much the same.

    Update: your edit to include the actual Shapiro-Wilk p-value is important, because it indicates that you would reject the null at typical significance levels. The test indicates that your data are not normally distributed, and the mild skewness visible in the plots is probably what the test is picking up. For typical procedures that assume normality of the variable itself (the one-sample t-test is one that comes to mind), at what appears to be a fairly large sample size, this mild non-normality will be of almost no consequence at all. One of the problems with goodness-of-fit tests is that they are more likely to reject just when it doesn't matter (when the sample size is large enough to detect some modest non-normality); similarly, they are more likely to fail to reject when it matters most (when the sample size is small).
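    The dependence on sample size can be checked with a small simulation. This is a rough sketch in base R, assuming a mildly right-skewed gamma population (shape 20, skewness about 0.45) chosen purely for illustration; `rejection_rate` is a hypothetical helper, and the sample sizes 30 and 2000 are arbitrary.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Fraction of simulated samples for which Shapiro-Wilk rejects at level alpha.
# The gamma(shape = 20) population is an illustrative assumption: mildly
# right-skewed, and not the asker's data.
rejection_rate <- function(n, reps = 200, alpha = 0.05) {
  p_values <- replicate(reps, shapiro.test(rgamma(n, shape = 20))$p.value)
  mean(p_values < alpha)
}

rate_small <- rejection_rate(30)    # small sample
rate_large <- rejection_rate(2000)  # large sample (shapiro.test accepts n from 3 to 5000)
```

    The population is identical in both cases, so any difference between `rate_small` and `rate_large` comes from sample size alone, not from the data being "more" or "less" normal.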

    In fact, this made me misread the OP's statement: I thought he said unlikely. Note that I slightly disagree with you: while a test normally tells you how unlikely an observation would be if the null hypothesis were true, we use this to argue that since we _did_ get this observation, the null hypothesis is unlikely to be true.

    Thanks for your answer! I am a little confused by all the statements that point in the other direction. To be clear, my exercise is to make a statement about the normality of the sample. So what would you suggest I say as an answer to my professor? And how can I show normality even though the sample size is huge? ;S

    About the strongest you could say would be something like - "The Q-Q plot is reasonably consistent with normality, but the left tail is a little 'short'; there's mild indication of skewness."

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM