Mean absolute deviation vs. standard deviation

  • In the textbook "New Comprehensive Mathematics for O Level" by Greer (1983), I see the average deviation calculated like this:

    Sum the absolute differences between the individual values and the mean, then take their average. Throughout the chapter the term mean deviation is used.

    But I've recently seen several references that use the term standard deviation and this is what they do:

    Calculate the squares of the differences between the individual values and the mean, then take their average and, finally, the square root of that average.

    I tried both methods on the same set of data and their answers differ. I'm not a statistician, and I got confused while trying to teach deviation to my kids.

    So, in short: are the terms standard deviation and mean deviation the same, or is my old textbook wrong?

    The two quantities differ: they weight the data differently. The standard deviation will be larger, and it is relatively more affected by larger deviations. The standard deviation (in particular, the n-denominator version) can be thought of as a root-mean-square deviation. Standard deviations are more commonly used.
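
    For concreteness, here is a minimal Python sketch (the data values are made up for illustration, not taken from the question) computing both statistics on one dataset:

    ```python
    # Made-up sample data
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    n = len(data)
    mean = sum(data) / n  # 5.0

    # Mean (absolute) deviation: average of |x - mean|
    mean_dev = sum(abs(x - mean) for x in data) / n

    # Standard deviation (n-denominator version): the root of the
    # average squared deviation, i.e. a root-mean-square deviation
    std_dev = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

    print(mean_dev)  # 1.5
    print(std_dev)   # 2.0, larger, as expected
    ```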

    Incidentally, one reason that people tend to prefer standard deviation is because variances of sums of unrelated random variables add (and related ones also have a simple formula). That doesn't happen with mean deviation.

    An important point is that the standard deviation derives from a model of squared errors (the L2 norm; think of the normal distribution), while the mean of absolute differences corresponds to the L1 norm (think of the symmetric exponential, i.e. Laplace, distribution). The squared-error version is therefore more sensitive to outliers and sparse distributions.
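
    A quick way to see that sensitivity is to spike one value and compare how each statistic reacts (a hypothetical illustration, reusing the made-up data from above):

    ```python
    def mean_dev(xs):
        m = sum(xs) / len(xs)
        return sum(abs(x - m) for x in xs) / len(xs)

    def std_dev(xs):  # n-denominator version
        m = sum(xs) / len(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

    clean  = [2, 4, 4, 4, 5, 5, 7, 9]
    spiked = [2, 4, 4, 4, 5, 5, 7, 90]  # one extreme outlier

    print(mean_dev(clean), mean_dev(spiked))  # 1.5 -> ~18.7
    print(std_dev(clean),  std_dev(spiked))   # 2.0 -> ~28.3, hit harder
    ```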

    Here is a paper on the debate: mean vs. standard deviation

    @Glen_b You wrote "variances of sums of unrelated random variables add" and I am certain I am missing some useful part of your point, because I am likewise certain that I can add any finite number of finite numerical quantities (be they SDs, MADs, or some others). Can you amplify? (Also: happy new year! :)

    @Alexis the phrasing was poor. For independent random variables, Var(X+Y) = Var(X)+Var(Y). This fact is used all over the place (it leads to the familiar $\sqrt{n}\,$ terms when standardizing formulas involving means, like in one-sample t-statistics for example). There's no correspondingly general fact for mean deviation.
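
    A small simulation sketch (illustrative only, not from the thread) of that additivity, using Python's standard library:

    ```python
    import random

    random.seed(0)
    X = [random.gauss(0, 1) for _ in range(100_000)]  # Var(X) = 1
    Y = [random.gauss(0, 2) for _ in range(100_000)]  # Var(Y) = 4, independent of X

    def var(xs):  # n-denominator variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    def mean_dev(xs):
        m = sum(xs) / len(xs)
        return sum(abs(x - m) for x in xs) / len(xs)

    S = [x + y for x, y in zip(X, Y)]
    print(var(S), var(X) + var(Y))                 # both ~5: variances add
    print(mean_dev(S), mean_dev(X) + mean_dev(Y))  # ~1.78 vs ~2.39: no such rule
    ```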

  • Kasper (correct answer, 7 years ago)

    Both statistics measure how far your values are spread around the mean of the observations.

    An observation that is 1 below the mean is equally "far" from the mean as a value that is 1 above it. Hence you should ignore the sign of the deviation. This can be done in two ways:

    • Calculate the absolute value of the deviations and sum these.

    • Square the deviations and sum these squares. Due to the squaring, you give more weight to large deviations, so this sum will differ from the sum of the absolute deviations.

    Averaging the absolute deviations gives the "mean deviation"; averaging the squared deviations and then taking the square root gives the "standard deviation".
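
    Written out, with observations $x_1, \dots, x_n$, mean $\bar{x}$, and the $n$-denominator convention mentioned earlier:

    $$\text{mean deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert, \qquad \text{standard deviation} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$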

    The mean deviation is rarely used.

    So when one simply says 'deviation' do they mean 'standard deviation'?

    I agree that 1 above or below would indicate a meaningful 'change' or 'dispersion' from a common-man's point-of-view. But squaring it would give larger values and that might not be my 'actual change'. Maybe I'm wrong but that's how I see it :/

    Most of the time the term standard deviation (the square root of the variance) is used. Squaring is typically preferred because it facilitates lots of other calculations.

    @itsols Technically, you should always specify which type of deviation statistic you are calculating for the data set -- the word deviation on its own should refer to the deviation of a single datapoint from the mean (in the way Kasper uses it in the answer).

    @itsols, +1 to Amelia. Indeed, nobody refers to a dataset _statistic_ as just "deviation". A statistic is "mean absolute deviation" or "root mean squared deviation" or the like.

Licensed under CC-BY-SA with attribution

