What do you call an average that does not include outliers?

  • What do you call an average that does not include outliers?

    For example if you have a set:

    {90,89,92,91,5} avg = 73.4
    

    but excluding the outlier (5) we have

    {90,89,92,91(,5)} avg = 90.5
    

    How do you describe this average in statistics?

    https://sciencing.com/calculate-outliers-5201412.html I feel this link answers the question.

    This depends on how the assumed outliers are defined. It could be a trimmed mean, a Winsorized mean, or some other robust estimate of location.
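    For readers unfamiliar with the Winsorized mean mentioned here, this is a minimal sketch applied to the question's data. The function name `winsorized_mean` and the count parameter `k` are illustrative assumptions, not a standard API: the `k` lowest values are replaced by the next value up, the `k` highest by the next value down, and the mean is taken over the full-size sample.

```python
from statistics import mean


def winsorized_mean(values, k):
    """Mean after clamping the k smallest and k largest values.

    The k lowest values are replaced by the (k+1)-th smallest value and
    the k highest by the (k+1)-th largest, so the sample size is unchanged.
    (Hypothetical helper for illustration.)
    """
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    low, high = data[k], data[n - k - 1]
    # Clamp every value into the interval [low, high].
    clamped = [min(max(x, low), high) for x in data]
    return mean(clamped)


scores = [90, 89, 92, 91, 5]
# With k=1, the 5 is pulled up to 89 and the 92 is pulled down to 91.
print(winsorized_mean(scores, 1))
```

    Unlike trimming, nothing is discarded: the extreme values are pulled in toward the rest of the sample, so they still count toward the sample size.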

    When I saw the title of this question, I was hoping to find a punchline here....

  • dsimcha

    Correct answer

    12 years ago

    It's called the trimmed mean. Basically what you do is compute the mean of the middle 80% of your data, ignoring the top and bottom 10%. Of course, these numbers can vary, but that's the general idea.
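    A minimal sketch of that idea in plain Python, assuming the cut count is rounded down when the proportion does not divide the sample evenly (the helper name `trimmed_mean` is illustrative; SciPy's `scipy.stats.trim_mean` provides a library version):

```python
def trimmed_mean(values, proportion):
    """Mean of the data after dropping `proportion` of it from EACH end.

    Assumption: the number of values cut per end is rounded down,
    so small samples may lose nothing at small proportions.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))  # values dropped from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


scores = [90, 89, 92, 91, 5]
print(trimmed_mean(scores, 0.10))  # 10% of 5 values rounds down to 0: nothing is cut
print(trimmed_mean(scores, 0.20))  # drops 5 and 92, averages 89, 90, 91
```

    With only five values, a 10% trim removes nothing, so the question's data needs a 20% trim before the 5 is dropped, and that trim also drops the 92.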

    Using a rule like "biggest 10%" doesn't make sense. What if there are no outliers? The 10% rule would eliminate some data anyway. Unacceptable.

    See my answer for a statistically significant way to decide which data qualify as an "outlier."

    Well, there's no rigorous definition of outlier. As for your response, if there are outliers they will affect your estimate of the standard deviation. Furthermore, standard deviation can be a bad measure of dispersion for non-normally distributed data.
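    As a rough illustration of that point using only the question's five values (not a general result), the sample standard deviation can be computed with and without the 5. The variable names are illustrative.

```python
from statistics import mean, stdev

scores = [90, 89, 92, 91, 5]
without_outlier = [x for x in scores if x != 5]

sd_all = stdev(scores)             # sample standard deviation, all five values
sd_clean = stdev(without_outlier)  # same, with the 5 removed

# z-score of the suspect value, measured with the SD it helped inflate
z_of_5 = (5 - mean(scores)) / sd_all

print(f"SD with the 5:    {sd_all:.2f}")
print(f"SD without the 5: {sd_clean:.2f}")
print(f"z-score of the 5: {z_of_5:.2f}")
```

    On this sample the single low value inflates the standard deviation by more than an order of magnitude, and its own z-score stays below 2 in absolute value, so a "more than 2 standard deviations from the mean" rule would not flag it.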

    True there's no rigorous definition, but eliminating based on percentile is certainly wrong in many common cases, including the example given in the question.

    Also, outliers will not affect the standard deviation much unless there are many of them, in which case they aren't outliers! You might, for example, have a bimodal or linearly random distribution, but then throwing out data is wrong, and indeed the notion of "average" might be wrong.

    The trimmed mean also enjoys the benefit of including the median as a limiting case, i.e., when you trim 50% of the data from each side.

    **This answer is incorrect:** since only one (low) value was discarded, the result has not been "trimmed," which by definition removes equal numbers of values at both ends of the data distribution.

    @whuber Not so. The literature certainly includes trimmed means where the proportions are unequal in each tail, including the case of zero in one tail. Prominent examples appear in http://onlinelibrary.wiley.com/book/10.1002/9781118165485. It is a reasonable convention to use equal proportions (a) wherever distributions are approximately symmetric and (b) in the absence of a rationale for doing otherwise, but that is not the only possible definition of a trimmed mean. Clearly, analysis and interpretation need to account for any differences in proportions.

    @Nick Thank you for the clarification. I would go further, though, and suggest that unless that one "outlier" was excluded due to considerations that (a) were independent of the observed distribution of the data and (b) *a priori* suggested 20% trimming of the low end, then it would be misleading to characterize the process in the question as a "trimming" procedure. It looks like outlier detection and rejection, pure and simple. Although the *result* may look the same, as *statistical procedures* the two processes of trimming and outlier removal are very different.

    @whuber I agree; personally I wouldn't use *trimming* to describe what is in effect an outlier removal approach based on some other criterion, including visceral guesses. But the distinction is in the mind of the beholder: there is a difference between "for data like this, trimming 5% in each tail seems a good idea" and "I've looked at the data and the top 5% are probably best ignored", etc. The formulas don't know the analyst's attitudes, but the latter are the researcher's justification for what is done.

    The trimming here was one-sided. If you trimmed from both the top and the bottom, you would also remove 92, cutting out 40% of the data.
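    The difference between one-sided and two-sided trimming on the question's data can be sketched with a helper that takes separate cut counts for each tail. The name `trimmed_mean_counts` and its `lower`/`upper` parameters are illustrative assumptions, not an established API.

```python
def trimmed_mean_counts(values, lower=0, upper=0):
    """Mean after dropping `lower` smallest and `upper` largest values.

    Hypothetical helper: unequal counts give an asymmetric trimmed mean.
    """
    data = sorted(values)
    if lower < 0 or upper < 0 or lower + upper >= len(data):
        raise ValueError("cut counts must be non-negative and leave at least one value")
    kept = data[lower:len(data) - upper]
    return sum(kept) / len(kept)


scores = [90, 89, 92, 91, 5]
print(trimmed_mean_counts(scores, lower=1, upper=0))  # one-sided: drops only the 5
print(trimmed_mean_counts(scores, lower=1, upper=1))  # two-sided: drops 5 and 92 (40% of the data)
print(trimmed_mean_counts(scores, lower=2, upper=2))  # maximal symmetric trim leaves the median
```

    The one-sided call reproduces the 90.5 from the question, while the maximal symmetric trim leaves only the middle value, which is the median.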

Licensed under CC BY-SA with attribution


Content dated before 6/26/2020 9:53 AM