### Real life examples of distributions with negative skewness

Inspired by "real-life examples of common distributions", I wonder what pedagogical examples people use to demonstrate negative skewness? There are many "canonical" examples of symmetric or normal distributions used in teaching - even if ones like height and weight don't survive closer biological scrutiny! Blood pressure might be nearer normality. I like astronomical measurement errors - of historic interest, they are intuitively no more likely to lie in one direction than another, with small errors more likely than large.

Common pedagogical examples for positive skewness include people's incomes; mileage on used cars for sale; reaction times in a psychology experiment; house prices; number of accident claims by an insurance customer; number of children in a family. Their physical reasonableness often stems from being bounded below (usually by zero), with low values being plausible, even common, yet very large (sometimes orders of magnitude higher) values are well-known to occur.

For negative skew, I find it harder to give unambiguous and vivid examples that a younger audience (high schoolers) can intuitively grasp, perhaps because fewer real-life distributions have a clear upper bound. A bad-taste example I was taught at school was "number of fingers". Most folk have ten, but some lose one or more in accidents. The upshot was "99% of people have a higher-than-average number of fingers"! Polydactyly complicates the issue, as ten is not a strict upper bound; since both missing and extra fingers are rare events, it may be unclear to students which effect predominates.
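The "most people have a higher-than-average number of fingers" claim can be checked with a small calculation. This is only a sketch: the proportions in `pmf` are invented for illustration (they are not real data), and polydactyly is ignored, so ten is treated as a hard upper bound.

```python
# Hypothetical finger-count distribution: the proportions are made up
# for illustration only, and polydactyly is deliberately ignored.
pmf = {10: 0.990, 9: 0.006, 8: 0.003, 5: 0.001}

# Mean, variance and moment-based skewness computed directly from the pmf.
mean = sum(k * p for k, p in pmf.items())
var = sum(p * (k - mean) ** 2 for k, p in pmf.items())
skew = sum(p * (k - mean) ** 3 for k, p in pmf.items()) / var ** 1.5

# Share of people whose finger count exceeds the mean.
share_above_mean = sum(p for k, p in pmf.items() if k > mean)

print(f"mean = {mean:.3f}, skewness = {skew:.2f}, "
      f"share above mean = {share_above_mean:.1%}")
```

Under these assumed proportions the mean falls just below ten, so everyone with ten fingers is "above average", and the long left tail makes the skewness negative.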

I usually use a binomial distribution with high \$p\$. But students often find "number of satisfactory components in a batch is negatively skewed" less intuitive than the complementary fact that "number of faulty components in a batch is positively skewed". (The textbook is industrially themed; I prefer cracked and intact eggs in a box of twelve.) Maybe students feel that "success" should be rare.
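To make the binomial example concrete, here is a minimal sketch that computes the skewness of a binomial distribution directly from its probability mass function and compares it with the standard closed form $(1-2p)/\sqrt{np(1-p)}$. The box of twelve eggs comes from the text above; the probability 0.9 that an egg is intact is an assumed value chosen for illustration, and `binom_skew` is a helper name introduced here.

```python
from math import comb, sqrt

def binom_skew(n: int, p: float) -> float:
    """Moment-based skewness of Binomial(n, p), computed from the pmf."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum(q * (k - mean) ** 2 for k, q in enumerate(pmf))
    third = sum(q * (k - mean) ** 3 for k, q in enumerate(pmf))
    return third / var ** 1.5

n = 12           # eggs in a box
p_intact = 0.9   # assumed probability that an egg is intact

skew_intact = binom_skew(n, p_intact)       # number of intact eggs
skew_cracked = binom_skew(n, 1 - p_intact)  # number of cracked eggs
closed_form = (1 - 2 * p_intact) / sqrt(n * p_intact * (1 - p_intact))

print(f"intact:  {skew_intact:+.3f}")
print(f"cracked: {skew_cracked:+.3f}")
print(f"closed form for intact: {closed_form:+.3f}")
```

The two skewness values have the same magnitude and opposite signs, which is the "complementary fact" mentioned above: counting successes instead of failures reflects the distribution.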

Another option is to point out that if \$X\$ is positively skewed then \$-X\$ is negatively skewed, but to place this in a practical context ("negative house prices are negatively skewed") seems doomed to pedagogical failure. While there are benefits to teaching the effects of data transformations, it seems wise to give a concrete example first. I would prefer one that does not seem artificial, where the negative skew is quite unambiguous, and for which students' life-experience should give them an awareness of the shape of the distribution.

It is not apparent that negating a variable will be a "pedagogical failure," because there is the option of adding a constant without changing the shape of the distribution. Many skewed distributions involve proportions \$X\$, for instance, and the complementary proportions \$1-X\$ are usually just as natural and easy to interpret as the original proportions. Even with house prices \$X\$, the values \$C-X\$, where \$C\$ is a maximum house price in the area, could be of interest and are not difficult to understand. Also consider using logs and negative power transformations to create negative skew.
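A quick simulation illustrates the reflection idea: replacing \$X\$ by \$C-X\$ flips the sign of the skewness and leaves its magnitude unchanged. This is a sketch under stated assumptions: the "prices" are simulated from a lognormal distribution in arbitrary units (not real house prices), \$C\$ is taken to be the sample maximum, and `sample_skew` is a helper name introduced here.

```python
import random

def sample_skew(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(0)  # fixed seed so the run is reproducible

# Simulated, positively skewed "prices" in arbitrary units (assumption).
prices = [random.lognormvariate(0.0, 0.5) for _ in range(5000)]

C = max(prices)                     # stand-in for a maximum price in the area
shortfall = [C - x for x in prices]  # how far each price falls below C

print(f"skewness of X:     {sample_skew(prices):+.3f}")
print(f"skewness of C - X: {sample_skew(shortfall):+.3f}")
```

Any constant \$C\$ gives the same result, since shifting a variable does not change its skewness; only the sign flip from negation matters.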

@whuber Yes, that's true about translating the scale - essentially the same as my switching from a low \$p\$ to complementary (high) \$p\$ binomial. Is there any intuitive reason to be interested in \$C-X\$ for house prices though, to motivate the example pedagogically? The best I can think of is "You've won \$C\$ million in the lottery, how much money would you have left after buying a random house?" It seems easier to motivate if \$C\$ is a "natural" maximum, as in the binomial example.

I agree that \$C-X\$ in the case of house prices would be a little contrived. But \$1/X\$ would not: it would be "amount of house you can buy per dollar." I suspect that in any reasonably homogeneous area this would have a strong negative skew. Such examples could teach the deeper lesson that skewness is a function of how we express the data.

@whuber It wouldn't be contrived at all. Maximum and minimum _potential_ prices in a market arise naturally as those reflecting different evaluations by market participants. Among the buyers, there is conceivably one that would pay maximum price for a given house. And among the sellers there is one that would conceivably accept minimum price. But this information is not public and so actual observed transaction prices are affected by the existence of incomplete information. (CONT'D)

CONT'D ... The following paper by Kumbhakar and Parmeter (2010) models exactly that (permitting also the case of symmetry), with an application to the housing market: http://link.springer.com/article/10.1007/s00181-009-0292-8#page-1

Age at death is negatively skewed in developed countries.

@Nick Cox That's an excellent example; care to add it as an answer?

I will if I find some data. At the moment I don't have much to say beyond one line, as above. Anyone is welcome to pick it up and write something longer.

@NickCox I've now added, thanks http://stats.stackexchange.com/a/122853/22228


In the UK, price of a book. There is a "Recommended retail price" which will generally be the modal price, and virtually nowhere would you have to pay more. But some shops will discount, and a few will discount heavily.

Also, age at retirement. Most people retire at 65-68, which is when the state pension kicks in; very few people work longer, but some people retire in their 50s and quite a lot in their early 60s.

Then too, the number of GCSEs people get. Most kids are entered for 8-10 and so get 8-10. A small number do more. Some of the kids don't pass all their exams though, so there is a steady increase from 0 to 7.

This perhaps needs an explanation that GCSE is an exam in British secondary schools and some related systems, most commonly taken at age about 16. The number is of subjects taken, e.g. Mathematics is commonly one subject.