How to interpret the coefficient of variation?

  • I am trying to understand the Coefficient of Variation. When I try to apply it to the following two samples of data I am unable to understand how to interpret the results.

    Let's say sample 1 is $\{0, 5, 7, 12, 11, 17\}$ and sample 2 is $\{10, 15, 17, 22, 21, 27\}$. Here sample 2 $=$ sample 1 $+\ 10$, as you can see.

    Both have the same standard deviation $\sigma_{2} = \sigma_{1} = 5.95539$, but $\mu_{2} = 18.66667$ and $\mu_{1} = 8.66667$.

    Now the coefficient of variation ${\sigma}/{\mu}$ will be different. For sample 2 it will be less than for sample 1. But how do I interpret that result? In terms of variance both are the same; only their means are different. So what's the use of the coefficient of variation here? It's just misleading me, or maybe I am unable to interpret the results.

    If instead of adding 10 you add 1000 the second set of numbers will differ by much less, relative to the mean, than the first set. The coefficient of variation is an expression of this.
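
    The numbers in the question and in the comment above can be checked directly. This is a minimal sketch using Python's standard `statistics` module; the helper name `coefficient_of_variation` is made up for illustration, and the sample (n - 1) standard deviation is assumed because it reproduces the 5.95539 quoted in the question.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (illustrative helper)."""
    return statistics.stdev(values) / statistics.mean(values)


sample_1 = [0, 5, 7, 12, 11, 17]
sample_2 = [x + 10 for x in sample_1]     # second sample from the question
sample_3 = [x + 1000 for x in sample_1]   # the shift suggested in the comment

for name, sample in [
    ("sample 1", sample_1),
    ("sample 1 + 10", sample_2),
    ("sample 1 + 1000", sample_3),
]:
    sd = statistics.stdev(sample)
    mean = statistics.mean(sample)
    cv = coefficient_of_variation(sample)
    print(f"{name}: sd={sd:.5f} mean={mean:.5f} cv={cv:.4f}")
```

    The standard deviation is the same in all three cases, while the coefficient of variation falls from roughly 0.69 to 0.32 to 0.006 as the added constant grows.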

  • Nick Cox (correct answer, 6 years ago)

    In examples like yours when data differ just additively, i.e. we add some constant $k$ to everything, then as you point out the standard deviation is unchanged, the mean is changed by exactly that constant, and so the coefficient of variation changes from $\sigma / \mu$ to $\sigma / (\mu + k)$, which is neither interesting nor useful.

    It's multiplicative change that's interesting and where the coefficient of variation has some use. Multiplying everything by some constant $k > 0$ changes the coefficient of variation to $k \sigma / (k \mu)$, i.e. it remains the same as before. A change of units of measurement is a case in point, as in the answers of @Aksalal and @Macond.
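
    The two cases can be written side by side. This is a sketch, writing $x$ for a variable with mean $\mu > 0$ and standard deviation $\sigma$, and assuming $k > 0$; the notation $\operatorname{CV}(\cdot)$ is introduced here only for illustration.

```latex
\begin{align*}
\text{additive:}       \quad & \operatorname{CV}(x + k) = \frac{\sigma}{\mu + k}, \\
\text{multiplicative:} \quad & \operatorname{CV}(kx) = \frac{k\sigma}{k\mu} = \frac{\sigma}{\mu} = \operatorname{CV}(x).
\end{align*}
```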

    As the coefficient of variation is unit-free, so also it is dimension-free, as whatever units or dimensions are possessed by the underlying variable are washed out by the division. That makes the coefficient of variation a measure of relative variability, so the relative variability of lengths may be compared with that of weights, and so forth. One field where the coefficient of variation has found some descriptive use is the morphometrics of organism size in biology.

    In principle and practice the coefficient of variation is only defined fully and at all useful for variables that are entirely positive. Hence in detail your first sample with a value of $0$ is not an appropriate example. Another way of seeing this is to note that were the mean ever zero the coefficient would be indeterminate and were the mean ever negative the coefficient would be negative, assuming in the latter case that the standard deviation is positive. Either case would make the measure useless as a measure of relative variability, or indeed for any other purpose.

    An equivalent statement is that the coefficient of variation is interesting and useful only if logarithms are defined in the usual way for all values, and indeed using coefficients of variation is equivalent to looking at variability of logarithms.
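
    The link to logarithms can be checked numerically. This is a minimal sketch on made-up positive values, not a proof: when relative variation is small, the standard deviation of the natural logarithms is approximately equal to the coefficient of variation, and like the coefficient of variation it does not change under a change of units.

```python
import math
import statistics

# Hypothetical, strictly positive measurements (illustrative only).
values = [95.0, 102.0, 99.0, 108.0, 97.0, 104.0]

cv = statistics.stdev(values) / statistics.mean(values)
sd_log = statistics.stdev([math.log(v) for v in values])

# Change of units: multiply everything by an arbitrary positive constant.
rescaled = [2.54 * v for v in values]
cv_rescaled = statistics.stdev(rescaled) / statistics.mean(rescaled)
sd_log_rescaled = statistics.stdev([math.log(v) for v in rescaled])

print(f"CV = {cv:.4f}, SD of logs = {sd_log:.4f}")
print(f"after rescaling: CV = {cv_rescaled:.4f}, SD of logs = {sd_log_rescaled:.4f}")
```

    The approximation degrades as relative variation grows, so the agreement should be read as approximate rather than exact.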

    Although it should seem incredible to readers here, I have seen climatological and geographical publications in which the coefficients of variation of Celsius temperatures have puzzled naive scientists who note that coefficients can explode as mean temperatures get close to $0^\circ$C and become negative for mean temperatures below freezing. Even more bizarrely, I have seen suggestions that the problem is solved by using Fahrenheit instead. Conversely, the coefficient of variation is often mentioned correctly as a summary measure defined if and only if measurement scales qualify as ratio scale. As it happens, the coefficient of variation is not especially useful even for temperatures measured in kelvin, but for physical reasons rather than mathematical or statistical.
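
    The temperature problem is easy to reproduce. This is a sketch with invented daily temperatures near freezing; the data and the helper name `cv` are assumptions for illustration only.

```python
import statistics


def cv(values):
    """Sample standard deviation divided by the mean (illustrative helper)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical temperatures in degrees Celsius with a mean just above zero.
celsius = [-4.0, -1.0, 0.5, 2.0, 3.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
kelvin = [c + 273.15 for c in celsius]
colder = [c - 5 for c in celsius]  # same spread, mean below freezing

print(f"Celsius:    {cv(celsius):.3f}")     # explodes because the mean is near 0
print(f"Fahrenheit: {cv(fahrenheit):.3f}")  # different answer, same temperatures
print(f"Kelvin:     {cv(kelvin):.4f}")      # tiny, as the mean is near 273
print(f"Colder:     {cv(colder):.3f}")      # negative, as the mean is negative
```

    The same temperatures give three unrelated values depending on the scale, which is why switching to Fahrenheit fixes nothing: only the kelvin scale has a zero that is not arbitrary.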

    As in the case of the bizarre examples from climatology, which I leave unreferenced as the authors deserve neither the credit nor the shame, the coefficient of variation has been over-used in some fields. There is occasionally a tendency to regard it as a kind of magic summary measure that encapsulates both mean and standard deviation. This is naturally primitive thinking, as even when the ratio makes sense, the mean and standard deviation cannot be recovered from it.

    In statistics the coefficient of variation is a fairly natural parameter if variation follows either the gamma or the lognormal, as may be seen by looking at the form of the coefficient of variation for those distributions.
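
    For reference, the standard forms are given below. The parameterizations are an assumption here: a gamma distribution with shape $\alpha$ and scale $\theta$, and a lognormal distribution whose logarithm has mean $m$ and standard deviation $s$.

```latex
\begin{align*}
\text{gamma:}     \quad & \text{mean} = \alpha\theta, \quad \text{SD} = \sqrt{\alpha}\,\theta, \quad \text{CV} = \frac{1}{\sqrt{\alpha}}, \\
\text{lognormal:} \quad & \text{mean} = e^{m + s^2/2}, \quad \text{SD} = e^{m + s^2/2}\sqrt{e^{s^2} - 1}, \quad \text{CV} = \sqrt{e^{s^2} - 1}.
\end{align*}
```

    In both cases the coefficient of variation depends on a single shape parameter and not on the scale, which is what makes it a natural parameter for these distributions.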

    Although the coefficient of variation can be of some use, in cases where it applies the more useful step is to work on logarithmic scale, either by logarithmic transformation or by using a logarithmic link function in a generalized linear model.

    EDIT: If all values are negative, then we can regard the sign as just a convention that can be ignored. Equivalently in that case, $\sigma / |\mu|$ is effectively an identical twin of the coefficient of variation.

    EDIT 25 May 2020: Good detailed discussion in Simpson, G.G., Roe, A. and Lewontin, R.C. 1960. Quantitative Zoology. New York: Harcourt, Brace, pp.89-94. This text is inevitably dated in several respects, but includes many lucid explanations and pugnacious comments and criticisms.

    See also Lewontin, R.C. 1966. On the measurement of relative variability. Systematic Biology 15: 141–142.

    +1 This post includes the key points about logarithms and positivity which ought to be a part of any discussion of the issue. The "war stories" make it a good read, too.

    I thought you couldn't calculate the CV if a variable is equal to 0?

    @Jerf: Think it through. If all values are 0, then there is no variation and nothing to calculate. There is no problem just because some individual values are 0, as that by itself does not make the mean 0. Yet you can always find examples where some values are not zero yet the mean is 0, e.g. -1, 0, 1, in which case the CV is indeterminate. But in practice, the CV is most useful when all values are positive.

  • Imagine I said "There are 1,625,330 people in this town. Plus or minus five." You'd be impressed by my accurate demographic knowledge.

    But if I said "There are five people in this house. Plus or minus five." You'd think I had no clue how many people were in the house.

    Same standard deviation, very different CVs.

    This is a reasonable way to explain what the CoV is, but it isn't clear how relevant it is to the OP's question.

    OP asks: "In terms of variance both are the same; only their means are different. So what's the use of the coefficient of variation here?" I think my example illustrates the use of the CV as a way of interpreting the variance.

    I didn't downvote you. The OP's 2 explicit questions are: "how do I interpret that result?", & "what's the use of the coefficient of variation here?". Your explanation is good, but understanding what the CoV is, is only the first step in answering those questions, not the whole of the answer.

  • Normally, you use the coefficient of variation for variables with different units of measure or very different scales. You can think of it as a noise/signal ratio. For instance, you may want to compare the variability of the weight and height of students, or the variability of the GDP of the USA and Monaco.

    In your case, coefficient of variation may not make much sense at all, since the values are not much different.

  • The sample with higher values has less variation relative to its mean, as the definition ($s / \bar{x}$) suggests. It is actually pretty straightforward. The coefficient of variation is useful when comparing variation between samples (or populations) on different scales. Suppose you are dealing with wages across countries. Comparing variation in wages in the US and Japan is less informative if you use the variance instead of the coefficient of variation as your statistic, because 1 USD ~= 100 JPY and a 1 unit difference in wages doesn't mean the same thing in both samples. In this example you could convert everything to USD and then do the calculations, but it is not always obvious how to convert between different scales, for instance when comparing variation in the body weights of different species.
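
    The wage comparison can be sketched as follows. The wage figures are invented, and the rate of 100 JPY per USD is the rough one quoted in the answer, so treat this purely as an illustration.

```python
import statistics

# Hypothetical monthly wages in USD (illustrative only).
wages_usd = [2500.0, 3200.0, 4100.0, 2900.0, 3600.0]

JPY_PER_USD = 100.0  # rough rate quoted in the answer
wages_jpy = [w * JPY_PER_USD for w in wages_usd]

var_usd = statistics.variance(wages_usd)
var_jpy = statistics.variance(wages_jpy)
cv_usd = statistics.stdev(wages_usd) / statistics.mean(wages_usd)
cv_jpy = statistics.stdev(wages_jpy) / statistics.mean(wages_jpy)

print(f"variance ratio JPY/USD: {var_jpy / var_usd:.1f}")
print(f"CV in USD: {cv_usd:.4f}, CV in JPY: {cv_jpy:.4f}")
```

    The same wages have a variance 10,000 times larger when expressed in JPY, while the coefficient of variation is identical in both currencies.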

  • In actuality, both statistics can be misleading if you do not know or understand your hypothesis and experiment. Consider this gruesome example: walking between two high-rise buildings on a tightrope as opposed to walking on a plank. Let's say that the tightrope has a 1 inch diameter, whereas the plank is 12 inches wide. Five people were asked to walk the rope and five were asked to walk the plank. We found the following results:

    The average distance of each step from the edge (or side) of the rope (inches): 0.5, 0.2, 0.3, 0.6, 0.1

    The average distance of each step from the edge (or side) of the plank (inches): 5.5, 5.2, 5.3, 5.6, 5.1

    Just as in your example, this example will result in equal standard deviations, as the values for the plank are simply the values for the tightrope plus 5. If I told you that the standard deviation for each experiment was 0.2074, you might say that the two experiments were equivalent. However, if I told you that the CV for the tightrope experiment was almost 61% compared to under 4% for the plank, you might be inclined to ask me how many people fell off the rope.
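
    The figures quoted above can be reproduced from the two lists of distances. This is a minimal sketch using the sample (n - 1) standard deviation, which is the version that gives 0.2074.

```python
import statistics

# Average distance of each step from the edge, in inches (from the answer).
rope = [0.5, 0.2, 0.3, 0.6, 0.1]
plank = [5.5, 5.2, 5.3, 5.6, 5.1]

sd_rope = statistics.stdev(rope)
sd_plank = statistics.stdev(plank)
cv_rope = sd_rope / statistics.mean(rope)
cv_plank = sd_plank / statistics.mean(plank)

print(f"rope:  sd={sd_rope:.4f}  cv={cv_rope:.1%}")
print(f"plank: sd={sd_plank:.4f}  cv={cv_plank:.1%}")
```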

  • CV is a measure of relative variability that is used to compare the variability of different sample datasets. In your example, the same standard deviation with a larger mean gives a smaller CV, which indicates that the dataset with the smaller CV has smaller relative variability. Suppose you earn 10000 monthly and I earn 100 (different means), and we each might lose 100 in a month (the variation). I will be hurt far more than you, since I have the bigger CV (1 compared with your 0.01), i.e. greater relative variation.

    I have to say that this doesn't add anything to existing answers.

  • In this case, the CV is not the right statistical tool to explain the result.

    Depending on the nature and objective of the research, the researcher has a specific hypothesis or point to prove. He or she must design and execute the experiment and analyse the data using the most appropriate statistical tool. For example, if the experiment is to compare the growth of group 1 and group 2, then even if the CVs of both are the same, a t-test, a paired t-test or ANOVA (for a bigger experiment) could easily show the difference between the two groups.

    The key here is to apply the appropriate statistical tool to give a meaningful explanation of the result. Remember that the CV is just one of the choices in descriptive statistics.

    My 2 cents.

    This seems to say that the coefficient of variation is appropriate when it is appropriate, but not otherwise. What different point are you making?

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM