How would you explain the difference between correlation and covariance?

  • Following up on the question "How would you explain covariance to someone who understands only the mean?", which addresses explaining covariance to a lay person, a similar question came to mind.

    How would one explain to a statistics neophyte the difference between covariance and correlation? It seems that both describe how a change in one variable is linked to a change in another variable.

    Similar to the referred-to question, a lack of formulae would be preferable.

  • Nick Sabbe (Correct answer, 9 years ago)

    The problem with covariances is that they are hard to compare: when you calculate the covariance of a set of heights and weights, expressed in (respectively) meters and kilograms, you will get a different covariance than when you calculate it in other units (which already causes a problem for people doing the same thing with and without the metric system!). It is also hard to tell whether, e.g., height and weight 'covary more' than, say, the lengths of your toes and fingers, simply because the 'scale' on which the covariance is calculated is different.

    The solution to this is to 'normalize' the covariance: you divide the covariance by the product of the standard deviations of the two covariates, which represent the diversity and scale of each, and you end up with a value that is assured to be between -1 and 1: the correlation. Whatever units your original variables were in, you will always get the same result, and this also ensures that you can, to a certain degree, compare whether two variables 'correlate' more than two others, simply by comparing their correlations.

    Note: the above assumes that the reader already understands the concept of covariance.
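
    A minimal sketch of this idea in R, using the built-in `cars` data set (any pair of variables, such as heights and weights, would work the same way): dividing the covariance by the product of the two standard deviations gives the correlation, and the result no longer depends on the units of measurement.

    ```r
    x <- cars$speed   # speed, in mph
    y <- cars$dist    # stopping distance, in ft

    cov(x, y)                    # depends on the units of x and y
    cov(x, y) / (sd(x) * sd(y))  # normalized: always between -1 and 1
    cor(x, y)                    # the correlation, computed directly; same value

    # Changing the units changes the covariance but not the correlation:
    cov(1.609 * x, 0.3048 * y)   # convert roughly to km/h and metres
    cor(1.609 * x, 0.3048 * y)   # identical to cor(x, y)
    ```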

    +1 Did you mean to write "correlation" instead of "covariance" in the last sentence?

    Are you sure you can't compare covariances with different units? The units pass through the covariance multiplicatively - if your X is in `cm` and your Y is in `s`, then your $cov(X,Y)=z\ \mathrm{cm}\cdot \mathrm{s}$. And then you can just multiply the result by the unit conversion factor. Try it in R: `cov(cars$speed,cars$dist) == cov(cars$speed/5,cars$dist/7)*(7*5)`

    @naught101 I suspect the point is that, if I told you that $\mbox{Cov}(X, Y) = 10^{10}$ and nothing else, you would have no clue whether $X$ is highly predictive of $Y$ or not, whereas if I told you that $\mbox{Cor}(X, Y) = 0.9$ you would have something a little more interpretable.

    @guy: That would be covariances *without* units :P I think the important thing is that you can't easily compare covariances from two data sets that have different variances. For example, if you have the relation B=2*A, and two data sets, {A1, B1} and {A2, B2}, where A1 has a variance of 0.5 and A2 has a variance of 2, then $cov(A2, B2)$ will be much larger than $cov(A1, B1)$, even though the relationship is exactly the same (see the sketch after this exchange).
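
    To make both points from this exchange concrete, here is a small sketch in R (the variances of 0.5 and 2 and the use of the `cars` variables are illustrative choices, not from the original comments): rescaling the variables rescales the covariance by the product of the conversion factors, while two data sets with the same exact relationship B = 2*A but different spreads have very different covariances and identical correlations.

    ```r
    set.seed(1)

    # Units pass through the covariance multiplicatively:
    speed <- cars$speed; dist <- cars$dist
    all.equal(cov(speed, dist),
              cov(speed / 5, dist / 7) * (7 * 5))  # TRUE (all.equal avoids floating-point '==' issues)

    # Covariance depends on the spread of the data; correlation does not:
    A1 <- rnorm(1000, sd = sqrt(0.5)); B1 <- 2 * A1  # var(A1) is about 0.5
    A2 <- rnorm(1000, sd = sqrt(2));   B2 <- 2 * A2  # var(A2) is about 2
    cov(A1, B1); cov(A2, B2)  # roughly 1 vs roughly 4
    cor(A1, B1); cor(A2, B2)  # both exactly 1: the relationship is the same
    ```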

    So in simple terms, correlation is more useful than covariance?

    So correlation is normalized covariance?

    What is a use case for covariance?
