### How would you explain covariance to someone who understands only the mean?

• ...assuming that I'm able to augment their knowledge about variance in an intuitive fashion (Understanding "variance" intuitively) or by saying: It's the average squared distance of the data values from the 'mean' - and since variance is in square units, we take the square root to keep the units the same, and that is called the standard deviation.

Let's assume this much is articulated and (hopefully) understood by the 'receiver'. Now what is covariance and how would one explain it in simple English without the use of any mathematical terms/formulae? (I.e., intuitive explanation. ;)

Please note: I do know the formulae and the math behind the concept. I want to be able to 'explain' the same in an easy to understand fashion, without including the math; i.e., what does 'covariance' even mean?

@Xi'an - 'how' exactly would you define it *via simple linear regression*? I'd really like to know...

Assuming you already have a scatterplot of your two variables, x *vs.* y, with origin at (0,0), simply draw two lines at x=mean(x) (vertical) and y=mean(y) (horizontal): using this new system of coordinates (origin is at (mean(x),mean(y))), put a "+" sign in the top-right and bottom-left quadrants, a "-" sign in the two other quadrants; you get the sign of the covariance, which is basically what @Peter said. Scaling the x- and y-units (by SD) leads to a more interpretable summary, as discussed in the ensuing thread.
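The quadrant recipe in this comment can be sketched in a few lines of Python. The function name `quadrant_tally` and the sample data are illustrative assumptions, not part of the original comment. Note that the tally only hints at the sign: the covariance itself also weights each point by how far it sits from the two mean lines.

```python
from statistics import mean

def quadrant_tally(xs, ys):
    """Count points in the '+' quadrants (top-right, bottom-left) and the
    '-' quadrants, relative to the lines x = mean(x) and y = mean(y)."""
    mx, my = mean(xs), mean(ys)
    plus = minus = 0
    for x, y in zip(xs, ys):
        product = (x - mx) * (y - my)
        if product > 0:
            plus += 1    # point lies in a '+' quadrant
        elif product < 0:
            minus += 1   # point lies in a '-' quadrant
        # points exactly on a mean line count for neither side
    return plus, minus

# Hypothetical data with a mostly upward trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
plus, minus = quadrant_tally(xs, ys)
print(plus, minus)
```

More points in the "+" quadrants than in the "-" quadrants suggests a positive covariance for these data.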

@chl - could you please post that as an answer and maybe use graphics to depict it!

I found the video on this website helpful, as I prefer images over abstract explanations (website with video). Specifically, this image: ![enter image description here](http://i.stack.imgur.com/xGZFv.png)


Sometimes we can "augment knowledge" with an unusual or different approach. I would like this reply to be accessible to kindergartners and also have some fun, so everybody get out your crayons!

Given paired $$(x,y)$$ data, draw their scatterplot. (The younger students may need a teacher to produce this for them. :-) Each pair of points $$(x_i,y_i)$$, $$(x_j,y_j)$$ in that plot determines a rectangle: it's the smallest rectangle, whose sides are parallel to the axes, containing those points. Thus the points are either at the upper right and lower left corners (a "positive" relationship) or they are at the upper left and lower right corners (a "negative" relationship).

Draw all possible such rectangles. Color them transparently, making the positive rectangles red (say) and the negative rectangles "anti-red" (blue). In this fashion, wherever rectangles overlap, their colors are either enhanced when they are the same (blue and blue or red and red) or cancel out when they are different.

(In this illustration of a positive (red) and negative (blue) rectangle, the overlap ought to be white; unfortunately, this software does not have a true "anti-red" color. The overlap is gray, so it will darken the plot, but on the whole the net amount of red is correct.)

Now we're ready for the explanation of covariance.

The covariance is the net amount of red in the plot (treating blue as negative values).
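For readers who would rather count than color, here is a minimal sketch of "net amount of red", assuming each rectangle contributes its area with a plus sign when red and a minus sign when blue. The name `net_red` and the two tiny data sets are made up for illustration.

```python
from itertools import combinations

def net_red(points):
    """Sum of signed rectangle areas over all unordered pairs of points.

    The product (x1 - x2) * (y1 - y2) is positive when the two points sit at
    the lower-left and upper-right corners (red) and negative when they sit
    at the upper-left and lower-right corners (blue)."""
    return sum((x1 - x2) * (y1 - y2)
               for (x1, y1), (x2, y2) in combinations(points, 2))

upward = [(0, 0), (1, 1), (2, 2)]    # every rectangle is red
downward = [(0, 2), (1, 1), (2, 0)]  # every rectangle is blue

print(net_red(upward), net_red(downward))
```

The sign of the result says whether red or blue wins; its size depends on how big the rectangles are.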

Here are some examples with 32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest).

They are drawn on common axes to make them comparable. The rectangles are lightly outlined to help you see them. This is an updated (2019) version of the original: it uses software that properly cancels the red and cyan colors in overlapping rectangles.

Let's deduce some properties of covariance. Understanding of these properties will be accessible to anyone who has actually drawn a few of the rectangles. :-)

• Bilinearity. Because the amount of red depends on the size of the plot, covariance is directly proportional to the scale on the x-axis and to the scale on the y-axis.

• Correlation. Covariance increases as the points approximate an upward sloping line and decreases as the points approximate a downward sloping line. This is because in the former case most of the rectangles are positive and in the latter case, most are negative.

• Relationship to linear associations. Because non-linear associations can create mixtures of positive and negative rectangles, they lead to unpredictable (and not very useful) covariances. Linear associations can be fully interpreted by means of the preceding two characterizations.

• Sensitivity to outliers. A geometric outlier (one point standing away from the mass) will create many large rectangles in association with all the other points. It alone can create a net positive or negative amount of red in the overall picture.
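Two of these properties are easy to check numerically. The following is only a sketch under stated assumptions: the "amount of red" is taken as the average signed rectangle area over all pairs, and the helper `mean_signed_area` and the data are hypothetical.

```python
from itertools import combinations

def mean_signed_area(points):
    """Average signed rectangle area over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    total = sum((x1 - x2) * (y1 - y2) for (x1, y1), (x2, y2) in pairs)
    return total / len(pairs)

points = [(0, 1), (1, 0), (2, 2), (3, 1)]
base = mean_signed_area(points)

# Bilinearity: stretching x by 3 and y by 2 multiplies every rectangle's
# area, and therefore the result, by 3 * 2 = 6.
stretched = [(3 * x, 2 * y) for x, y in points]
scaled = mean_signed_area(stretched)

# Sensitivity to outliers: one far-away point forms a large rectangle with
# every other point and can flip the sign of the whole picture.
with_outlier = points + [(20, -20)]
shifted = mean_signed_area(with_outlier)

print(base, scaled, shifted)
```

Here the original four points give a small positive value, while adding the single point at (20, -20) makes the result strongly negative.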

Incidentally, this definition of covariance differs from the usual one only by a universal constant of proportionality (independent of the data set size). The mathematically inclined will have no trouble performing the algebraic demonstration that the formula given here is always twice the usual covariance.
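A quick numerical check of the factor of two, as a sketch rather than a proof. It assumes that the "amount of red" is averaged over all pairs (as clarified in the comments) and that the "usual" covariance is the sample covariance with the n - 1 divisor; both helper names and the simulated data are illustrative.

```python
import random
from itertools import combinations

def mean_signed_area(points):
    """Average signed rectangle area over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    total = sum((x1 - x2) * (y1 - y2) for (x1, y1), (x2, y2) in pairs)
    return total / len(pairs)

def sample_covariance(points):
    """Usual sample covariance (divisor n - 1), computed from the means."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return sum((x - mx) * (y - my) for x, y in points) / (n - 1)

# 32 hypothetical points with a positive association.
random.seed(0)
points = []
for _ in range(32):
    x = random.gauss(0, 1)
    points.append((x, 0.5 * x + random.gauss(0, 1)))

ratio = mean_signed_area(points) / sample_covariance(points)
print(ratio)  # close to 2, up to floating-point rounding
```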

Now if only all introductory statistical concepts could be presented to students in this lucid manner …

@whuber: You should stop editing posts and start posting answers ;) Simply stunning!

This is beautiful. And very very clear.

I will do something similar if/when I next teach an introductory statistics class. I wish somebody had done so for me when I was first learning statistics!

Having done the algebra, I wonder if "universal constant of proportionality (independent of the data set size)" may be misleading, so I want to check if I understood the procedure correctly. For {(0,0),(1,1),(2,2)} there are $\binom{3}{2} = 3$ possible rectangles of areas 1, 1 and 4. They're all red so the "covariance" is 6. And {(0,0),(1,1),(1,1),(2,2)} has $\binom{4}{2} = 6$ rectangles, all red or zero, of areas 0, 1, 1, 1, 1 and 4 so "covariance" is 8. Is this right? If so it's $\sum_{i<j} (x_i - x_j)(y_i - y_j)$ …

@Silverfish Yes, I should have indicated that the constant was universal after *averaging* the values rather than *summing* them.

Thanks, this is as I suspected. I realised that an extra factor of 2 comes out if the sum is taken over all $i, j$ rather than $i<j$ …
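The dependence on $n$ discussed in these comments can be seen from a short identity. This is a sketch, writing $\bar x$ and $\bar y$ for the two means and assuming the "usual" covariance is the one with the $n-1$ divisor:

$$\sum_{i<j}(x_i-x_j)(y_i-y_j) \;=\; \frac{1}{2}\sum_{i,j}(x_i-x_j)(y_i-y_j) \;=\; n\sum_{i}(x_i-\bar x)(y_i-\bar y).$$

$$\frac{1}{\binom{n}{2}}\sum_{i<j}(x_i-x_j)(y_i-y_j) \;=\; \frac{2}{n-1}\sum_{i}(x_i-\bar x)(y_i-\bar y) \;=\; 2\,\widehat{\operatorname{Cov}}(x,y).$$

So the *sum* of signed areas grows with $n$, while the *average* over the $\binom{n}{2}$ rectangles is always twice the sample covariance. For the two small data sets in the comment, $\sum_i (x_i-\bar x)(y_i-\bar y) = 2$ in both cases, which gives $3 \cdot 2 = 6$ and $4 \cdot 2 = 8$.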

Do we know if the people who invented the concepts of covariance and correlation (Pearson, I think) had this view in mind?

Yeah, I do not know anything about stats. I don't understand how the points on the corners are assigned: "it's the smallest rectangle, whose sides are parallel to the axes, containing those points." Aren't all sides parallel?

@Tjorriemorrie This isn't about statistics, it's about geometry: You can construct plenty of rectangles whose sides are *not* parallel to the coordinate axes.

Coming from a programming background, I still have no idea :S

I think a reason as to ***what*** the co-variance is used for in applications would not hurt the mind of many.

@Karl Its relationship to linear associations and correlation coefficients (see the bullet points at the end) ought to be enough!

@whuber: this is a genius explanation of covariance! I must give (+1) (I would like to give more). One question: why do you draw rectangles based on the points $(x_i, y_i)$ and $(x_j, y_j)$ and not on $(x_i, y_i)$ and $(\bar{x}, \bar{y})$?

@fcoppens Indeed, there is a traditional explanation that proceeds as you suggest. I thought of this one because I did not want to introduce an idea that is unnecessary--namely, constructing the centroid $(\bar x, \bar y)$. That would make the explanation inaccessible to the five-year-old with a box of crayons. Some of the conclusions I drew at the end would not be immediate, either. For example, it would no longer be quite so obvious that the covariance is sensitive to certain kinds of outliers.

I couldn't get the sciguides link to load. That may just be a problem with me but you might like to double-check it at some convenient time.

@Glen_b Thanks--it doesn't work for me, either. I'll delete the reference. For the record, here's the deleted text: *(The original version of this post has led to the creation of a simplified graphical rendition of the underlying idea. It is accompanied by an admirably clear, step-by-step explanation. Please check it out at http://sciguides.com/guides/covariance/. For additional explanation also see the answer posted here by arthur.00.)*

Hi @whuber, I was looking at your graph a year ago, trying to decipher something highly complex, or at least presumably very complex because I wanted it to be, and now, going back to your post, it is absolutely simple. Many, many thanks for such a pedagogical explanation. One thing, though, which I've noticed in a lot of math/stats explanations: many teachers don't mention that it is sometimes up to the doer to choose the points arbitrarily, and to be honest with you, I had to struggle a bit to get that. But once done, 1/2

@whuber your explanation and the implicit deductions just made perfect sense. Thanks again 2/2

It seems many people find this a simple explanation; unfortunately, I failed to understand covariance by reading this at first. After reading about covariance in other places, such as this Quora question, I was able to read this again and better understand the rectangles example. I personally think explaining *covariance properties* such as **Bilinearity** and **Relationship to linear associations** is not intuitive (as OP asked), at least to me (maybe I'm way too mediocre). Still, I appreciate the effort. +1

You don't explain what you mean by "negative" and "positive". Kindergartners would not have understood this.

@nbro I appreciate your comment may be tongue-in-cheek. You might be surprised, however, if you were to consult with a kindergartner concerning these concepts: they know them or can readily be taught them.

I'm with @Alisson and Karl Morrison, I'm lost. Super thanks for all the time you put into making the answer, but I didn't follow at all. It's full of math jargon like *bilinearity* and *Relationship to linear associations*. I still have no idea what covariance is. Is it related to points or lines? Are we pulling pairs of points or all possible pairs? If the points are A,B,C,D are we doing (A,B) and (C,D) or are we doing (A,B),(A,C),(A,D),(B,C),(B,D),(C,D)? I guess I first need to be a math whiz, then I'll understand this.

Thanks. Are both x_i and y_i random (binormally distributed), or is x_i a given, mathematical (non-random) variable?

Thanks for the explanation, I still need one clarification. If the x,y pairs are (-1,-1), (0,0), and (1,1), there will be 3 rectangles in total: 1 red and 1 non-red. And the 3rd rectangle, connecting (-1,-1) and (1,1): is it positive or negative? Or will it be partly positive and partly negative?

@Praveen Yes, those three points determine three (nondegenerate) rectangles--and *all* are red.

I am unable to prove the equivalence mentioned in the last statement... Could you please perhaps give a hint or a sketch of the proof?

@FreezingFire A reasonable request. I have included a demonstration in another answer at https://stats.stackexchange.com/a/222091/919.