Does the variance of a sum equal the sum of the variances?

  • Is it (always) true that $$\mathrm{Var}\left(\sum\limits_{i=1}^m{X_i}\right) = \sum\limits_{i=1}^m{\mathrm{Var}(X_i)} \>?$$

    The answers below provide the proof. The intuition can be seen in the simple case $\mathrm{Var}(X+Y)$: if $X$ and $Y$ are positively correlated, both will tend to be large or small together, increasing the total variation; if they are negatively correlated, they will tend to cancel each other out, decreasing the total variation.
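
    As a quick numerical illustration of this intuition (a sketch added here, not part of the original thread), simulating a bivariate normal with unit variances and correlation $\rho$ gives $\mathrm{Var}(X+Y) \approx 1 + 1 + 2\rho$:

    ```python
    # Minimal NumPy sketch: the variance of X + Y grows with positive correlation
    # and shrinks with negative correlation.
    import numpy as np

    rng = np.random.default_rng(0)

    def var_of_sum(rho, n=100_000):
        # Draw (X, Y) from a bivariate normal with unit variances and correlation rho.
        cov = [[1.0, rho], [rho, 1.0]]
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        return np.var(x + y)

    print(var_of_sum(0.8))   # about 3.6 = 1 + 1 + 2*0.8 (positively correlated)
    print(var_of_sum(0.0))   # about 2.0 = 1 + 1         (uncorrelated)
    print(var_of_sum(-0.8))  # about 0.4 = 1 + 1 - 2*0.8 (negatively correlated)
    ```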

  • Macro (correct answer)

    The answer to your question is "Sometimes, but not in general".

    To see this, let $X_1, ..., X_n$ be random variables (with finite variances). Then,

    $$ {\rm var} \left( \sum_{i=1}^{n} X_i \right) = E \left( \left[ \sum_{i=1}^{n} X_i \right]^2 \right) - \left[ E\left( \sum_{i=1}^{n} X_i \right) \right]^2$$

    Now note that $(\sum_{i=1}^{n} a_i)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j $, which is clear if you think about what you're doing when you calculate $(a_1+...+a_n) \cdot (a_1+...+a_n)$ by hand. Therefore,

    $$ E \left( \left[ \sum_{i=1}^{n} X_i \right]^2 \right) = E \left( \sum_{i=1}^{n} \sum_{j=1}^{n} X_i X_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} E(X_i X_j) $$

    similarly,

    $$ \left[ E\left( \sum_{i=1}^{n} X_i \right) \right]^2 = \left[ \sum_{i=1}^{n} E(X_i) \right]^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} E(X_i) E(X_j)$$

    so

    $$ {\rm var} \left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} \big( E(X_i X_j)-E(X_i) E(X_j) \big) = \sum_{i=1}^{n} \sum_{j=1}^{n} {\rm cov}(X_i, X_j)$$

    by the definition of covariance.
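
    To see this identity numerically, here is a quick sketch added for illustration (the simulated data and the coefficients 0.5 and 0.7 are arbitrary): the sample variance of the sum equals the sum of all $n^2$ entries of the sample covariance matrix, provided both use the same normalization.

    ```python
    # Numerical check: var(X_1 + ... + X_n) equals the sum of ALL entries of the
    # covariance matrix, i.e. sum_i sum_j cov(X_i, X_j).
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((3, 50_000))   # three rows = three random variables
    X[1] += 0.5 * X[0]                     # introduce correlation (arbitrary coefficients)
    X[2] -= 0.7 * X[0]

    lhs = np.var(X.sum(axis=0))            # var(X_1 + X_2 + X_3), population normalization
    rhs = np.cov(X, bias=True).sum()       # sum of all cov(X_i, X_j), same normalization
    print(lhs, rhs)                        # the two numbers agree up to floating-point error
    ```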

    Now, regarding the question "Does the variance of a sum equal the sum of the variances?":

    • If the variables are uncorrelated, yes: that is, if ${\rm cov}(X_i,X_j)=0$ for all $i\neq j$, then $$ {\rm var} \left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} {\rm cov}(X_i, X_j) = \sum_{i=1}^{n} {\rm cov}(X_i, X_i) = \sum_{i=1}^{n} {\rm var}(X_i) $$

    • If the variables are correlated, no, not in general: For example, suppose $X_1, X_2$ are two random variables each with variance $\sigma^2$ and ${\rm cov}(X_1,X_2)=\rho$ where $0 < \rho <\sigma^2$. Then ${\rm var}(X_1 + X_2) = 2(\sigma^2 + \rho) \neq 2\sigma^2$, so the identity fails.

    • but it is possible for certain examples: Suppose $X_1, X_2, X_3$ have covariance matrix $$ \left( \begin{array}{ccc} 1 & 0.4 &-0.6 \\ 0.4 & 1 & 0.2 \\ -0.6 & 0.2 & 1 \\ \end{array} \right) $$ then ${\rm var}(X_1+X_2+X_3) = 3 = {\rm var}(X_1) + {\rm var}(X_2) + {\rm var}(X_3)$

    Therefore, if the variables are uncorrelated, then the variance of the sum is the sum of the variances, but the converse is not true in general; the last example above is checked numerically in the sketch below.
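
    A quick numerical check of that example (a sketch added for illustration): by the double-sum identity derived above, $\mathrm{var}(X_1+X_2+X_3) = \mathbf{1}^\top \Sigma\, \mathbf{1}$, the sum of all entries of the covariance matrix.

    ```python
    # Check the 3x3 example: the variance of the sum is 1' Sigma 1, i.e. the sum
    # of all entries of the covariance matrix.
    import numpy as np

    Sigma = np.array([[ 1.0, 0.4, -0.6],
                      [ 0.4, 1.0,  0.2],
                      [-0.6, 0.2,  1.0]])
    ones = np.ones(3)
    print(ones @ Sigma @ ones)   # 3.0 (up to rounding) = var(X_1) + var(X_2) + var(X_3)
    ```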

    Regarding the example covariance matrix, is the following correct: the symmetry between the upper-right and lower-left triangles reflects the fact that $\text{cov}(X_i,X_j)=\text{cov}(X_j,X_i)$, whereas the particular off-diagonal values, here $\text{cov}(X_1, X_2) = 0.4$ and $\text{cov}(X_2,X_3) = 0.2$, are just part of the example and could be replaced with any two numbers that sum to $0.6$, e.g., $\text{cov}(X_1, X_2) = a$ and $\text{cov}(X_2,X_3) = 0.6 - a$? Thanks again.

  • $$\text{Var}\bigg(\sum_{i=1}^m X_i\bigg) = \sum_{i=1}^m \text{Var}(X_i) + 2\sum_{i\lt j} \text{Cov}(X_i,X_j).$$

    So, if the covariances average to $0$, as happens in particular when the variables are pairwise uncorrelated or independent, then the variance of the sum is the sum of the variances.

    An example where this is not true: Let $\text{Var}(X_1)=1$. Let $X_2 = X_1$. Then $\text{Var}(X_1 + X_2) = \text{Var}(2X_1)=4$.
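
    Spelling that example out with the decomposition above: here $\text{Cov}(X_1,X_2) = \text{Cov}(X_1,X_1) = \text{Var}(X_1) = 1$, so $$\text{Var}(X_1+X_2) = \text{Var}(X_1) + \text{Var}(X_2) + 2\,\text{Cov}(X_1,X_2) = 1 + 1 + 2 = 4 \neq 2 = \text{Var}(X_1) + \text{Var}(X_2).$$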

    It will rarely be true for sample variances.

    @DWin, "rare" is an understatement: if the $X$s have a continuous distribution, the probability that the sample variance of the sum equals the sum of the sample variances is exactly 0 :)

    @Douglas Zare Do you know any way to calculate this non-manually? Say I am summing quantities measured on 30 separate days, and they are likely correlated in some ways even though they look somewhat stochastic. How do I go about calculating the summed quantity's uncertainty? Or should I just assume the worst-case scenario and sum the individual estimation uncertainties linearly?
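
    One way to do this non-manually, sketched here under an assumption the comment does not state, namely that you have repeated observations of the 30 daily quantities (e.g. the same 30-day window over several periods): estimate the full sample covariance matrix and sum all of its entries to get the variance of the summed quantity. Summing the individual uncertainties linearly corresponds to the worst case of perfect positive correlation between all days.

    ```python
    # Hypothetical sketch: `daily` holds repeated observations of the 30 daily quantities,
    # one row per repetition, one column per day. Data and names are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(2)
    daily = rng.standard_normal((12, 30))   # 12 repetitions x 30 days (placeholder data)

    Sigma = np.cov(daily, rowvar=False)     # 30 x 30 sample covariance matrix
    var_of_total = Sigma.sum()              # variance of the sum of the 30 quantities
    print(np.sqrt(var_of_total))            # its standard deviation (the "uncertainty")
    ```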

  • I just wanted to add a more succinct version of the proof given by Macro, so it's easier to see what's going on. $\newcommand{\Cov}{\text{Cov}}\newcommand{\Var}{\text{Var}}$

    Notice that since $\Var(X) = \Cov(X,X)$, for any two random variables $X,Y$ we have:

    \begin{align} \Var(X+Y) &= \Cov(X+Y,X+Y) \\ &= E((X+Y)^2)-E(X+Y)E(X+Y) \\ &= E(X^2) - (E(X))^2 + E(Y^2) - (E(Y))^2 + 2(E(XY) - E(X)E(Y)) && \text{(by expanding the square)} \\ &= \Var(X) + \Var(Y) + 2(E(XY) - E(X)E(Y)) \end{align} Therefore, in general, the variance of the sum of two random variables is not the sum of the variances. However, if $X,Y$ are independent, then $E(XY) = E(X)E(Y)$, and we have $\Var(X+Y) = \Var(X) + \Var(Y)$.

    Notice that we can produce the result for the sum of $n$ random variables by a simple induction.
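
    Sketching the induction step, which the answer leaves implicit: write $S_{n-1} = \sum_{i=1}^{n-1} X_i$ and use the bilinearity of covariance, \begin{align} \Var(S_{n-1}+X_n) &= \Var(S_{n-1}) + \Var(X_n) + 2\,\Cov(S_{n-1}, X_n) \\ &= \Var(S_{n-1}) + \Var(X_n) + 2\sum_{i=1}^{n-1} \Cov(X_i, X_n). \end{align} If the variables are independent (so in particular uncorrelated), the covariance terms vanish, and applying the induction hypothesis to $\Var(S_{n-1})$ gives $\Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \Var(X_i)$.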

  • Yes, if each pair of the $X_i$'s is uncorrelated, this is true.

    See the explanation on Wikipedia

    I agree. You can also find a simple(r) explanation on Insight Things.
