How to 'sum' a standard deviation?

  • I have a monthly average for a value and a standard deviation corresponding to that average. I am now computing the annual average as the sum of the monthly averages. How can I represent the standard deviation for the summed average?

    For example considering output from a wind farm:

    Month        MWh     StdDev
    January      927     333 
    February     1234    250
    March        1032    301
    April        876     204
    May          865     165
    June         750     263
    July         780     280
    August       690     98
    September    730     76
    October      821     240
    November     803     178
    December     850     250
    

    We can say that in the average year the wind farm produces 10,358 MWh, but what is the standard deviation corresponding to this figure?

    A discussion following a now-deleted reply noted a **possible ambiguity** in this question: do you seek the SD of the monthly averages or do you want to recover the SD of all the original values from which those averages were constructed? That reply also correctly pointed out that if you want the latter, you will need the numbers of values involved in each one of the monthly averages.

    A comment to another deleted reply pointed out that it is strange to compute an average as a *sum*: surely you mean that you are *averaging* the monthly averages. But if what you want is to estimate the average of all the original data, then such a procedure is not usually a good one: a *weighted* average is needed. And of course it's not possible to give a good answer to your question about the "SD for the summed average" until it is clear what the "summed average" is and what it is intended to represent. Please clarify that for us.

    @whuber I have added an example to clarify. Mathematically, I believe the sum of the monthly averages is equal to the average monthly value times 12.

    OK, so to make this a bit more abstract: you have twelve variables $\bar{X}_1,\ldots,\bar{X}_{12}$, each of which corresponds to the average production during a different month. They have been computed using data from different years. Right? Are the standard deviations known (e.g. because they are computed from a theoretical model) or estimated using the same data that were used for the averages?

    Also, do you have the original data with, say, daily observations?

    @MånsT I have the original data: 10-minute observations. I am working with monthly averages in order to keep the software responsive. The standard deviation is currently calculated with the MySQL STDDEV function; eventually it will be calculated directly in Java.

    @whuber Interesting point about requiring the number of observations as a parameter in the formula. Is it reasonable to want to at least read (erroneous) replies before they are deleted? (A discussion for meta, perhaps.)

    Yes, klonq, that is a very reasonable request. However, these replies were deleted by their owner, not by the community. To preserve their value, I have attempted here to relay (my take on) the key ideas arising in those replies and their comments. BTW, your recent edits are quite helpful: people like to see example data.

    Surely averaging the variances and thus calculating the average standard deviation can't be the whole answer! All this represents is the average variance in measuring power output WITHIN a single month. This is a good start at getting an accurate gauge of measurement error, but doesn't this standard deviation of 232 need to be combined in some way with the INTER-MONTHLY variation in power output? That is, I think the resulting standard deviation for the grand mean should be a little higher than 232 if you account for the combined error in measurement both within each month and BETWEEN months.

    For example, a proposed answer might be:

    that the standard deviation of the inter-monthly variation, computed from the 12 monthly averages, would be 148, and that this could then be combined in quadrature (square root of the sum of the squares) with the within-month standard deviation of 232? MAYBE; I don't know.

    Welcome to the site, @Hayden. This isn't an answer to the OP's question. Please only use the "Your Answer" field to provide answers. If you have a follow-up question, click the `ASK QUESTION` button at the top of the page; the site's help pages contain information for new users.

  • Ian Boyd (correct answer, 9 years ago)

    Short answer: you average the variances; then you can take the square root to get the average standard deviation.


    Example

    Month          MWh  StdDev  Variance
    ==========   =====  ======  ========
    January        927    333     110889
    February      1234    250      62500
    March         1032    301      90601
    April          876    204      41616
    May            865    165      27225
    June           750    263      69169
    July           780    280      78400
    August         690     98       9604
    September      730     76       5776
    October        821    240      57600
    November       803    178      31684
    December       850    250      62500
    ==========   =====  ======  ========
    Total        10358            647564
    ÷12            863    232      53964
    

    And then the average standard deviation is sqrt(53,964) = 232
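    The arithmetic in the table can be reproduced with a short Python sketch. Python and the variable names are illustrative choices, not part of the original answer. The sketch squares each monthly standard deviation, averages the resulting variances, and takes the square root. Note that this "average standard deviation" of about 232 is not the plain mean of the 12 standard deviations, which would be 2,638 / 12 ≈ 219.8.

```python
import math

# Monthly output and standard deviations (MWh), copied from the question's table.
monthly_mwh = [927, 1234, 1032, 876, 865, 750, 780, 690, 730, 821, 803, 850]
monthly_sd = [333, 250, 301, 204, 165, 263, 280, 98, 76, 240, 178, 250]

# Square each standard deviation to get the Variance column.
variances = [sd ** 2 for sd in monthly_sd]

total_variance = sum(variances)                   # the "Total" row
mean_variance = total_variance / len(variances)   # the "÷12" row
average_sd = math.sqrt(mean_variance)             # square root of the average variance

print(sum(monthly_mwh), round(sum(monthly_mwh) / 12),
      total_variance, round(mean_variance), round(average_sd))
# prints: 10358 863 647564 53964 232
```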


    From the Wikipedia article *Sum of normally distributed random variables*:

    If $X$ and $Y$ are independent random variables that are normally distributed (and therefore also jointly so), then their sum is also normally distributed

    ...the sum of two independent normally distributed random variables is normal, with its mean being the sum of the two means, and its variance being the sum of the two variances

    And from Wolfram MathWorld's *Normal Sum Distribution*:

    Amazingly, the distribution of a sum of two normally distributed independent variates $X$ and $Y$ with means and variances $(\mu_X,\sigma_X^2)$ and $(\mu_Y,\sigma_Y^2)$, respectively is another normal distribution

    $$ P_{X+Y}(u) = \frac{1}{\sqrt{2\pi (\sigma_X^2 + \sigma_Y^2)}} e^{-[u-(\mu_X+\mu_Y)]^2/[2(\sigma_X^2 + \sigma_Y^2)]} $$

    which has mean

    $$\mu_{X+Y} = \mu_X+\mu_Y$$

    and variance

    $$ \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$$
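    The variance-addition rule quoted above can be checked numerically. The following Monte Carlo sketch assumes $X$ and $Y$ are independent and normal, with hypothetical parameters taken from the January and February rows. The seed and sample size are arbitrary. The simulated variance of $X+Y$ should land close to $\sigma_X^2 + \sigma_Y^2 = 173{,}389$, up to sampling noise.

```python
import math
import random
import statistics

# Hypothetical parameters borrowed from the January and February rows (MWh).
mu_x, sd_x = 927, 333
mu_y, sd_y = 1234, 250

rng = random.Random(42)  # arbitrary fixed seed so the run is reproducible
n = 200_000

# Draw X and Y independently and record X + Y.
sums = [rng.gauss(mu_x, sd_x) + rng.gauss(mu_y, sd_y) for _ in range(n)]

sample_mean = statistics.fmean(sums)
# Unbiased sample variance (divide by n - 1), computed directly for speed.
sample_var = math.fsum((s - sample_mean) ** 2 for s in sums) / (n - 1)

print(f"mean:     {sample_mean:8.1f}  theory: {mu_x + mu_y}")
print(f"variance: {sample_var:8.0f}  theory: {sd_x**2 + sd_y**2}")
```

    Independence matters here. The simulation draws X and Y separately, so their covariance is zero by construction.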

    For your data:

    • sum: 10,358 MWh
    • variance: 647,564 MWh²
    • standard deviation: 804.71 MWh (sqrt(647,564))

    So to answer your question:

    • How to 'sum' a standard deviation?
    • You sum them quadratically:

      $$ s = \sqrt{s_1^2 + s_2^2 + \cdots + s_{12}^2} $$
      

    Conceptually you sum the variances, then take the square root to get the standard deviation.
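    The quadratic sum can be written as a small helper. This sketch assumes the 12 monthly values are independent, which is the same assumption behind adding variances. `sum_in_quadrature` is a hypothetical name, not an existing library function.

```python
import math

def sum_in_quadrature(sds):
    """SD of a sum of independent quantities: sqrt(s1^2 + s2^2 + ... + sn^2)."""
    return math.sqrt(sum(s * s for s in sds))

# Monthly standard deviations (MWh) from the question's table.
monthly_sd = [333, 250, 301, 204, 165, 263, 280, 98, 76, 240, 178, 250]

annual_sd = sum_in_quadrature(monthly_sd)
print(f"{annual_sd:.2f}")  # 804.71
```

    On Python 3.8 or later, `math.hypot(*monthly_sd)` computes the same quantity.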


    Because I was curious, I wanted to know the average monthly mean power and its standard deviation. Working backwards, we need 12 identical normal distributions that:

    • sum to a mean of 10,358
    • sum to a variance of 647,564

    That would be 12 average monthly distributions of:

    • mean of 10,358/12 ≈ 863.17
    • variance of 647,564/12 ≈ 53,963.7
    • standard deviation of sqrt(53,963.7) ≈ 232.3

    We can check our monthly average distributions by adding 12 of them together and confirming that they reproduce the yearly distribution:

    • Mean: 863.17 * 12 ≈ 10,358 (correct)
    • Variance: 53,963.7 * 12 ≈ 647,564 (correct)

    Note: I'll leave it to someone with knowledge of the esoteric LaTeX math to convert my formula images and formula code into Stack Exchange-formatted formulas.

    Edit: I moved the short, to-the-point answer to the top, because I needed to do this again today and wanted to double-check that I should average the variances.

    This all seems to assume the months are uncorrelated - have you made that assumption explicit anywhere? Also, why do we need to bring in the normal distribution? If we're only talking about variance then that seems unnecessary - for example, see my answer here

    @Marco Because I think better in pictures, and it makes everything easier to understand.

    @Marco Also, I believe this question started on the (now defunct) stats.stackexchange site. A wall of formulas is less accessible than a simpler, graphical, less rigorous treatment.
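    The comment about correlated months is worth quantifying. For dependent terms the general rule is $\operatorname{Var}(\sum_i X_i) = \sum_i \sum_j \rho_{ij}\,\sigma_i \sigma_j$ with $\rho_{ii} = 1$. Purely for illustration, the sketch below assumes a single hypothetical correlation `rho` shared by every pair of months. Real month-to-month correlations would have to be estimated from the original data. `sd_of_sum` is a hypothetical helper name.

```python
import math

def sd_of_sum(sds, rho):
    """SD of a sum of terms under a hypothetical model where every pair has correlation rho.

    Var(sum) = sum_i sum_j rho_ij * s_i * s_j, with rho_ii = 1 and rho_ij = rho for i != j.
    """
    total = 0.0
    for i, si in enumerate(sds):
        for j, sj in enumerate(sds):
            r = 1.0 if i == j else rho
            total += r * si * sj
    return math.sqrt(total)

# Monthly standard deviations (MWh) from the question's table.
monthly_sd = [333, 250, 301, 204, 165, 263, 280, 98, 76, 240, 178, 250]

print(round(sd_of_sum(monthly_sd, 0.0), 2))  # uncorrelated months: 804.71
print(round(sd_of_sum(monthly_sd, 0.3), 2))  # positively correlated months: larger
```

    With `rho = 0` this reduces to the quadratic sum (804.71). With `rho = 1` it becomes the plain sum of the standard deviations (2,638). So positive correlation between months widens the annual spread.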

    I doubt this is correct. Imagine two data sets with only a single measurement each. The variance of each set is 0, but the variance of the two measurements taken together is greater than 0 if the data points differ.

    @Njol, I think that's why we assume all variables are normally distributed. And we can do so here because we are talking about physical measurements. In your example, neither variable is normally distributed.

    @Njol, you are right. Take a look at my answer. Given the mean and variance of each subset, the variance over the whole set is composed of two parts: (1) the average of the subset variances, and (2) the variance of the subset means. In your edge case, where each subset has only one member, the variance of each subset is zero, but the total variance can still be obtained as the variance of the subset means (that is, of the single members).
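    The two-part decomposition described in the comment above is the law of total variance. Here is a minimal sketch. It assumes each subset is summarized by its count, mean, and population variance (dividing by $n$, not $n-1$). As noted earlier in the question's comments, the counts are needed unless every subset has the same size. `pooled_mean_and_variance` and the sample data are hypothetical.

```python
import statistics

def pooled_mean_and_variance(groups):
    """Combine (count, mean, population variance) summaries into an overall mean and population variance.

    Overall variance = weighted mean of the within-group variances
                     + weighted variance of the group means (law of total variance).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    within = sum(n * v for n, _, v in groups) / n_total
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups) / n_total
    return grand_mean, within + between

# Hypothetical raw data split into two subsets, to compare against a direct computation.
a = [2.0, 4.0, 9.0]
b = [5.0, 7.0]
groups = [(len(g), statistics.fmean(g), statistics.pvariance(g)) for g in (a, b)]
mean, var = pooled_mean_and_variance(groups)
print(mean, var)
print(statistics.fmean(a + b), statistics.pvariance(a + b))

# Njol's edge case: two single-point subsets have zero within-group variance,
# but the pooled variance is still positive.
print(pooled_mean_and_variance([(1, 3.0, 0.0), (1, 5.0, 0.0)]))  # (4.0, 1.0)
```

    With equal counts the weights cancel, and the result is exactly "average of the subset variances + variance of the subset means". If the per-subset figures are sample variances (divided by $n_i - 1$), multiply each by $n_i - 1$ to recover the within-subset sum of squares before pooling.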

Licensed under CC BY-SA with attribution


Content dated before 6/26/2020 9:53 AM