### How to 'sum' a standard deviation?

I have a monthly average for a value and a standard deviation corresponding to that average. I am now computing the annual average as the sum of the monthly averages. How can I represent the standard deviation for the summed average?

For example considering output from a wind farm:

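| Month     | MWh  | StdDev |
|-----------|------|--------|
| January   | 927  | 333    |
| February  | 1234 | 250    |
| March     | 1032 | 301    |
| April     | 876  | 204    |
| May       | 865  | 165    |
| June      | 750  | 263    |
| July      | 780  | 280    |
| August    | 690  | 98     |
| September | 730  | 76     |
| October   | 821  | 240    |
| November  | 803  | 178    |
| December  | 850  | 250    |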

We can say that in an average year the wind farm produces 10,358 MWh, but what is the standard deviation corresponding to this figure?

A comment to another deleted reply pointed out that it is strange to compute an average as a *sum*: surely you mean that you are *averaging* the monthly averages. But if what you want is to estimate the average of all the original data, then such a procedure is not usually a good one: a *weighted* average is needed. And of course it's not possible to give a good answer to your question about the "SD for the summed average" until it is clear what the "summed average" is and what it is intended to represent. Please clarify that for us.

@whuber I have added an example to clarify. Mathematically I believe that the sum of averages is equal to the monthly average times 12.

OK, so to make this a bit more abstract: you have twelve variables $\bar{X}_1,\ldots,\bar{X}_{12}$, each of which corresponds to the average production during a different month. They have been computed using data from different years. Right? Are the standard deviations known (e.g. because they are computed from a theoretical model) or estimated using the same data that were used for the averages?

Also, do you have the original data with, say, daily observations?

@MånsT I have the original data - 10 minute observations. The reason I am working with monthly averages is to maintain the responsiveness of the software. The standard deviation is for the moment calculated using the MySQL STDDEV function; eventually it will be calculated directly in Java.

@whuber Interesting point about requiring the number of observations as a parameter for the formula. Is it reasonable to want to at least read (erroneous) replies before their deletion? (A discussion for meta, perhaps.)

Yes, klonq, that is a very reasonable request. However, these replies were deleted by their owner, not by the community. To preserve their value, I have attempted here to relay (my take on) the key ideas arising in those replies and their comments. BTW, your recent edits are quite helpful: people like to see example data.

Surely averaging the variance and thus calculating the average standard deviation can't be the whole answer! All this represents is the average variance in measuring power output WITHIN a single month. This is a good start at getting an accurate gauge on measurement error, but doesn't this standard deviation of 232 need to be combined in some way with the INTER-MONTHLY variation in power output? I.e., I think that the resulting standard deviation for the Grand Mean should be a little higher than 232 if you account for the combined error in measurement both WITHIN each month and BETWEEN months.

For example, a proposed answer might be that the standard deviation for the inter-monthly variation would be 148, using the 12 monthly averages, and that this could then be square-root summed with the standard deviation of 232 for within months? MAYBE - I don't know.

Welcome to the site, @Hayden. This isn't an answer to the OP's question. Please only use the "Your Answer" field to provide answers. If you have a follow-up question, click `ASK QUESTION` and ask it there. Since you are new here, you may also want to take the tour, which contains information for new users.

Ian Boyd Correct answer

9 years ago

Short answer: You average the *variances*; then you can take the square root to get the average *standard deviation*.

Example

| Month     | MWh   | StdDev | Variance |
|-----------|-------|--------|----------|
| January   | 927   | 333    | 110889   |
| February  | 1234  | 250    | 62500    |
| March     | 1032  | 301    | 90601    |
| April     | 876   | 204    | 41616    |
| May       | 865   | 165    | 27225    |
| June      | 750   | 263    | 69169    |
| July      | 780   | 280    | 78400    |
| August    | 690   | 98     | 9604     |
| September | 730   | 76     | 5776     |
| October   | 821   | 240    | 57600    |
| November  | 803   | 178    | 31684    |
| December  | 850   | 250    | 62500    |
| **Total** | 10358 |        | 647564   |
| **÷12**   | 863   | 232    | 53964    |

And then the average *standard deviation* is `sqrt(53,964) = 232`

From Sum of normally distributed random variables:

If $X$ and $Y$ are independent random variables that are normally distributed (and therefore also jointly so), then their sum is also normally distributed

...the sum of two independent normally distributed random variables is normal, with its mean being the sum of the two means, and its variance being the sum of the two variances

And from Wolfram Alpha's Normal Sum Distribution:

Amazingly, the distribution of a sum of two normally distributed independent variates $X$ and $Y$ with means and variances $(\mu_X,\sigma_X^2)$ and $(\mu_Y,\sigma_Y^2)$, respectively is another normal distribution

$$ P_{X+Y}(u) = \frac{1}{\sqrt{2\pi (\sigma_X^2 + \sigma_Y^2)}} e^{-[u-(\mu_X+\mu_Y)]^2/[2(\sigma_X^2 + \sigma_Y^2)]} $$

which has mean

$$\mu_{X+Y} = \mu_X+\mu_Y$$

and variance

$$ \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$$
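The two-variable result quoted above extends to any number of terms. As a supplementary note (a standard identity, not part of the quoted sources), the variance of a sum of twelve monthly quantities $X_1,\ldots,X_{12}$ with variances $\sigma_i^2$ is

$$ \operatorname{Var}\!\left(\sum_{i=1}^{12} X_i\right) = \sum_{i=1}^{12} \sigma_i^2 + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j) $$

so the "sum the variances" rule holds whenever the covariance terms vanish, i.e. when the months are uncorrelated. Normality is not needed for the variance itself; it only matters if you also want the sum to be normally distributed.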

For your data:

- sum: `10,358 MWh`
- variance: `647,564`
- standard deviation: `804.71 ( sqrt(647564) )`

So to answer your question:

**How to 'sum' a standard deviation?** You sum them quadratically:

`s = sqrt(s1^2 + s2^2 + ... + s12^2)`

Conceptually you sum the variances, then take the square root to get the standard deviation.
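As a minimal sketch of that rule in Python, using the monthly standard deviations from the question: the helper name `sd_of_sum` is made up for illustration, and the calculation assumes the twelve months are independent (or at least uncorrelated).

```python
import math

# Monthly standard deviations (MWh), January..December, from the question.
monthly_sd = [333, 250, 301, 204, 165, 263, 280, 98, 76, 240, 178, 250]

def sd_of_sum(sds):
    """SD of a sum of uncorrelated quantities: sqrt of the summed variances.

    Hypothetical helper; it is only valid when the terms are uncorrelated.
    """
    return math.sqrt(sum(s ** 2 for s in sds))

annual_variance = sum(s ** 2 for s in monthly_sd)
annual_sd = sd_of_sum(monthly_sd)

print(annual_variance)      # 647564
print(round(annual_sd, 2))  # 804.71
```

If the months are positively correlated (for instance, windy years tend to be windy throughout), the true annual standard deviation would be larger than this figure.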

Because i was curious, i wanted to know the average monthly **mean** power, and its *standard deviation*. Through induction, we need 12 normal distributions which:

- sum to a mean of `10,358`
- sum to a variance of `647,564`

That would be 12 average monthly distributions of:

- mean of `10,358/12 = 863.17`
- variance of `647,564/12 = 53,963.67`
- standard deviation of `sqrt(53963.67) = 232.3`

We can check our monthly average distributions by adding them up 12 times, to see that they equal the yearly distribution:

- Mean: `863.17 × 12 ≈ 10,358` (*correct*)
- Variance: `53,963.67 × 12 ≈ 647,564` (*correct*)
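The same "average month" figures can be reproduced with a short Python sketch. The variable names are illustrative, and the round-trip check at the end only confirms the arithmetic; it says nothing about whether treating the months as independent is justified.

```python
import math

# Monthly means and standard deviations (MWh), January..December.
monthly_mwh = [927, 1234, 1032, 876, 865, 750, 780, 690, 730, 821, 803, 850]
monthly_sd = [333, 250, 301, 204, 165, 263, 280, 98, 76, 240, 178, 250]

n = len(monthly_mwh)
mean_month = sum(monthly_mwh) / n                    # average monthly output
mean_variance = sum(s ** 2 for s in monthly_sd) / n  # average of the variances
typical_sd = math.sqrt(mean_variance)                # "average" standard deviation

print(round(mean_month, 2))     # 863.17
print(round(mean_variance, 2))  # 53963.67
print(round(typical_sd, 1))     # 232.3

# Round trip: twelve such months add back up to the annual figures.
assert math.isclose(mean_month * n, 10358)
assert math.isclose(mean_variance * n, 647564)
```

Note that the quantity being averaged is the variance, not the standard deviation; averaging the twelve standard deviations directly gives a different (smaller) number.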

**Note**: i'll leave it to someone with a knowledge of the esoteric LaTeX math to convert my formula images, and `formula code`, into stackexchange formatted formulas.

**Edit**: I moved the short, to-the-point answer up top, because i needed to do this again today but wanted to double-check that i *average* the *variances*.

This all seems to assume the months are uncorrelated - have you made that assumption explicit anywhere? Also, why do we need to bring in the normal distribution? If we're only talking about variance then that seems unnecessary - for example, see my answer here

@Marco Because i think better in pictures and it makes everything easier to understand.

@Marco Also, i believe this question started on the (now defunct) stats.stackexchange site. A wall of formulas is less accessible than simpler, graphical, less rigorous treatments.

I doubt this is correct. Imagine two data sets, each containing only a single measurement. The variance of each set is 0, but the set of both measurements has a variance greater than 0 if the data points differ.

@Njol, I think that's why we assume all variables have a normal distribution. And we can do it here, because we are talking about physical measurements. In your example, neither variable is normally distributed.

@Njol, you are right. Take a look at my answer. The variance over a set, when you have the mean and variance over each subset, is composed of two parts: (1) the average of the variances of the subsets, and (2) the variance of the means of the subsets. In your marginal case, where each subset has only one member, the variance of each subset is zero, but the total variance can be obtained by calculating the variance of the subset means (each of which is just that one member).


License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM

whuber 9 years ago

A discussion following a now-deleted reply noted a **possible ambiguity** in this question: do you seek the SD of the monthly averages or do you want to recover the SD of all the original values from which those averages were constructed? That reply also correctly pointed out that if you want the latter, you will need the numbers of values involved in each one of the monthly averages.
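If the goal is the second reading (the SD of all the original values), the counts enter as weights. The following is a rough Python sketch of that calculation, assuming each subset's variance is a population variance (dividing by N rather than N - 1); the function name `combine_subsets` is hypothetical. It splits the total variance into a within-subset part and a between-subset part.

```python
def combine_subsets(means, variances, counts):
    """Return (mean, population variance) of the pooled data.

    Hypothetical helper. Assumes each entry of `variances` is the
    population variance (divide by N) of its subset.
    """
    total = sum(counts)
    grand_mean = sum(n * m for n, m in zip(counts, means)) / total
    # Within-subset part: count-weighted average of the subset variances.
    within = sum(n * v for n, v in zip(counts, variances)) / total
    # Between-subset part: count-weighted variance of the subset means.
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(counts, means)) / total
    return grand_mean, within + between

# Edge case: two subsets holding one measurement each (1.0 and 3.0).
# Each subset has variance 0, yet the pooled variance is not 0.
mean, var = combine_subsets([1.0, 3.0], [0.0, 0.0], [1, 1])
print(mean, var)  # 2.0 1.0
```

For the wind-farm data this would need the number of 10-minute observations in each month, which the question does not give, so no pooled figure is computed here.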