### How do I calculate a weighted standard deviation? In Excel?

• So, I have a data set of percentages like so:

100   /   10000   = 1% (0.01)
2     /     5     = 40% (0.4)
4     /     3     = 133% (1.3)
1000  /   2000    = 50% (0.5)


I want to find the standard deviation of the percentages, but weighted for their data volume. ie, the first and last data points should dominate the calculation.

How do I do that? And is there a simple way to do it in Excel?

The formula with (M-1)/M is correct. If you have a doubt, check it by setting all the weights equal to 1, and you will obtain classical formula for unbiased estimate for the standard deviation with (N-1) in the denominator. To whuber: unusual does not mean incorrect.

The formula with (M-1)/M is NOT CORRECT. Imagine you add a million points with weights of one trillionth. You don't change your answer at all regardless of what those weights are, but your $(M-1)/M$ term becomes 1? Absolutely NOT! If you care that $(M-1)/M \neq 1$, then you also care that this is just wrong.

I wonder *why* you want the standard deviation here? you only have $4$ numbers! How is that too many numbers? Especially when percentages are more easily explained and understood.

@probabilityislogic it was a simplified example for keeping the question short.

10 years ago

$$\sqrt{ \frac{ \sum_{i=1}^N w_i (x_i - \bar{x}^*)^2 }{ \frac{(M-1)}{M} \sum_{i=1}^N w_i } },$$

where

$N$ is the number of observations.

$M$ is the number of nonzero weights.

$w_i$ are the weights

$x_i$ are the observations.

$\bar{x}^*$ is the weighted mean.

Remember that the formula for weighted mean is:

$$\bar{x}^* = \frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}.$$

Use the appropriate weights to get the desired result. In your case I would suggest to use $\frac{\mbox{Number of cases in segment}}{\mbox{Total number of cases}}$.

To do this in Excel, you need to calculate the weighted mean first. Then calculate the $(x_i - \bar{x}^*)^2$ in a separate column. The rest must be very easy.

Thanks for your post. But I think there may be a mistake in the weighted mean expression. Shouldn't it be divided with the sum of the weights ?

@Gilles, you're right. deps_stats, the fraction $(M-1)/M$ in the SD is unusual. Do you have a citation for this formula or can you at least explain the reason for including that term?

the sum of all weights is BY DEFINITION = 1, so what is the point of including that in the divisor term?

@Aaron Weights are not always defined to sum to unity, as exemplified by the weights given in this question!

(-1) I am downvoting this answer because no justification or reference for the $(M-1)/M$ term has been provided (and I'm pretty sure it does *not* make the estimate of the variance unbiased, which would be its apparent motivation).

In light of the added reference (which is not authoritative, but it is a reference) I am removing the downvote. I am not upvoting this answer, though, because calculations show the proposed weighting does *not* produce an unbiased estimate of anything at all (except when all weights equal $1$). The real difficulty here--which is the fault of the question, not the answer--is that it's not clear what this "weighted standard deviation" is attempting to estimate. Without a definite estimand, there is no justification to introduce an $(M-1)/M$ factor to "reduce bias" (or for any other reason).

The formula with (M-1)/M is correct. If you have a doubt, check it by setting all the weights equal to 1, and you will obtain classical formula for unbiased estimate for the standard deviation with (N-1) in the denominator. To whuber: unusual does not mean incorrect.

@Mikhail You are correct that "unusual" and "right" have little to do with one another. However, unusual results do implicitly demand a little more justification because being unusual is one indicator that an error may have been made. Your argument is invalid: although the formula indeed reduces to one for an unbiased estimator when all weights are equal, that does not imply the estimator remains unbiased when unequal weights are used. I am not asserting your conclusion is wrong, but only that so far no valid justification has been offered.

I'm sorry to ask, but this isn't clear to me. The comments seem to indicate this answer introduces a bias, but I'm not seeing a complete solution with the unbiased version. I do see the (N - 1), is that used instead of (M - 1)/M? That remark says standard deviation, not weighted standard deviation. I'm not trying to split hairs, just want to make sure I'm understanding things properly. Would someone that feels this answer is introducing bias please submit a new answer containing their proposed modification, justification, and references?

@whuber - This is wrong, as you suspected. It's correct if the weights themselves are frequencies. But although frequencies go into computing the percentages _in this case the weights, though unspecified, are not frequencies of occurrence but something else to do with "data volume"_. So this is the wrong answer.

How do you go about computing the weights for the measurements? If i'm reading this right, they just exist with no source?