Why does increasing the sample size lower the (sampling) variance?
I'm trying to understand how increasing the sample size increases the power of an experiment. My lecturer's slides explain this with a picture of two normal distributions, one for the null hypothesis and one for the alternative hypothesis, with a decision threshold c between them. They argue that increasing the sample size will lower the variance and thereby cause a higher kurtosis, reducing the shared area under the curves and so the probability of a type II error.
I don't understand how a bigger sample size will lower the variance.
I assume you just calculate the sample variance and use it as a parameter in a normal distribution.
What I've tried:
- Googling, but most accepted answers have 0 upvotes or are merely examples.
- Thinking it through: by the law of large numbers, every value should eventually stabilize around its expected value under the normal distribution we assume, and the sample variance should therefore converge to the variance of that assumed normal distribution. But what is the variance of that normal distribution, and is it a minimum value, i.e. can we be sure our sample variance decreases to that value?
Your thought experiment concerned normally distributed data, but the reasoning also applies to data drawn from many other distributions (as noted by @Aksakal, not all! The Cauchy is a commonly cited example of such bad behaviour). For binomial data there is a good discussion of how power and standard error vary with sample size at http://stats.stackexchange.com/q/87730/22228
As you are new to CrossValidated, allow me to point out that if you received what you consider a satisfactory answer, you should consider marking it as "accepted" by clicking a green tick to the left of it. This provides additional reputation for the answerer and also marks the question as resolved.
I think about it like this: each new data point carries its own bit of information, and infinitely many points would carry enough to make a perfect estimate. As we add more and more sample points, the gap between the information we need for a perfect estimate and the information we actually have gets smaller and smaller.
This is the source of the confusion: it is not the sample variance that decreases, but the variance of the sample variance. The sample variance is an estimator (hence a random variable). If your data come from a normal N(0, 5), i.e. mean 0 and variance 5, the sample variance will be close to 5. How close? That depends on the variance of the sample variance itself, seen as an estimator. With 100 data points you may find something like 4.92; with 1,000, something like 4.98; with 10,000, something like 5.0001. So it is the accuracy of your measurements that increases, not the measurements themselves.
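To see this concretely, here is a small R sketch (my own illustration, not part of the answer above; the seed and the replication counts are arbitrary). For each sample size we draw many samples from N(0, 5), compute the sample variance of each, and look at how tightly those variances cluster around 5.

set.seed(42)
for (n in c(100, 1000, 10000)) {
    # 2000 replicate experiments, each of size n, from a normal with variance 5
    v <- replicate(2000, var(rnorm(n, mean = 0, sd = sqrt(5))))
    cat("n =", n, " mean of sample variances:", round(mean(v), 3),
        " sd of sample variances:", round(sd(v), 3), "\n")
}

The average of the sample variances stays near 5 throughout, while their spread shrinks roughly like $1/\sqrt{n}$, which is exactly the "accuracy increasing" described above.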
Standard deviations of averages are smaller than standard deviations of individual observations. [Here I will assume independent identically distributed observations with finite population variance; something similar can be said if you relax the first two conditions.]
It's a consequence of the simple fact that the standard deviation of the sum of two random variables is smaller than the sum of their standard deviations (it can only be equal when the two variables are perfectly positively correlated).
In fact, when you're dealing with uncorrelated random variables, we can say something more specific: the variance of a sum of variates is the sum of their variances.
This means that with $n$ independent (or even just uncorrelated) variates with the same distribution, the variance of the mean is the variance of an individual divided by the sample size.
Correspondingly, with $n$ independent (or even just uncorrelated) variates with the same distribution, the standard deviation of their mean is the standard deviation of an individual divided by the square root of the sample size:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
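For completeness, here is the one-line derivation behind these two statements, writing $X_1, \dots, X_n$ for the observations and $\sigma^2$ for their common variance:

$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\,\sigma^2}{n^2} = \frac{\sigma^2}{n},$$

where the middle step is where uncorrelatedness is used; taking square roots gives $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.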
So as you add more data, you get increasingly precise estimates of group means. A similar effect applies in regression problems.
Since we can get more precise estimates of averages by increasing the sample size, we are more easily able to tell apart means which are close together. Even though the distributions overlap quite a bit, by taking a large sample we can still estimate the population means accurately enough to tell that they're not the same.
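To tie this back to the original power question, base R's power.t.test() shows how the power of a two-sample t-test to detect a fixed difference between means grows with the per-group sample size (the difference of 2, sd of 5, and 5% significance level below are made-up values for illustration):

# power to detect a true mean difference of 2 when each population has sd 5
sapply(c(10, 25, 50, 100, 200), function(n)
       power.t.test(n = n, delta = 2, sd = 5, sig.level = 0.05)$power)

The power climbs toward 1 as $n$ grows: the sampling distributions of the two group means narrow until they barely overlap.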
The variability that's shrinking when N increases is the variability of the sample mean, often expressed as the standard error. Or, in other words, our certainty that the sample mean is close to the population mean is increasing.
Imagine you run an experiment where you collect 3 men and 3 women and measure their heights. How certain are you that the mean heights of each group are the true mean of the separate populations of men and women? I should think that you wouldn't be very certain at all. You could easily collect new samples of 3 and find new means several inches from the first ones. Quite a few of the repeated experiments like this might even result in women being pronounced taller than men because the means would vary so much. With a low N you don't have much certainty in the mean from the sample and it varies a lot across samples.
Now imagine 10,000 observations in each group. It's going to be pretty hard to find new samples of 10,000 that have means that differ much from each other. They will be far less variable and you'll be more certain of their accuracy.
If you can accept this line of thinking then we can insert it into the calculations of your statistics as the standard error. As you can see from its equation, $SE = \hat{\sigma}/\sqrt{n}$, it is an estimate of a parameter, $\sigma$ (which should become more accurate as $n$ increases), divided by a value that always increases with $n$, $\sqrt{n}$. That standard error represents the variability of the means or effects in your calculations. The smaller it is, the more powerful your statistical test.
Here's a little simulation in R demonstrating the relation between the standard error and the standard deviation of the means of many, many replications of the initial experiment. In this case we'll start with a population mean of 100 and a standard deviation of 15.
mu <- 100        # population mean
s <- 15          # population standard deviation
n <- 5           # sample size per experiment
nsim <- 10000    # number of simulated experiments

# theoretical standard error of the mean
s / sqrt(n)

# simulate nsim experiments and take the standard deviation of their means
y <- replicate(nsim, mean(rnorm(n, mu, s)))
sd(y)
Note how the final standard deviation is close to the theoretical standard error. By playing with the n variable here you can see the variability measure will get smaller as n increases.
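For example, a small loop over n (my own extension of the snippet above, reusing its mu, s and nsim) shows the simulated and theoretical values shrinking together:

# simulated sd of the means versus the theoretical s/sqrt(n)
for (n in c(5, 50, 500, 5000)) {
    y <- replicate(nsim, mean(rnorm(n, mu, s)))
    cat("n =", n, " simulated:", round(sd(y), 3),
        " theoretical:", round(s / sqrt(n), 3), "\n")
}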
[As an aside, the kurtosis in the graphs isn't really changing (assuming they are normal distributions). Lowering the variance doesn't change the kurtosis; the distribution just looks narrower. The only way to visually examine changes in kurtosis is to put the distributions on the same scale.]
Two things are not entirely clear: (1) Are the bell curves that OP talks about distributions of sample means? (2) Are the sample sizes considered for both the distribution of the mean of the control group samples and the distribution of the mean of the experimental group samples?
If you wanted to know the average weight of American citizens, then in the ideal case you'd immediately ask every citizen to step on a scale and collect the data. You'd get an exact answer. This is very difficult, so instead you could get a few citizens to step on the scale, compute the average, and get an idea of what the average of the population is. Would you expect the sample average to be exactly equal to the population average? I hope not.
Now, would you agree that if we got more and more people, at some point we'd be getting closer to the population mean? We should, right? After all, the most people we can get is the entire population, and its mean is exactly what we're looking for. This is the intuition.
This was an idealized thought experiment. In reality, there are complications. I'll give you two.
- Imagine that the data come from a Cauchy distribution. You can increase your sample size indefinitely, yet the variance of your estimate will not decrease: this distribution has no population variance. In fact, strictly speaking, it has no population mean either, and the sample mean of Cauchy data is itself Cauchy distributed, no matter how large the sample (see the small simulation after this list). It's sad. Amazingly, this distribution is quite real; it pops up here and there in physics.
- Imagine that you decide to go on with the task of determining the average weight of American citizens. So you take your scale and go from home to home. This will take you many, many years. By the time you collect a million observations, some of the citizens in your data set will have changed their weight a lot, and some will have died. The point is that increasing the sample size doesn't help you in this case.
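To illustrate the Cauchy point above, here is a small R sketch (my own, with an arbitrary seed): the running mean of Cauchy draws never settles down, while the running mean of normal draws does.

set.seed(1)
x <- rcauchy(1e6)    # standard Cauchy draws
z <- rnorm(1e6)      # standard normal draws, for comparison
for (n in c(100, 1e4, 1e6)) {
    cat("n =", n, " Cauchy running mean:", round(mean(x[1:n]), 3),
        " normal running mean:", round(mean(z[1:n]), 4), "\n")
}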
I believe that the Law of Large Numbers explains why the variance (standard error) goes down when the sample size increases. Wikipedia's article on this says:
According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
In terms of the Central Limit Theorem:
When drawing a single random sample, the larger the sample is, the closer the sample mean will be to the population mean (in the quote above, think of "number of trials" as "sample size", so each "trial" is an observation). Therefore, when drawing many random samples, the variance of the sampling distribution of the mean will be lower the larger each sample is.
In other words, the bell shape will be narrower when each sample is large instead of small, because in that way each sample mean will be closer to the center of the bell.
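A two-line check in R makes the narrowing bell concrete (my own illustration; the population mean of 100 and sd of 15 echo the simulation earlier in the thread):

# sampling distribution of the mean for small versus large samples
means_small <- replicate(5000, mean(rnorm(10, 100, 15)))
means_large <- replicate(5000, mean(rnorm(1000, 100, 15)))
sd(means_small)    # close to 15 / sqrt(10), about 4.74
sd(means_large)    # close to 15 / sqrt(1000), about 0.47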