Narrow confidence interval -- higher accuracy?
I have two questions about confidence intervals:
Apparently a narrow confidence interval implies that there is a smaller chance of obtaining an observation within that interval, therefore, our accuracy is higher.
Also, a 95% confidence interval is narrower than a 99% confidence interval.
The 99% confidence interval is more accurate than the 95%.
Can someone give a simple explanation that could help me understand this difference between accuracy and narrowness?
I think you mean "there is a smaller chance of obtaining an observation *outside* that interval". Unfortunately, a Confidence Interval may not mean what it appears to mean, due to technical, statistical issues, but in general the narrower the interval (at a given confidence level) the less uncertainty there is about the results. There are many threads on this site discussing what a Confidence Interval means (as opposed to, say, a Credible Interval). We're not even getting into Predictive Intervals...
@Wayne Why shouldn't the statement be "there is a smaller chance of obtaining an observation *within* that interval"? Since a narrow interval has a large type I error, it is more likely to reject the *true* null hypothesis, that is, my true null value is not contained in that interval. So it seems to me that `a narrow confidence interval implies that there is a smaller chance of obtaining an observation within that interval` is correct. Would you please explain where I am making the mistake?
The 95% is not numerically attached at all to how confident you are that you've covered the true effect in your experiment. Perhaps "interval computed by a procedure with 95% coverage in repeated sampling" would be a more accurate name for it. You can choose to decide that the interval contains the true value, and if you do that consistently you'll be right 95% of the time. But you really don't know how likely that is for your particular experiment without more information.
Q1: Your first query conflates two things and misuses a term, so no wonder you're confused. A narrower confidence interval may be more precise, but when calculated the same way, e.g. by the 95% method, the intervals all have the same accuracy: they capture the true value the same proportion of the time.
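To make that concrete, here is a hedged sketch (the numbers are my own illustration, not from the answer): draw many samples from a normal population with known mean 0 and sd 1, build the usual 95% z-interval for the mean, and count how often it captures the truth. With n = 10 the intervals are wide and with n = 100 they are about three times narrower, but the accuracy, covering the truth about 95% of the time, is the same.

```python
import random

random.seed(1)

def coverage(n, trials=5000, z=1.96, sigma=1.0):
    """Fraction of nominal 95% intervals that capture the true mean (0)."""
    half_width = z * sigma / n ** 0.5   # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        sample_mean = sum(random.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(sample_mean) <= half_width:
            hits += 1
    return hits / trials

print(coverage(10))    # about 0.95, with wide intervals
print(coverage(100))   # about 0.95, with intervals ~3x narrower
```

Width changes with n; the capture rate does not.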
Also, just because it's narrow doesn't mean you're less likely to encounter a sample that falls within it. A narrow confidence interval can be achieved in one of three ways.

First, the experimental method or the nature of the data could just have very low variance. The confidence interval around the boiling point of tap water at sea level is pretty narrow, regardless of the sample size.

Second, the confidence interval around the average weight of people might be rather wide, because people are quite variable, but one can make that confidence interval narrower just by acquiring more observations. In that case, as you gain more certainty about where you believe the true value is, by collecting more samples and making a narrower confidence interval, the probability of encountering an individual within that confidence interval does go down. (It goes down in any case when you increase the sample size, but you may not bother collecting the big sample in the boiling-water case.)

Finally, the interval could be narrow because your sample is unrepresentative. In that case you are actually more likely to have one of the 5% of intervals that does not contain the true value. It's a bit of a paradox regarding CI width, and something you should check by knowing the literature and how variable this kind of data typically is.
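The second point above can be sketched in a small simulation (the setup is mine, not from the answer): with sigma known to be 1, the 95% CI for the mean has half-width 1.96/sqrt(n). As n grows, the interval tightens around the mean, so a fresh individual observation, which still has sd 1, becomes ever less likely to fall inside it.

```python
import random

random.seed(2)

def p_new_obs_in_ci(n, trials=20000, z=1.96):
    """Probability that a fresh observation lands inside the 95% CI for the mean."""
    half_width = z / n ** 0.5
    inside = 0
    for _ in range(trials):
        sample_mean = random.gauss(0.0, 1.0 / n ** 0.5)  # sampling dist. of the mean
        new_obs = random.gauss(0.0, 1.0)                 # one fresh individual
        if abs(new_obs - sample_mean) <= half_width:
            inside += 1
    return inside / trials

print(p_new_obs_in_ci(10))     # fairly likely an individual lands in the wide CI
print(p_new_obs_in_ci(1000))   # very unlikely inside the narrow CI
```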
Further, consider that the confidence interval is about trying to estimate the true mean of the population. If you knew that exactly, you'd be even more precise (and accurate) and wouldn't even have a range of estimates. But your probability of encountering an observation with that exact value would be far lower than finding one within any particular sample-based CI.
Q2: A 99% confidence interval is wider than a 95% one; therefore, it is more likely to contain the true value. See the distinction above between precise and accurate: you're conflating the two. If I make a confidence interval narrower, through lower variability or a higher sample size, it becomes more precise, because the likely values cover a smaller range. If I increase the coverage by using a 99% calculation, it becomes more accurate, because the true value is more likely to be within the range.
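A hedged sketch with my own numbers makes the 95% vs 99% trade explicit: for a normal mean with known sigma = 1 and n = 25, the multiplier grows from 1.96 to 2.576 (about 31% wider), and in exchange the interval misses the true value about 1 time in 100 instead of 1 in 20.

```python
import random

random.seed(3)

def miss_rate(z, n=25, trials=20000):
    """Fraction of intervals (mean +/- z/sqrt(n)) that miss the true mean 0."""
    half_width = z / n ** 0.5
    misses = 0
    for _ in range(trials):
        sample_mean = random.gauss(0.0, 1.0 / n ** 0.5)  # true mean is 0
        if abs(sample_mean) > half_width:
            misses += 1
    return misses / trials

print(round(2.576 / 1.96, 2))   # 1.31: the 99% interval is ~31% wider
print(miss_rate(1.96))          # about 0.05
print(miss_rate(2.576))         # about 0.01
```

Wider interval, fewer misses: that is the whole trade-off.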
For a given dataset, increasing the confidence level of a confidence interval will only result in larger intervals (or at least not smaller). That's not about accuracy or precision but rather about how much risk you're willing to take about missing the true value.
If you're comparing confidence intervals for the same sort of parameter from multiple data sets and one is smaller than the other, you could say that the smaller one is more precise. I prefer to talk about precision rather than accuracy in this situation (see this relevant Wikipedia article).
What is meant by "same sort of parameter" and "multiple data sets"? Say a survey on illiteracy is carried out at different times: 1995, 1998, etc. Is the "illiteracy rate" the same sort of parameter, and do the 1995, 1998, etc. data sets count as multiple data sets?
First of all, a CI for a given confidence percentage (e.g. 95%) means, for all practical purposes (though technically it is not correct), that you are confident that the true value is in the interval.
If this interval is "narrow" (note that narrowness can only be judged in a relative fashion, so, for comparison with what follows, say it is 1 unit wide), there is not much room to play with: whichever value you pick in that interval is going to be close to the true value (because the interval is narrow), and you are quite certain of that (95% confident).
Compare this to a relatively wide 95% CI (to match the example above, say it is 100 units wide): here, you are still 95% certain that the true value lies within the interval, yet that doesn't tell you very much, since there are relatively many values in it (about a factor of 100 more than before - and I ask, again, that purists ignore the simplification).
Typically, you are going to need a bigger interval when you want to be 99% certain that the true value is in it, than when you only need to be 95% certain (note: this may not be true if the intervals are not nested), so indeed, the more confidence you need, the broader the interval you will need to pick.
On the other hand, you are more certain with the higher confidence interval. So, if I give you two intervals of the same width, and I say one is a 95% CI and the other is a 99% CI, I hope you will prefer the 99% one. In this sense, 99% CIs are more accurate: you have less doubt that you have missed the truth.
thanks! so then when they say that this new research on neutrinos being faster than light has a very small confidence interval (I guess this means narrow), does that mean it is more likely to be accurate than if it were a wide confidence interval? (disregarding all other aspects)
Nick, your first statement is wrong. It's not a "technical issue", it's just not correct. The confidence interval is a statement about what would happen in repeated experiments, that they would cover the true value 95% of the time. A statement about the confidence that the true value is within my given range found in my given experiment is not the same as that at all. If you removed the "that" in "that confident" and the parenthetical numerical amount then you'd be closer to the truth. You could just say that it means you believe the true value likely to fall in the interval.
@John: I specifically avoided saying that the interval itself is the random variable, though my sentence does not imply it not to be (admittedly, it does suggest so). I know the issues involved, but found them irrelevant for the question. I have never seen a _practical_ situation where the difference mattered either, hence the "for all practical purposes".
Haven't encountered the issue? That's like saying that the p-value equals the probability of the null and then saying that you've never encountered an issue with it; you won't if you stay in the right journals. It's just incorrect to say that you're 95% certain that the true value is in your current range. Treating it as some esoteric matter just means that now we'll have (at least) one more person walking around saying, "I'm 95% confident the value is in this range." It would hardly change your answer to correct it. The other issues you skirt could be ignored if you changed that one statement.
I am adding to some good answers here that I upvoted. I think there is a little more that should be said to completely clear up the confusion. I like the terms accurate and correct as Efron defines them. I gave a lengthy discussion of this very recently on a different question, and moderator whuber really liked that answer; I will not repeat it at the same length here. To Efron, accuracy relates to the confidence level, and correctness to the width or tightness of the interval. But you can't talk about tightness without considering accuracy first. Some confidence intervals are exact: those are accurate because they have the actual coverage that they advertise. A 95% confidence interval can also be approximate, because it relies on an asymptotic distribution. Approximate intervals based on asymptotics are, for a finite sample size n, not going to have the advertised coverage, which is the coverage you would get if the asymptotic distribution were the exact distribution. So an approximate interval could undercover (i.e. advertise 95% when its actual coverage is only 91%) or, in the rarer but less serious case, overcover (i.e. the advertised coverage is 95% but the actual coverage is 98%). In the former case we worry about how close the actual coverage is to the advertised coverage. A measure of closeness is the order of accuracy, which could be, say, 1/√n or 1/n. If the actual confidence level is close, we call the interval accurate. Accuracy is important with bootstrap confidence intervals, which are never exact, but some variants are more accurate than others.
This definition of accuracy may be different from the one the OP is referring to, but it should be clear now what Efron's definition is and why it is important to be accurate. Now if you have two methods that are exact, we can prefer one over the other if, for any confidence level, it has the smaller expected width. A confidence interval that is best in this sense (sometimes called shortest) would be the one to choose. But this requires exactness. If the confidence level is only approximate, we could be comparing apples and oranges: one interval could be narrower than another only because it is less accurate and hence has a lower actual coverage than its advertised coverage.
If two confidence intervals are both very accurate, or one is exact and the other very accurate, comparing expected width may be okay, because at least now we are looking at just two varieties of apples.
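A hedged sketch of the advertised-vs-actual coverage point (my own illustration, not Efron's example): apply the usual normal-theory 95% interval, mean ± 1.96·s/√n, to small samples from a skewed exponential population. The advertised coverage is 95%, but the actual coverage for n = 10 comes out noticeably lower, which is exactly the undercoverage described above.

```python
import random
import statistics

random.seed(4)

def actual_coverage(n=10, trials=5000, z=1.96, true_mean=1.0):
    """Actual capture rate of the nominal 95% normal-theory interval
    applied to samples from an exponential(1) population (true mean 1)."""
    hits = 0
    for _ in range(trials):
        sample = [random.expovariate(1.0) for _ in range(n)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / trials

print(actual_coverage())   # noticeably below the advertised 0.95
```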