What is the difference between a consistent estimator and an unbiased estimator?

  • What is the difference between a consistent estimator and an unbiased estimator?

    The precise technical definitions of these terms are fairly complicated, and it's difficult to get an intuitive feel for what they mean. I can imagine a good estimator, and a bad estimator, but I'm having trouble seeing how any estimator could satisfy one condition and not the other.

    Have you looked at the very first figure in the Wikipedia article on consistent estimators, which specifically explains this distinction?

    I've read the articles for both consistency and bias, but I still don't really understand the distinction. (The figure you refer to claims that the estimator is consistent but biased, but doesn't explain _why_.)

    Which part of the explanation do you need help with? The caption points out that each of the estimators in the sequence is biased and it also explains why the sequence is consistent. Do you need an explanation of how the bias in these estimators is apparent from the figure?

    +1 The comment thread following one of these answers is very illuminating, both for what it reveals about the subject matter and as an interesting example of how an online community can work to expose and rectify misconceptions.

  • Macro · Correct answer · 8 years ago

    To define the two terms without using too much technical language:

    • An estimator is consistent if, as the sample size increases, the estimates (produced by the estimator) "converge" to the true value of the parameter being estimated. To be slightly more precise - consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.

    • An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.

    • The two are not equivalent: Unbiasedness is a statement about the expected value of the sampling distribution of the estimator. Consistency is a statement about "where the sampling distribution of the estimator is going" as the sample size increases.
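    To make these definitions concrete, here is a minimal simulation sketch (Python with NumPy; the population parameters `mu` and `sigma` and the replication count are chosen only for illustration). The sample mean is both unbiased and consistent, so its simulated sampling distribution stays centered at $\mu$ for every $n$ while its spread shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0  # true parameter values, assumed for this illustration

spreads = []
for n in (10, 100, 2000):
    # Approximate the sampling distribution of the sample mean
    # with 5000 simulated samples of size n.
    means = rng.normal(mu, sigma, size=(5000, n)).mean(axis=1)
    spreads.append(means.std())
    # Unbiasedness: the average of the estimates is near mu for every n.
    # Consistency: the spread shrinks toward 0 as n increases.
    print(f"n={n:5d}  mean of estimates={means.mean():.3f}  spread={means.std():.3f}")
```

    Unbiasedness shows up in the first printed column (always near $\mu$), consistency in the second (shrinking toward $0$).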

    It certainly is possible for one condition to be satisfied but not the other - I will give two examples. For both examples consider a sample $X_1, ..., X_n$ from a $N(\mu, \sigma^2)$ population.

    • Unbiased but not consistent: Suppose you're estimating $\mu$. Then $X_1$ is an unbiased estimator of $\mu$ since $E(X_1) = \mu$. But, $X_1$ is not consistent since its distribution does not become more concentrated around $\mu$ as the sample size increases - it's always $N(\mu, \sigma^2)$!

    • Consistent but not unbiased: Suppose you're estimating $\sigma^2$. The maximum likelihood estimator is $$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X})^2 $$ where $\overline{X}$ is the sample mean. It is a fact that $$ E(\hat{\sigma}^2) = \frac{n-1}{n} \sigma^2, $$ which can be derived using the information here. Therefore $\hat{\sigma}^2$ is biased for any finite sample size. We can also easily derive that $$ {\rm var}(\hat{\sigma}^2) = \frac{2\sigma^4(n-1)}{n^2}. $$ From these facts we can informally see that the distribution of $\hat{\sigma}^2$ is becoming more and more concentrated at $\sigma^2$ as the sample size increases, since the mean is converging to $\sigma^2$ and the variance is converging to $0$. (Note: this does constitute a proof of consistency, using the same argument as the one used in the answer here.)
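    Both examples are easy to check by simulation. The sketch below (Python with NumPy; the values of $\mu$ and $\sigma^2$ are arbitrary choices for illustration) estimates $\mu$ with $X_1$ alone and $\sigma^2$ with the divide-by-$n$ MLE:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 0.0, 4.0  # illustrative true parameter values

for n in (10, 1000):
    samples = rng.normal(mu, np.sqrt(sigma2), size=(5000, n))

    # Unbiased but not consistent: X_1 alone as an estimator of mu.
    # Its mean stays at mu, but its spread never shrinks as n grows.
    x1 = samples[:, 0]
    print(f"n={n}: X1 mean={x1.mean():.2f}, sd={x1.std():.2f}")

    # Consistent but biased: the MLE of sigma^2 (ddof=0 divides by n, not n-1).
    # Its mean is (n-1)/n * sigma^2, but it concentrates at sigma^2 as n grows.
    mle = samples.var(axis=1, ddof=0)
    print(f"n={n}: MLE mean={mle.mean():.2f}, sd={mle.std():.2f}")
```

    In the output, the standard deviation of $X_1$ is the same for both sample sizes, while the MLE's standard deviation collapses toward $0$ even though its mean sits slightly below $\sigma^2$ at each finite $n$.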

    (+1) Not all MLEs are consistent though: the general result is that there exists a consistent subsequence in the sequence of MLEs. For proper consistency a few additional requirements, e.g. identifiability, are needed. Examples of MLEs that aren't consistent are found in certain errors-in-variables models (where the "maximum" turns out to be a saddle-point).

    Well, the EIV MLEs that I mentioned are perhaps not good examples, since the likelihood function is unbounded and no maximum exists. They're good examples of how the ML approach can fail though :) I'm sorry that I can't give a relevant link right now - I'm on vacation.

    Thank you @MånsT. The necessary conditions were outlined in the link but that wasn't clear from the wording.

    Just a side note: The parameter space is certainly not compact in this case, in contrast to the conditions at that link, nor is the log likelihood concave wrt $\sigma^2$ itself. The stated consistency result still holds, of course.

    You're right, @cardinal, I'll delete that reference. It's clear enough that $E(\hat{\sigma}^2) \rightarrow \sigma^2$ and ${\rm var}(\hat{\sigma}^2) \rightarrow 0$ but I don't want to stray from the point by turning this into an exercise of proving the consistency of $\hat{\sigma}^2$.

    What is the consequence of inconsistency for the OLS assumptions, or for the BLUE property?

    Hi, what does ${\rm var}(\hat{\sigma}^2) = \frac{2\sigma^4(n-1)}{n^2}$ mean? How should it be read, and why does it hold?

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM