### Help me understand Bayesian prior and posterior distributions

In a group of students, there are 2 out of 18 that are left-handed. Find the posterior distribution of left-handed students in the population assuming uninformative prior. Summarize the results. According to the literature 5-20% of people are left-handed. Take this information into account in your prior and calculate new posterior.

I know the *beta distribution* should be used here. First, with $\alpha$ and $\beta$ values as 1? The equation I found in the material for the posterior is
$$\pi(r \mid Y) \propto r^{Y+\alpha-1} \times (1 - r)^{N-Y+\beta-1}$$

$Y=2$, $N=18$

Why is that $r$ in the equation? ($r$ denoting the proportion of left-handed people.) It is unknown, so how can it be in this equation? To me it seems ridiculous to calculate $r$ given $Y$ and then use that $r$ in the equation that gives $r$. Well, with the sample $r=2/18$ the result was $0.0019$. What should I deduce from that?

The equation giving the expected value of $r$ given known $Y$ and $N$ worked better and gave me $0.15$, which sounds about right. The equation is $E(r \mid X, N, \alpha, \beta) = (\alpha + X)/(\alpha + \beta + N)$ with the value $1$ assigned to $\alpha$ and $\beta$. What values should I give $\alpha$ and $\beta$ to take the prior information into account?

Some tips would be much appreciated. A general lecture on prior and posterior distributions wouldn't hurt either (I have only a vague understanding of what they are). Also bear in mind that I'm not a very advanced statistician (actually I'm a political scientist by my main trade), so advanced mathematics will probably fly over my head.

The phrase "*Find the posterior distribution of left-handed students*" makes no sense. Random variables have distributions, and "left-handed students" isn't an r.v. I presume you intend "*Find the posterior distribution of* **the proportion of** *left-handed students*". It's important not to gloss over such details, but to be clear about what you're actually talking about.

Actually, reading your question it seems to me that your problem isn't so much Bayesian stats as simply understanding probability distributions; it's *always the case* that the argument of a distribution function (or a probability function as you have there) is a function of an unknown (the random variable). *That's entirely the point of them.*


Let me first explain what a conjugate prior is. I will then explain the Bayesian analysis using your specific example. Bayesian statistics involves the following steps:

- Define the *prior distribution* that incorporates your subjective beliefs about a parameter (in your example, the parameter of interest is the proportion of left-handers). The prior can be "uninformative" or "informative" (but there is no prior that has no information, see the discussion here).
- Gather data.
- Update your prior distribution with the data using Bayes' theorem to obtain a *posterior distribution.* The posterior distribution is a probability distribution that represents your updated beliefs about the parameter after having seen the data.
- Analyze the posterior distribution and summarize it (mean, median, sd, quantiles, ...).

The basis of all Bayesian statistics is Bayes' theorem, which is

$$ \mathrm{posterior} \propto \mathrm{prior} \times \mathrm{likelihood} $$

In your case, the likelihood is binomial. If the prior and the posterior distribution are in the *same family,* the prior and posterior are called *conjugate* distributions. The beta distribution is a conjugate prior for the binomial likelihood because the posterior is then also a beta distribution. We say that the beta distribution is the conjugate family for the binomial likelihood. Conjugate analyses are convenient but rarely occur in real-world problems. In most cases, the posterior distribution has to be found numerically via MCMC (using Stan, WinBUGS, OpenBUGS, JAGS, PyMC or some other program).

If the prior probability distribution does not integrate to 1, it is called an *improper* prior; if it does integrate to 1, it is called a *proper* prior. In most cases, an improper prior does not pose a major problem for Bayesian analyses. The posterior distribution *must* be proper though, i.e. the posterior must integrate to 1.

These rules of thumb follow directly from the nature of the Bayesian analysis procedure:

- If the prior is uninformative, the posterior is very much determined by the data (the posterior is data-driven)
- If the prior is informative, the posterior is a mixture of the prior and the data
- The more informative the prior, the more data you need to "change" your beliefs, so to speak, because the posterior is largely driven by the prior information
- If you have a lot of data, the data will dominate the posterior distribution (they will overwhelm the prior)

An excellent overview of some possible "informative" and "uninformative" priors for the beta distribution can be found in this post.

Say your prior beta is $\mathrm{Beta}(\pi_{LH}| \alpha, \beta)$ where $\pi_{LH}$ is the proportion of left-handers. To specify the prior parameters $\alpha$ and $\beta$, it is useful to know the mean and variance of the beta distribution (for example, if you want your prior to have a certain mean and variance). The mean is $\bar{\pi}_{LH}=\alpha/(\alpha + \beta)$. Thus, whenever $\alpha =\beta$, the mean is $0.5$. The variance of the beta distribution is $\frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$. Now, the convenient thing is that you can think of $\alpha$ and $\beta$ as previously observed (pseudo-)data, namely $\alpha$ left-handers and $\beta$ right-handers out of a (pseudo-)sample of size $n_{eq}=\alpha + \beta$. The $\mathrm{Beta}(\pi_{LH} |\alpha=1, \beta=1)$ distribution is the uniform (all values of $\pi_{LH}$ are equally probable) and is the equivalent of having observed two people out of which one is left-handed and one is right-handed.
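The mean and variance formulas above are easy to check numerically. A minimal sketch in Python (my own addition; the original post uses R), comparing the closed-form expressions against `scipy.stats.beta` for illustrative parameter values:

```python
# Check the Beta mean and variance formulas against scipy's implementation.
# The parameter values here are arbitrary, for illustration only.
from scipy import stats

alpha, beta = 2.0, 5.0
dist = stats.beta(alpha, beta)

mean = alpha / (alpha + beta)                                    # analytic mean
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))  # analytic variance

print(mean, dist.mean())  # both ~0.2857
print(var, dist.var())
```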

The posterior beta distribution is simply $\mathrm{Beta}(z + \alpha, N - z +\beta)$ where $N$ is the size of the sample and $z$ is the number of left-handers in the sample. The posterior mean of $\pi_{LH}$ is therefore $(z + \alpha)/(N + \alpha + \beta)$. So to find the parameters of the posterior beta distribution, we simply add $z$ left-handers to $\alpha$ and $N-z$ right-handers to $\beta$. The posterior variance is $\frac{(z+\alpha)(N-z+\beta)}{(N+\alpha+\beta)^{2}(N + \alpha + \beta + 1)}$. Note that a highly informative prior also leads to a smaller variance of the posterior distribution (the graphs below illustrate the point nicely).
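The conjugate update above amounts to two additions. A short Python sketch (my own, not from the original post) for the left-hander example with a uniform prior:

```python
# Conjugate Beta-binomial update: prior Beta(alpha, beta) plus z successes
# (left-handers) out of N trials gives posterior Beta(alpha + z, beta + N - z).
def beta_binomial_update(alpha, beta, z, N):
    """Return the posterior Beta parameters after observing z lefties in N people."""
    return alpha + z, beta + (N - z)

a_post, b_post = beta_binomial_update(1, 1, z=2, N=18)  # uniform prior
post_mean = a_post / (a_post + b_post)
print(a_post, b_post, post_mean)  # 3 17 0.15
```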

In your case, $z=2$ and $N=18$ and your prior is the uniform which is uninformative, so $\alpha = \beta = 1$. Your posterior distribution is therefore $\mathrm{Beta}(3, 17)$. The posterior mean is $\bar{\pi}_{LH}=3/(3+17)=0.15$. Here is a graph that shows the prior, the likelihood of the data and the posterior.

You see that because your prior distribution is uninformative, your posterior distribution is entirely driven by the data. Also plotted is the highest density interval (HDI) for the posterior distribution. Imagine that you put your posterior distribution in a 2D basin and start to fill in water until 95% of the distribution is above the waterline. The points where the waterline intersects with the posterior distribution constitute the 95%-HDI. Every point inside the HDI has a higher probability than any point outside it. Also, the HDI always includes the peak of the posterior distribution (i.e. the mode). The HDI is different from an equal-tailed 95% credible interval, where 2.5% from each tail of the posterior is excluded (see here).
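The difference between the two intervals can be sketched numerically. Below is my own Python illustration for the $\mathrm{Beta}(3, 17)$ posterior: the HDI is found by a simple width minimisation over candidate intervals (not the waterline algorithm itself, but equivalent for a unimodal posterior):

```python
# Compare the 95% equal-tailed credible interval with the 95% HDI for the
# Beta(3, 17) posterior. Since the posterior is right-skewed, the HDI is
# narrower and shifted toward the mode.
import numpy as np
from scipy import stats

post = stats.beta(3, 17)

# Equal-tailed interval: cut 2.5% of probability from each tail.
eq_lo, eq_hi = post.ppf(0.025), post.ppf(0.975)

# HDI: among all intervals containing 95% of the mass, pick the narrowest.
lows = np.linspace(0, 0.05, 2001)              # candidate lower-tail probabilities
widths = post.ppf(lows + 0.95) - post.ppf(lows)
best = lows[np.argmin(widths)]
hdi_lo, hdi_hi = post.ppf(best), post.ppf(best + 0.95)

print(f"equal-tailed: [{eq_lo:.3f}, {eq_hi:.3f}]")
print(f"HDI:          [{hdi_lo:.3f}, {hdi_hi:.3f}]")
```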

For your second task, you're asked to take into account the information that 5-20% of the population are left-handers. There are several ways of doing that. The easiest way is to say that the prior beta distribution should have a mean of $0.125$, which is the mean of $0.05$ and $0.2$. But how to choose $\alpha$ and $\beta$ of the prior beta distribution? First, you want the mean of your prior distribution to be $0.125$ out of a pseudo-sample of equivalent sample size $n_{eq}$. More generally, if you want your prior to have a mean $m$ with a pseudo-sample size $n_{eq}$, the corresponding $\alpha$ and $\beta$ values are: $\alpha = mn_{eq}$ and $\beta = (1-m)n_{eq}$. All that is left to do now is to choose the pseudo-sample size $n_{eq}$, which determines how confident you are about your prior information. Let's say you are very sure about your prior information and set $n_{eq}=1000$. The parameters of your prior distribution are therefore $\alpha = 0.125\cdot 1000 = 125$ and $\beta = (1 - 0.125)\cdot 1000 = 875$. The posterior distribution is $\mathrm{Beta}(127, 891)$ with a mean of about $0.125$, which is practically the same as the prior mean of $0.125$. The prior information is dominating the posterior (see the following graph):

If you are less sure about the prior information, you could set the $n_{eq}$ of your pseudo-sample to, say, $10$, which yields $\alpha=1.25$ and $\beta=8.75$ for your prior beta distribution. The posterior distribution is $\mathrm{Beta}(3.25, 24.75)$ with a mean of about $0.116$. The posterior mean is now near the mean of your data ($0.111$) because the data overwhelm the prior. Here is the graph showing the situation:
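The mean/pseudo-sample-size parameterisation described above can be sketched in a few lines of Python (my own code; the numbers reproduce the two cases in the text):

```python
# Build a Beta prior from a desired mean m and a pseudo-sample size n_eq:
# alpha = m * n_eq, beta = (1 - m) * n_eq. Then apply the conjugate update
# with the observed data z = 2 lefties out of N = 18.
def prior_from_mean(m, n_eq):
    return m * n_eq, (1 - m) * n_eq

z, N = 2, 18

# Very confident prior (n_eq = 1000): the prior dominates the posterior.
a, b = prior_from_mean(0.125, 1000)           # (125.0, 875.0)
post_strong = (a + z, b + N - z)              # Beta(127, 891)
print(post_strong, post_strong[0] / sum(post_strong))  # mean ~0.1248

# Weaker prior (n_eq = 10): the data pull the posterior toward 2/18 ~ 0.111.
a, b = prior_from_mean(0.125, 10)             # (1.25, 8.75)
post_weak = (a + z, b + N - z)                # Beta(3.25, 24.75)
print(post_weak, post_weak[0] / sum(post_weak))        # mean ~0.116
```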

A more advanced method of incorporating the prior information would be to say that the $0.025$ quantile of your prior beta distribution should be about $0.05$ and the $0.975$ quantile should be about $0.2$. This is equivalent to saying that you are 95% sure that the proportion of left-handers in the population lies between 5% and 20%. The function `beta.select` in the R package `LearnBayes` calculates the corresponding $\alpha$ and $\beta$ values of a beta distribution corresponding to such quantiles. The code is

```r
library(LearnBayes)
quantile1 = list(p=.025, x=0.05) # the 2.5% quantile should be 0.05
quantile2 = list(p=.975, x=0.2)  # the 97.5% quantile should be 0.2
beta.select(quantile1, quantile2)
[1]  7.61 59.13
```

It seems that a beta distribution with parameters $\alpha = 7.61$ and $\beta=59.13$ has the desired properties. The prior mean is $7.61/(7.61 + 59.13)\approx 0.114$, which is near the mean of your data ($0.111$). Again, this prior distribution incorporates the information of a pseudo-sample of an equivalent sample size of about $n_{eq}\approx 7.61+59.13 \approx 66.74$. The posterior distribution is $\mathrm{Beta}(9.61, 75.13)$ with a mean of $0.113$, which is comparable with the mean of the previous analysis using a highly informative $\mathrm{Beta}(125, 875)$ prior. Here is the corresponding graph:
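If you don't have R at hand, the same quantile matching can be sketched in Python with a generic root finder (my own stand-in for `beta.select`, not its actual algorithm; the starting values are a guess):

```python
# Solve for (alpha, beta) so that the Beta prior has its 2.5% quantile near
# 0.05 and its 97.5% quantile near 0.2, mimicking LearnBayes::beta.select.
from scipy import stats
from scipy.optimize import least_squares

def quantile_gap(params):
    a, b = params
    return [stats.beta.ppf(0.025, a, b) - 0.05,
            stats.beta.ppf(0.975, a, b) - 0.20]

res = least_squares(quantile_gap, x0=[5.0, 50.0],
                    bounds=([0.01, 0.01], [1000.0, 1000.0]))
alpha, beta = res.x
print(round(alpha, 2), round(beta, 2))  # should land close to beta.select's 7.61, 59.13
```

Note that `beta.select` searches a discrete grid, so the two results can differ slightly in the second decimal.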

See also this reference for a short but imho good overview of Bayesian reasoning and simple analysis. A longer introduction to conjugate analyses, especially for binomial data, can be found here. A general introduction to Bayesian thinking can be found here. More slides concerning aspects of Bayesian statistics are here.

Why do we choose Beta distribution here?

@Metallica The primary reason is that the Beta is the conjugate prior of the binomial distribution. This means that if we choose a Beta as prior, the posterior will also be Beta. Further reasons are that the Beta is between 0 and 1 and is very flexible. It includes the uniform, for example. But any proper distribution with support in $(0,1)$ can be used as prior. It's just that the posterior is more difficult to calculate.

Do you happen to still have the document for "Intro to Bayesian thinking"? The Dropbox link is dead.

@bs7280 I have updated the links. They should work again now.

Were the graphs plotted with R? Would you please add the R code to generate the above graphs? They are really helpful. Thanks!

I thought an uninformative prior would be the Jeffreys prior $\alpha=\beta=\frac 1 2$... why do you think it is not the case?

@meduz Strictly speaking, there is no real "uninformative" prior. I would like to refer you to the excellent answer by Tim on this discussion.

This depends on your definition, but there is one definition of an "uninformative prior" which is well-defined: https://en.wikipedia.org/wiki/Prior_probability#Uninformative_priors

@COOLSerdash, please have a look at this question; https://math.stackexchange.com/questions/3528352/likelihood-prior-and-posterior-distribution-of-reed-frost-model

A beta distribution with $\alpha = 1$ and $\beta = 1$ is the same as a uniform distribution. So it is, in fact, uninformative. You're trying to find information about a parameter of a distribution (in this case, the percentage of left-handed people in a group of people). Bayes' formula states:

$$P(r \mid Y_{1,\dots,n}) = \frac{P(Y_{1,\dots,n} \mid r) \, P(r)}{\int P(Y_{1,\dots,n} \mid \theta) \, P(\theta) \, d\theta}$$

which you pointed out is proportional to:

$$P(r \mid Y_{1,\dots,n}) \propto P(Y_{1,\dots,n} \mid r) \, P(r)$$

So basically you're starting with your prior belief about the proportion of left-handers in the group ($P(r)$, for which you're using a uniform distribution), then considering the data you collect to inform your prior (a binomial in this case: either you're right- or left-handed, so $P(Y_{1,\dots,n} \mid r)$). A binomial distribution has a beta conjugate prior, which means that the posterior distribution $P(r \mid Y_{1,\dots,n})$, the distribution of the parameter after considering the data, is in the same family as the prior.

$r$ here is not unknown in the end (and frankly it wasn't before collecting the data; we've got a pretty good idea of the proportion of left-handers in society). You've got both the prior distribution (your assumption about $r$) and you've collected data, and you put the two together. The posterior is your new assumption about the distribution of left-handers after considering the data. So you take the likelihood of the data and multiply it by the uniform prior.

The expected value of a beta distribution (which is what the posterior is) is $\frac{\alpha}{\alpha+\beta}$. So when you started, your assumption with $\alpha=1$ and $\beta=1$ was that the proportion of left-handers in the world was $\frac{1}{2}$. Now you've collected data with 2 lefties out of 18, and you've calculated a posterior (still a beta). Your $\alpha$ and $\beta$ values are now different, changing your idea of the proportion of lefties vs. righties. How has it changed?
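The shift can be seen in two lines of arithmetic. A quick numeric sketch (my own code, restating the numbers above):

```python
# Prior Beta(1, 1) (uniform) has mean 1/2; after observing 2 lefties out of
# 18 people, the posterior Beta(3, 17) has mean 3/20 = 0.15.
prior_a, prior_b = 1, 1
z, N = 2, 18
post_a, post_b = prior_a + z, prior_b + (N - z)

print(prior_a / (prior_a + prior_b))  # 0.5  (prior mean)
print(post_a / (post_a + post_b))     # 0.15 (posterior mean)
```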

In the first part of your question, you are asked to define a suitable prior for $r$. With the binomial data in hand, it would be wise to choose a beta distribution, because then the posterior will also be a beta. The uniform distribution being a special case of the beta, you can choose the uniform distribution as the prior for $r$, allowing every possible value of $r$ to be equally probable.

In the second part, you are provided with information regarding the prior distribution of $r$.

With this in hand @COOLSerdash's answer will give you the proper directions.

Thank you for posting this question and COOLSerdash for providing a proper answer.

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM

David Robinson 7 years ago

Did you take a look at this question and answer?