Bayesian and frequentist reasoning in plain English
How would you describe in plain English the characteristics that distinguish Bayesian from Frequentist reasoning?
This question about drawing inferences about an individual bowl player when you have two data sets - other players' results, and the new player's results, is a good spontaneous example of the difference which my answer tries to address in plain English.
Here is how I would explain the basic difference to my grandma:
I have misplaced my phone somewhere in the home. I can use the phone locator on the base of the instrument to locate the phone and when I press the phone locator the phone starts beeping.
Problem: Which area of my home should I search?
Frequentist reasoning: I can hear the phone beeping. I also have a mental model which helps me identify the area from which the sound is coming. Therefore, upon hearing the beep, I infer the area of my home I must search to locate the phone.
Bayesian reasoning: I can hear the phone beeping. Now, apart from a mental model which helps me identify the area from which the sound is coming, I also know the locations where I have misplaced the phone in the past. So, I combine my inferences using the beeps and my prior information about the locations I have misplaced the phone in the past to identify an area I must search to locate the phone.
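To make the phone analogy concrete, here is a minimal Python sketch. The rooms and every number in it are hypothetical, invented only for illustration: `likelihood` stands in for the mental model of how well the beep matches each room, and `prior` stands in for how often the phone was misplaced there in the past.

```python
# Hypothetical numbers, purely for illustration (not from the original answer).
# likelihood[room]: how well the beep I hear matches the phone being in that room.
likelihood = {"kitchen": 0.5, "bedroom": 0.4, "living room": 0.1}
# prior[room]: how often I have misplaced the phone in that room in the past.
prior = {"kitchen": 0.1, "bedroom": 0.7, "living room": 0.2}


def most_likely_room(likelihood):
    """Use the beep alone: pick the room that best explains the sound."""
    return max(likelihood, key=likelihood.get)


def posterior(likelihood, prior):
    """Combine the beep with past experience: likelihood times prior, normalized."""
    unnormalized = {room: likelihood[room] * prior[room] for room in likelihood}
    total = sum(unnormalized.values())
    return {room: weight / total for room, weight in unnormalized.items()}


post = posterior(likelihood, prior)
print("Beep only:        ", most_likely_room(likelihood))
print("Beep plus history:", max(post, key=post.get))
```

With these made-up numbers the beep alone points to the kitchen, while the beep combined with past experience points to the bedroom.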
I like the analogy. I would find it very useful if there were a defined question (based on a dataset) in which an answer was derived using frequentist reasoning and an answer was derived using Bayesian - preferably with R script to handle both reasonings. Am I asking too much?
The simplest thing that I can think of is tossing a coin $n$ times and estimating the probability of heads (denoted by $p$). Suppose we observe $k$ heads. Then the probability of getting $k$ heads is: $$P(k \text{ heads in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{n-k}$$ Frequentist inference would maximize the above to arrive at an estimate of $\hat{p} = k/n$. A Bayesian would say: Hey, I know that $p \sim \text{Beta}(1,1)$ (which is equivalent to assuming that $p$ is uniform on $[0,1]$). So, the updated inference would be $p \sim \text{Beta}(1+k, 1+n-k)$, and thus the Bayesian estimate of $p$ (the posterior mean) would be $(1+k)/(2+n)$. I do not know R, sorry.
It should be pointed out that, from the frequentist's point of view, there is no reason that you can't incorporate the prior knowledge *into* the model. In this sense, the frequentist view is simpler: you only have a model and some data. There is no need to separate the prior information from the model.
@user28 As a comment on your comment, if $n = 3$, then the frequentist would estimate $p = 0$ (respectively $p = 1$) upon seeing a result of $k = 0$ heads (respectively $k = 3$ heads), i.e., the coin is two-headed or two-tailed. The Bayesian estimates $1/5$ and $4/5$ respectively do allow for the possibility that it is a somewhat less biased coin.
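Since an R script was requested and nobody supplied one, here is a small Python sketch of the coin example instead. It assumes the uniform Beta(1,1) prior used above and takes the posterior mean as the Bayesian point estimate; the function names are my own.

```python
from fractions import Fraction


def mle(k, n):
    """Frequentist maximum-likelihood estimate of p: k / n."""
    return Fraction(k, n)


def posterior_mean(k, n, a=1, b=1):
    """Mean of the Beta(a + k, b + n - k) posterior under a Beta(a, b) prior."""
    return Fraction(a + k, a + b + n)


n = 3
for k in (0, 3):
    print(f"n={n}, k={k}: MLE = {mle(k, n)}, posterior mean = {posterior_mean(k, n)}")
```

For $n = 3$ this reproduces the estimates quoted in the comment above: $0$ and $1$ for the maximum-likelihood estimate, against $1/5$ and $4/5$ for the posterior mean.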
@Farrel - the recent question at http://stats.stackexchange.com/questions/21439/estimating-probability-of-success-given-a-reference-population/21466#21466 and my answer in two parts (unintentionally) is a nice simple example of this. It would be fairly easy to knock together an example dataset and R script showing the two approaches.
@user28 - when you say "I do not know R", what are you referring to with the letter 'R'?
I recently posted a question about a Bayesian example that is similar to this one but slightly different in an interesting way. A fisherman is lost at sea and the Coast Guard searches for him with a model and then updates the model as they conduct searches... so they are combining what their model predicts based on ocean currents/wind with the "prior information" of where they already searched. http://stats.stackexchange.com/q/119952/25734
The example is nice but should really start at the beginning; Suppose you have no data ("no beeps"), could you make a probabilistic inference? Yes, you can, says the Bayesian, because you have prior knowledge about where you usually leave your phone (very likely) - but no, you cannot if you are a frequentist, since only data are random. -- It is here (I find) that one sees the "beauty" and consistency of Bayesian reasoning, because probabilistic inference without new data IS natural and the Bayesian nicely integrates how new data (beeps) should influence the inference.
As was commented already in 2010, from the frequentist's point of view, there is no reason that you can't incorporate the prior knowledge into the model. Here is an example of explicitly using informative priors in frequentist reasoning: Using prior knowledge in frequentist tests. figshare. https://doi.org/10.6084/m9.figshare.4819597.v3 See also alternative definitions in other answers below.
It should be mentioned that which model is chosen depends on the application. If strict accuracy is desired, without caring about side effects, then the Bayesian model should be used. If you want to keep a 'fair' distribution when taking actions based on inference, then the frequentist model may be necessary. For instance, while it may be true that taking race into account when determining someone's guilt of a crime increases accuracy in a probabilistic guess, it also creates greater false negatives and false positives for different races. Effects from actions taken will affect the distribution.
Tongue firmly in cheek:
A Bayesian defines a "probability" in exactly the same way that most non-statisticians do - namely an indication of the plausibility of a proposition or a situation. If you ask him a question, he will give you a direct answer assigning probabilities describing the plausibilities of the possible outcomes for the particular situation (and state his prior assumptions).
A Frequentist is someone that believes probabilities represent long run frequencies with which events occur; if needs be, he will invent a fictitious population from which your particular situation could be considered a random sample so that he can meaningfully talk about long run frequencies. If you ask him a question about a particular situation, he will not give a direct answer, but instead make a statement about this (possibly imaginary) population. Many non-frequentist statisticians will be easily confused by the answer and interpret it as Bayesian probability about the particular situation.
However, it is important to note that most Frequentist methods have a Bayesian equivalent that in most circumstances will give essentially the same result, the difference is largely a matter of philosophy, and in practice it is a matter of "horses for courses".
As you may have guessed, I am a Bayesian and an engineer. ;o)
As a non-expert, I think that the key to the entire debate is that people actually reason like Bayesians. You have to be trained to think like a frequentist, and even then it's easy to slip up and either reason or present your reasoning as if it were Bayesian. "There's a 95% chance that the value is within this confidence interval." Enough said.
The key also is to think about what kind of lobbying has caused the statistics of the 20th century to be called "classical" while the statistics that Laplace and Gauss started to use in the 19th century are not...
Maybe I've been doing frequentist work too long, but I'm not so sure the Bayesian viewpoint is always intuitive. For example, suppose I am interested in a real world parameter of interest, such as average height of a population. If I tell you "there is a 95% chance the parameter of interest is in my credible interval", and then follow up with a question of "If we created 100 such intervals for different parameters, what proportion of them would we expect to contain the real values of the parameter?", the fact that the answer is **not** 95 must be confusing to some people.
@CliffAB but why would you ask the second question? The point is they are different questions, so it is unsurprising that they have different answers. The Bayesian can answer both questions, but the answer may be different (which seems reasonable to me). The frequentist can only answer one of the questions (due to the restrictive definition of probability) and hence (implicitly) uses the same answer for both questions, which is what causes the problems. A credible interval is not a confidence interval, but a Bayesian can construct *both* a credible interval and a confidence interval.
My comment was in response to Wayne's; the idea that people "naturally" think in a Bayesian context, as it's easier to interpret a credible interval. My point is that while it's simpler to construct the right interpretation of a credible interval (i.e. less of a word soup), I think the non-statistician is just as likely to be confused about what that *really* means.
@CliffAB, ah, I sort of see, however if you think that way because you have been doing frequentist work too long, that rather suggests that your mode of thinking about probability is not your natural one, but one that you have learned and become accustomed to. I don't think I agree with the conclusion, the usual misunderstanding of a confidence interval is precisely that of interpreting it as a credible interval, i.e. an interval likely to contain the true value with a given confidence. Likewise the p-value fallacy arises from interpreting a frequentist test in a Bayesian way.
My point is more that I think only a statistician (Frequentist or Bayesian) would think that the statements "if X happens 1 in 100 times, the probability X happens is 1/100" and "probability is a measure of uncertainty" are not compatible statements. So it gets really tough to say one interpretation is more "natural" than the other. I think most people don't naturally think there is a distinction.
In fact, you don't have to look too far to find a hardcore (the hardest core?) Bayesian accidentally saying a Frequentist view makes it easier to understand what "random" means.
@CliffAB the two statements are not incompatible within a Bayesian definition (a long run frequency is a reasonable way of representing an uncertain degree of belief where a long run frequency exists), but they are under a frequentist one, which rather makes my point that the Bayesian framework is more natural. Note I am a "horses for courses" man, so there are some things more easily explained in frequentist terms than Bayesian ones.
Very crudely I would say that:
Frequentist: Sampling is infinite and decision rules can be sharp. Data are a repeatable random sample - there is a frequency. Underlying parameters are fixed i.e. they remain constant during this repeatable sampling process.
Bayesian: Unknown quantities are treated probabilistically and the state of the world can always be updated. Data are observed from the realised sample. Parameters are unknown and described probabilistically. It is the data which are fixed.
There is a brilliant blog post which gives an in-depth example of how a Bayesian and a Frequentist would tackle the same problem. Why not answer the problem for yourself and then check?
The problem (taken from Panos Ipeirotis' blog):
You have a coin that when flipped ends up head with probability $p$ and ends up tail with probability $1-p$. (The value of $p$ is unknown.)
Trying to estimate $p$, you flip the coin 100 times. It ends up head 71 times.
Then you have to decide on the following event: "In the next two tosses we will get two heads in a row."
Would you bet that the event will happen or that it will not happen?
Since $0.71^2=0.5041$, I would regard this as close enough to an even bet to be prepared to go modestly either way just for fun (and to ignore any issues over the shape of the prior). I sometimes buy insurance and lottery tickets with far worse odds.
At the end of that blog post it says "instead of using the uniform distribution as a prior, we can be even more agnostic. In this case, we can use the Beta(0,0) distribution as a prior. Such a distribution corresponds to the case where any mean of the distribution is equally likely. In this case, the two approaches, Bayesian and frequentist give the same results." which kind of sums it up really!
The *big* problem with that blog post is it does not adequately characterize what a non-Bayesian (but rational) decision maker would do. It's little more than a straw man.
@tdc: the Bayesian (Jeffreys) prior is Beta(0.5, 0.5) and some would say that it is the only justifiable prior.
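For anyone who wants to check the numbers in this coin problem, here is a rough Python sketch. It assumes the frequentist answer is the plug-in value $\hat{p}^2$ with $\hat{p} = 71/100$, and that the Bayesian answer is the posterior expectation of $p^2$ under a Beta$(a, b)$ prior, which for a Beta$(A, B)$ posterior equals $A(A+1)/((A+B)(A+B+1))$. The three priors are the ones mentioned in this thread; Beta(0,0) is improper, but the posterior is proper here because both heads and tails were observed.

```python
def plug_in_two_heads(k, n):
    """Plug the maximum-likelihood estimate k/n into p^2."""
    p_hat = k / n
    return p_hat ** 2


def bayes_two_heads(k, n, a, b):
    """Posterior expectation of p^2 under a Beta(a, b) prior.

    The posterior is Beta(A, B) with A = a + k and B = b + n - k,
    and E[p^2] = A (A + 1) / ((A + B) (A + B + 1)).
    """
    A = a + k
    B = b + n - k
    return A * (A + 1) / ((A + B) * (A + B + 1))


k, n = 71, 100
print(f"plug-in estimate:       {plug_in_two_heads(k, n):.4f}")
for name, (a, b) in {"uniform Beta(1,1)": (1, 1),
                     "Jeffreys Beta(0.5,0.5)": (0.5, 0.5),
                     "Beta(0,0)": (0, 0)}.items():
    print(f"{name:24s}{bayes_two_heads(k, n, a, b):.4f}")
```

All four numbers come out just above one half, so the choice between these priors changes how close to even the bet is rather than which side of it you would take.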
Let us say a man rolls a six sided die and it has outcomes 1, 2, 3, 4, 5, or 6. Furthermore, he says that if it lands on a 3, he'll give you a free text book.
The Frequentist would say that each outcome has an equal 1 in 6 chance of occurring. She views probability as being derived from long run frequency distributions.
The Bayesian however would say hang on a second, I know that man, he's David Blaine, a famous trickster! I have a feeling he's up to something. I'm going to say that there's only a 1% chance of it landing on a 3 BUT I'll re-evaluate that belief and change it the more times he rolls the die. If I see the other numbers come up equally often, then I'll iteratively increase the chance from 1% to something slightly higher, otherwise I'll reduce it even further. She views probability as degrees of belief in a proposition.
I think the frequentist would (verbosely) point out his assumptions and would avoid making any useful prediction. Maybe he'd say, "Assuming the die is fair, each outcome has an equal 1 in 6 chance of occurring. Furthermore, if the die rolls are fair and David Blaine rolls the die 17 times, there is only a 5% chance that it will never land on 3, so such an outcome would make me doubt that the die is fair."
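The "17 rolls" figure in that comment can be checked with a couple of lines of Python. This is just a sketch assuming a fair die and independent rolls; the function name is mine.

```python
def prob_no_three(rolls, p_three=1 / 6):
    """Probability that a 3 never shows up in `rolls` independent rolls."""
    return (1 - p_three) ** rolls


for rolls in (16, 17):
    print(f"{rolls} rolls without a 3: {prob_no_three(rolls):.4f}")
```

Under those assumptions, 17 is the smallest number of rolls for which the chance of never seeing a 3 drops below 5%.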
Just a little bit of fun...
A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
From this site:
and from the same site, a nice essay...
"An Intuitive Explanation of Bayes' Theorem"
In which case, wouldn't the frequentist be one who knows the ratio of donkey, mule and horse populations, and upon observing a pack of mules starts to calculate the p-value to find out whether there has been a statistically significant increase in the population ratio of mules?
The Bayesian is asked to make bets, which may include anything from which fly will crawl up a wall faster to which medicine will save most lives, or which prisoners should go to jail. He has a big box with a handle. He knows that if he puts absolutely everything he knows into the box, including his personal opinion, and turns the handle, it will make the best possible decision for him.
The frequentist is asked to write reports. He has a big black book of rules. If the situation he is asked to make a report on is covered by his rulebook, he can follow the rules and write a report so carefully worded that it is wrong, at worst, one time in 100 (or one time in 20, or one time in whatever the specification for his report says).
The frequentist knows (because he has written reports on it) that the Bayesian sometimes makes bets that, in the worst case, when his personal opinion is wrong, could turn out badly. The frequentist also knows (for the same reason) that if he bets against the Bayesian every time he differs from him, then, over the long run, he will lose.
In plain english, I would say that Bayesian and Frequentist reasoning are distinguished by two different ways of answering the question:
What is probability?
Most differences will essentially boil down to how each answers this question, for it basically defines the domain of valid applications of the theory. Now you can't really give either answer in terms of "plain english", without further generating more questions. For me the answer is (as you could probably guess)
probability is logic
My "non-plain english" reason for this is that the calculus of propositions is a special case of the calculus of probabilities, if we represent truth by $1$ and falsehood by $0$. Additionally, the calculus of probabilities can be derived from the calculus of propositions. This conforms with the "bayesian" reasoning most closely - although it also extends the bayesian reasoning in applications by providing principles to assign probabilities, in addition to principles to manipulate them. Of course, this leads to the follow up question "what is logic?" For me, the closest thing I could give as an answer to this question is "logic is the common sense judgements of a rational person, with a given set of assumptions" (what is a rational person? etc. etc.). Logic has all the same features that Bayesian reasoning has. For example, logic does not tell you what to assume or what is "absolutely true". It only tells you how the truth of one proposition is related to the truth of another one. You always have to supply a logical system with "axioms" for it to get started on the conclusions. It also has the same limitations in that you can get arbitrary results from contradictory axioms. But "axioms" are nothing but prior probabilities which have been set to $1$. For me, to reject Bayesian reasoning is to reject logic. For if you accept logic, then because Bayesian reasoning "logically flows from logic" (how's that for plain english :P ), you must also accept Bayesian reasoning.
For the frequentist reasoning, we have the answer:
probability is frequency
although I'm not sure "frequency" is a plain english term in the way it is used here - perhaps "proportion" is a better word. I wanted to add into the frequentist answer that the probability of an event is thought to be a real, measurable (observable?) quantity, which exists independently of the person/object who is calculating it. But I couldn't do this in a "plain english" way.
So perhaps a "plain english" version of one of the differences could be that frequentist reasoning is an attempt at reasoning from "absolute" probabilities, whereas bayesian reasoning is an attempt at reasoning from "relative" probabilities.
Another difference is that frequentist foundations are more vague in how you translate the real world problem into the abstract mathematics of the theory. A good example is the use of "random variables" in the theory - they have a precise definition in the abstract world of mathematics, but there is no unambiguous procedure one can use to decide if some observed quantity is or isn't a "random variable".
In the bayesian way of reasoning, the notion of a "random variable" is not necessary. A probability distribution is assigned to a quantity because it is unknown - which means that it cannot be deduced logically from the information we have. This provides at once a simple connection between the observable quantity and the theory - as "being unknown" is unambiguous.
You can also see in the above example a further difference in these two ways of thinking - "random" vs "unknown". "randomness" is phrased in such a way that the "randomness" seems like it is a property of the actual quantity. Conversely, "being unknown" depends on which person you are asking about that quantity - hence it is a property of the statistician doing the analysis. This gives rise to the "objective" versus "subjective" adjectives often attached to each theory. It is easy to show that "randomness" cannot be a property of some standard examples, by simply asking two frequentists who are given different information about the same quantity to decide if it's "random". One is the usual Bernoulli Urn: frequentist 1 is blindfolded while drawing, whereas frequentist 2 is standing over the urn, watching frequentist 1 draw the balls from the urn. If the declaration of "randomness" is a property of the balls in the urn, then it cannot depend on the different knowledge of frequentist 1 and 2 - and hence the two frequentists should give the same declaration of "random" or "not random".
I'd be interested if you could rewrite this without the reference to common sense.
@PeterEllis - What's wrong with common sense? We all have it, and it is usually foolish not to use it...
It's too contested what it actually is, and too culturally specific. "Common sense" is short hand for whatever is the perceived sensible way of doing things in this particular culture (which all too often looks far from sensible to another culture in time and space), so referring to it in a definition ducks the key questions. It's particularly unhelpful as part of a definition of logic (and so, I would argue, is the concept of a "rational person" in that particular context - particularly as I am guessing your definition of a "rational person" would be a logical person who has common sense!)
I fail to understand why using common sense is bad. Using your definition of it, why would we not want to do what is sensible at the time? And what are the "key questions" that are being dodged? You say common sense has no well defined meaning, and then go and provide one!
He can't provide one; his argument is that _there is no universal definition_, only culturally-specific ones. Two people from different cultural backgrounds (and that includes different styles of statistical education) will quite possibly have two different understandings of what is sensible to do in a given situation.
This answer has nuggets of goodness (how's that for plain English?), but I don't believe (how's that for being a Bayesian!) that the following statement is true: "For if you accept logic... you must also accept Bayesian reasoning". For instance, if you think instead of translating the abstract theory of the mathematics into the real world, you'll find that the axiomatic approach can be consistent with both Frequentist and Bayesian reasoning! Arguably, Kolmogorov in the first case, and, say, Jeffreys in the second. In essence, it's the theory of probability that's logic; not its interpretation.
I like the last 3 paragraphs, especially the last one. This is also how I eventually convinced myself that probability is a subjective belief, i.e., different people have different information sets about an event and hence they may have different beliefs about the likelihood of that event occurring. I used that same example, the ball drawn from an urn with two people.
In reality, I think much of the philosophy surrounding the issue is just grandstanding. That's not to dismiss the debate, but it is a word of caution. Sometimes, practical matters take priority - I'll give an example below.
Also, you could just as easily argue that there are more than two approaches:
- Neyman-Pearson ('frequentist')
- Likelihood-based approaches
- Fully Bayesian
A senior colleague recently reminded me that "many people in common language talk about frequentist and Bayesian. I think a more valid distinction is likelihood-based and frequentist. Both maximum likelihood and Bayesian methods adhere to the likelihood principle whereas frequentist methods don't."
I'll start off with a very simple practical example:
We have a patient. The patient is either healthy(H) or sick(S). We will perform a test on the patient, and the result will either be Positive(+) or Negative(-). If the patient is sick, they will always get a Positive result. We'll call this the correct(C) result and say that $$ P(+ | S ) = 1 $$ or $$ P(Correct | S) = 1 $$ If the patient is healthy, the test will be negative 95% of the time, but there will be some false positives. $$ P(- | H) = 0.95 $$ $$ P(+ | H) = 0.05 $$ In other words, the probability of the test being Correct, for Healthy people, is 95%.
So, the test is either 100% accurate or 95% accurate, depending on whether the patient is healthy or sick. Taken together, this means the test is at least 95% accurate.
So far so good. Those are the statements that would be made by a frequentist. Those statements are quite simple to understand and are true. There's no need to waffle about a 'frequentist interpretation'.
But, things get interesting when you try to turn things around. Given the test result, what can you learn about the health of the patient? Given a negative test result, the patient is obviously healthy, as there are no false negatives.
But we must also consider the case where the test is positive. Was the test positive because the patient was actually sick, or was it a false positive? This is where the frequentist and Bayesian diverge. Everybody will agree that this cannot be answered at the moment. The frequentist will refuse to answer. The Bayesian will be prepared to give you an answer, but you'll have to give the Bayesian a prior first - i.e. tell it what proportion of the patients are sick.
To recap, the following statements are true:
- For healthy patients, the test is very accurate.
- For sick patients, the test is very accurate.
If you are satisfied with statements such as that, then you are using frequentist interpretations. This might change from project to project, depending on what sort of problems you're looking at.
But you might want to make different statements and answer the following question:
- For those patients that got a positive test result, how accurate is the test?
This requires a prior and a Bayesian approach. Note also that this is the only question of interest to the doctor. The doctor will say "I know that the patients will either get a positive result or a negative result. I also know that the negative result means the patient is healthy and can be sent home. The only patients that interest me now are those that got a positive result -- are they sick?"
To summarize: In examples such as this, the Bayesian will agree with everything said by the frequentist. But the Bayesian will argue that the frequentist's statements, while true, are not very useful; and will argue that the useful questions can only be answered with a prior.
A frequentist will consider each possible value of the parameter (H or S) in turn and ask "if the parameter is equal to this value, what is the probability of my test being correct?"
A Bayesian will instead consider each possible observed value (+ or -) in turn and ask "If I imagine I have just observed that value, what does that tell me about the conditional probability of H-versus-S?"
Do you mean `For sick patients, the test is NOT very accurate.` you forget the NOT?
It's very accurate in both cases, so no I did not forget a word. For healthy people, the result will be correct (i.e. 'Negative') 95% of the time. And for sick people, the result will be correct (i.e. 'Positive') 100% of the time.
I think the "weakness" in maximum likelihood is that it implicitly assumes a uniform prior on the parameters, whereas "full Bayesian" is more flexible in what prior you can choose.
To complete the example, suppose 0.1% of the population is sick with disease D that we're testing for: this is not our prior. More likely, something like 30% of patients who come to the doctor and have symptoms matching D actually have D (this could be more or less depending on details such as how often a different sickness presents with the same symptoms). So 70% of those taking the test are healthy, 66.5% of test-takers get a negative result, and of the 33.5% who get a positive result, 30%/33.5% are sick. So given a positive result, our posterior probability that a patient is sick is 89.6%. Next puzzle: how did we know 30% of test-takers have D?
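The arithmetic in the comment above is Bayes' theorem applied to the test from the answer. Here is a minimal Python sketch, using the answer's $P(+|S) = 1$ and $P(+|H) = 0.05$ and treating the 30% and 0.1% figures as two alternative priors; the function and parameter names are mine.

```python
def posterior_sick_given_positive(prior_sick,
                                  p_pos_given_sick=1.0,
                                  p_pos_given_healthy=0.05):
    """P(S | +) by Bayes' theorem for the test described in the answer."""
    # Total probability of a positive result: true positives plus false positives.
    p_pos = (prior_sick * p_pos_given_sick
             + (1 - prior_sick) * p_pos_given_healthy)
    return prior_sick * p_pos_given_sick / p_pos


for prior in (0.30, 0.001):
    post = posterior_sick_given_positive(prior)
    print(f"prior P(S) = {prior:.3f} -> P(S | +) = {post:.3f}")
```

With the 30% prior the posterior is about 89.6%, as stated above. With the 0.1% population rate as the prior it would be only about 2%, which is why the choice of prior matters so much to the doctor's question.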
Bayesian and frequentist statistics are compatible in that they can be understood as two limiting cases of assessing the probability of future events based on past events and an assumed model, if one admits that in the limit of a very large number of observations, no uncertainty about the system remains, and that in this sense a very large number of observations is equal to knowing the parameters of the model.
Assume we have made some observations, e.g., the outcome of 10 coin flips. In Bayesian statistics, you start from what you have observed and then you assess the probability of future observations or model parameters. In frequentist statistics, you start from an idea (hypothesis) of what is true by assuming scenarios of a large number of observations that have been made, e.g., the coin is unbiased and gives 50% heads if you throw it many, many times. Based on these scenarios of a large number of observations (=hypothesis), you assess the frequency of making observations like the one you did, i.e., the frequency of different outcomes of 10 coin flips. It is only then that you take your actual outcome, compare it to the frequency of possible outcomes, and decide whether the outcome belongs to those that are expected to occur with high frequency. If this is the case, you conclude that the observation made does not contradict your scenarios (=hypothesis). Otherwise, you conclude that the observation made is incompatible with your scenarios, and you reject the hypothesis.
Thus Bayesian statistics starts from what has been observed and assesses possible future outcomes. Frequentist statistics starts with an abstract experiment of what would be observed if one assumes something, and only then compares the outcomes of the abstract experiment with what was actually observed. Otherwise the two approaches are compatible. They both assess the probability of future observations based on some observations made or hypothesized.
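The frequentist procedure described here, for 10 coin flips under the hypothesis of an unbiased coin, can be sketched in Python as an exact binomial test. The two-sided rule below (add up every outcome that is no more probable than the observed one) and the 5% threshold mentioned afterwards are common conventions that I am assuming; neither is specified in the text above.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_sided_p_value(k_obs, n, p0=0.5):
    """Total probability, under the hypothesis p = p0, of all outcomes
    that are no more probable than the observed one."""
    p_obs = binom_pmf(k_obs, n, p0)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(binom_pmf(k, n, p0)
               for k in range(n + 1)
               if binom_pmf(k, n, p0) <= p_obs * (1 + 1e-9))


for heads in (7, 9):
    print(f"{heads} heads in 10 flips: p-value = {two_sided_p_value(heads, 10):.4f}")
```

With a 5% threshold, 7 heads would not contradict the unbiased-coin hypothesis while 9 heads would lead to rejecting it.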
I started to write this up in a more formal way:
Positioning Bayesian inference as a particular application of frequentist inference and vice versa. figshare.
The manuscript is new. If you happen to read it, and have comments, please let me know.
I would say that they look at probability in different ways. The Bayesian is subjective and uses a priori beliefs to define a prior probability distribution on the possible values of the unknown parameters. So he relies on a theory of probability like de Finetti's. The frequentist sees probability as something that has to do with a limiting frequency based on an observed proportion. This is in line with the theory of probability as developed by Kolmogorov and von Mises.
A frequentist does parametric inference using just the likelihood function. A Bayesian takes that and multiplies it by a prior and normalizes it to get the posterior distribution that he uses for inference.
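The "likelihood times prior, then normalize" recipe can be sketched on a grid of candidate parameter values. The Python example below is only an illustration: the data (7 heads in 10 tosses), the flat prior and the grid size are all assumptions I picked, not anything from the answer.

```python
def grid_posterior(k, n, prior, grid_size=1001):
    """Approximate the posterior for a coin's P(heads) on an evenly spaced grid.

    Returns the grid, the binomial likelihood (up to a constant) at each
    grid point, and the normalized posterior weights.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    likelihood = [p ** k * (1 - p) ** (n - k) for p in grid]
    unnormalized = [lik * prior(p) for lik, p in zip(likelihood, grid)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return grid, likelihood, posterior


# Hypothetical data: 7 heads in 10 tosses, with a flat prior on [0, 1].
grid, likelihood, posterior = grid_posterior(7, 10, prior=lambda p: 1.0)

# Likelihood-only inference: the value of p that maximizes the likelihood.
mle = grid[likelihood.index(max(likelihood))]
# Bayesian inference: here summarized by the mean of the posterior.
posterior_mean = sum(p * w for p, w in zip(grid, posterior))

print(f"maximum likelihood estimate: {mle:.3f}")
print(f"posterior mean:              {posterior_mean:.3f}")
```

Passing a non-flat `prior`, for example `lambda p: p * (1 - p)`, changes the posterior but leaves the likelihood and its maximum untouched.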
+1 Good answer, but it ought to be emphasized that the Bayesian approach and Frequency approach differ with respect to their _interpretation_ of probability. Kolmogorov, on the other hand, provides an _axiomatic foundation_ for the theory of probability, which _does not require an interpretation_ (!) like those employed by the Bayesian or Frequentist. In a sense, the axiomatic system has a life of its own! From Kolmogorov's six axioms alone, I don't think it's possible to say that his axiomatic system is either Bayesian or Frequentist, and, could, in fact, be consistent with both.