\( \require{cancel} \require{color} \newcommand{\p}[1]{\left(#1\right)} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} \newcommand{\given}{\,\middle|\,} \renewcommand{\Pr}[1]{\mathop{{}\mathbb{P}\mathopen{}\left(#1\right)\mathclose{}}} \)

Frequentist vs. Bayesian Statistics

A concrete 5-minute example to compare & contrast frequentist and Bayesian approaches.
It is assumed the reader is comfortable with Bayes's rule and marginal likelihood (and $\href{https://en.wikipedia.org/wiki/Arg_max}{\argmax}$).
(Simplified from here to avoid confusing the reader with conjugate priors. More reading here.)

The Problem

A coin comes up heads with unknown probability $p$. We toss it and observe $h$ heads and $t$ tails.
Given these previous tosses, what is the probability that the next $h'$ tosses all come up heads ($\Pr{h' \given h, t}$)?

The Problem with the Problem

Notice that in the base case ($h = t = 0$) the answer is simply $p^{h'}$, so we need to know something about $p$!
Either we must change the question to something more manageable, or just assume a “prior” $\textcolor{orange}{\Pr{p}}$ for $p$.
Our assumption about the prior is what makes Bayesian approaches subjective.

Frequentist (point estimate) approach: Maximum Likelihood (ML) solution

Use the value of $p$ that maximizes the likelihood of the observed tosses to estimate the answer: \begin{align*} \Pr{h' \given h, t} &\textcolor{red}{\approx} \p{\textcolor{green}{\argmax_{p}\,\Pr{h, t \given p}}}^{h'} = \p{\argmax_{p}\,p^h \p{1 - p}^t}^{h'} = \p{\frac{h}{h + t}}^{h'} \end{align*} A couple of examples:

\begin{align*} \Pr{h' = \phantom{0}2 \given h = 10, t = 4} &\textcolor{red}{\approx} \p{10/\p{10+4}}^{2\phantom{0}} \approx \textcolor{red}{\mathbf{51.0}\%} \\ \Pr{h' = 20 \given h = 20, t = 1} &\textcolor{red}{\approx} \p{20/\p{20+1}}^{20} \approx \textcolor{red}{\mathbf{37.7}\%} \end{align*}
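These estimates are easy to verify numerically; here is a minimal sketch (the helper names are mine, not from the text):

```python
from fractions import Fraction

def ml_estimate(h, t):
    """Maximum-likelihood estimate of p: the observed fraction of heads."""
    return Fraction(h, h + t)

def ml_prediction(h_new, h, t):
    """Approximate Pr(h' more heads | h, t) by plugging in the ML estimate."""
    return ml_estimate(h, t) ** h_new

print(float(ml_prediction(2, 10, 4)))   # ≈ 0.510 (51.0%)
print(float(ml_prediction(20, 20, 1)))  # ≈ 0.377 (37.7%)
```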

Pros: easy to compute; does not require any assumptions regarding $p$.
Cons: does not answer the original question; approximation error is unknown.


Semi-Bayesian (point estimate) approach: Maximum a Posteriori (MAP) solution

Use the single most probable value of $p$ given the tosses (the mode of the posterior $\Pr{p \given h, t}$) to estimate the answer: \begin{align*} \Pr{h' \given h, t} \textcolor{red}{\approx} \p{\argmax_{p}\,\Pr{p \given h, t}}^{h'} &= \p{\textcolor{green}{\argmax_{p}} \textcolor{green}{\Pr{h, t \given p}} \textcolor{orange}{\Pr{p}} \div \cancelto{(\text{unaffected by }p)}{\Pr{h, t}}}^{h'} \end{align*}

Pros: easy to compute; can account for a non-uniform prior $\textcolor{orange}{\Pr{p}}$.
Cons: does not answer the original question; approximation error is unknown; requires $\textcolor{orange}{\Pr{p}}$.

Notice that if all values of $p$ are assumed equally likely a priori ($\textcolor{orange}{\Pr{p}} = 1$ on $[0, 1]$), then the MAP solution coincides with the ML solution.
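Since $\Pr{h, t}$ does not depend on $p$, the MAP estimate can be found by maximizing $\Pr{h, t \given p}\,\textcolor{orange}{\Pr{p}}$ alone. A small grid-search sketch (the prior favouring fair coins is a hypothetical choice of mine, just for illustration):

```python
import numpy as np

def map_estimate(h, t, prior):
    """MAP estimate of p: maximize Pr(h,t|p) * Pr(p) on a fine grid.
    (Pr(h,t) is constant in p, so it can be ignored.)"""
    grid = np.linspace(1e-6, 1 - 1e-6, 100001)
    log_post = h * np.log(grid) + t * np.log(1 - grid) + np.log(prior(grid))
    return grid[np.argmax(log_post)]

# With a flat prior, MAP coincides with ML: h / (h + t).
flat = lambda p: np.ones_like(p)
print(map_estimate(10, 4, flat))          # ≈ 10/14 ≈ 0.714

# A hypothetical prior favouring fair coins, Pr(p) ∝ p(1 - p),
# pulls the estimate towards 1/2.
fair_leaning = lambda p: p * (1 - p)
print(map_estimate(10, 4, fair_leaning))  # ≈ 11/16 = 0.6875
```

(With the prior $\propto p(1-p)$ the posterior mode is $(h+1)/(h+t+2)$, which the grid search recovers to within its resolution.)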

Fully Bayesian (exact) approach: “Bayesian” solution

If we already know $\textcolor{orange}{\Pr{p}}$, then we can just find the exact solution directly.
There is no need to estimate a single value of $p$ when we can consider all possible values: \begin{align*} \Pr{h' \given h, t} &= \int_{0}^{1} \Pr{h' \given p, \cancel{h, t}} \times \,\Pr{p \given h, t}\,dp && \text{(marginalize)} \\ &= \int_{0}^{1} p^{h'} \times \,\frac{\Pr{h, t \given p} \textcolor{orange}{\Pr{p}}}{\Pr{h, t}}\,dp && \text{(Bayes's rule)} \\ &= \frac{\int_{0}^{1} p^{h'} \Pr{h, t \given p} \textcolor{orange}{\Pr{p}}\,dp}{\Pr{h, t}} \\ &= \frac{\int_{0}^{1} p^{h'} \Pr{h, t \given p} \textcolor{orange}{\Pr{p}}\,dp}{\int_{0}^{1}\hspace{1.35em}\Pr{h, t \given p} \textcolor{orange}{\Pr{p}}\,dp} && \text{(}\href{https://en.wikipedia.org/wiki/Law_of_total_probability}{\text{law of total probability}}\text{)} \\ &= \frac{\int_{0}^{1} p^{h'} \cancel{\frac{\p{h + t}!}{h!\,t!}} p^h \p{1 - p}^t \textcolor{orange}{\Pr{p}}\,dp}{\int_{0}^{1}\hspace{1.35em}\cancel{\frac{\p{h + t}!}{h!\,t!}} p^h \p{1 - p}^t \textcolor{orange}{\Pr{p}}\,dp} \\ \therefore\ \Pr{h' \given h, t} &= \frac{\p{h + h'}!\,\p{h + t + 1}!}{h!\,\p{h + t + 1 + h'}!} \div \frac{\cancel{\p{h + 0}!}\,\cancel{\p{h + t + 1}!}}{\cancel{h!}\,\cancel{\p{h + t + 1 + 0}!}} && \p{\text{if } \textcolor{orange}{\Pr{p}} = 1} \end{align*} In contrast with previous examples, we find that:

\begin{align*} \Pr{h' = \phantom{0}2 \given h = 10, t = 4} &\textcolor{blue}{=} 33/68 \approx \textcolor{blue}{\mathbf{48.5}\%} < \textcolor{red}{\mathbf{51.0}\%} \\ \Pr{h' = 20 \given h = 20, t = 1} &\textcolor{blue}{=} 11/41 \approx \textcolor{blue}{\mathbf{26.8}\%} < \textcolor{red}{\mathbf{37.7}\%} \end{align*}
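The closed form above (for the uniform prior $\textcolor{orange}{\Pr{p}} = 1$) can be evaluated exactly with rational arithmetic; a minimal sketch (the function name is mine):

```python
from fractions import Fraction
from math import factorial

def bayes_prediction(h_new, h, t):
    """Exact Pr(h' more heads | h, t) under a uniform prior Pr(p) = 1:
    (h + h')! (h + t + 1)! / (h! (h + t + 1 + h')!)."""
    return Fraction(factorial(h + h_new) * factorial(h + t + 1),
                    factorial(h) * factorial(h + t + 1 + h_new))

print(bayes_prediction(2, 10, 4))    # 33/68 ≈ 48.5%
print(bayes_prediction(20, 20, 1))   # 11/41 ≈ 26.8%
```

Note that $h' = 0$ gives exactly $1$, as it should.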


Conclusion

To obtain exact (Bayesian) solutions, we must make extra assumptions about the parameters (a prior).
Frequentist methods avoid such assumptions at the cost of an approximation error of unknown size.