Decision Theory

Random collection of decision theory basics.

🧮 math

Consider a country that is deciding whether to buy a vaccine now ($a_1$) or wait for a probably better one in the pipeline ($a_2$). Let’s say the efficacy of the vaccine in question is $w$. The country might quantify the “loss” of taking each action as,

$$\ell(w, a) = \begin{cases} 10(1-w), & \text{if } a = \textrm{BUY} \\ 100, & \text{otherwise} \end{cases}$$
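As a minimal sketch, the loss above can be written as a small Python function. The label `"WAIT"` for the non-buying action and the function name `loss` are illustrative choices, not names from the text.

```python
def loss(w: float, action: str) -> float:
    """Loss for the vaccine example: 10 * (1 - w) when buying, 100 otherwise."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("efficacy w must lie in [0, 1]")
    # "WAIT" (or any label other than "BUY") falls into the "otherwise" branch.
    return 10.0 * (1.0 - w) if action == "BUY" else 100.0


# Tabulate the loss of each action at a few assumed efficacies.
for w in (0.25, 0.5, 0.75):
    print(w, loss(w, "BUY"), loss(w, "WAIT"))
```

Under this particular loss, buying never costs more than 10 while waiting always costs 100, so buying is preferred at every efficacy.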

In statistical inference, the goal is not to make a decision but to provide a summary of the statistical evidence. This is the task of first figuring out $\theta$. Based on that statistical summary, we then want a decision.

Decision theory combines the statistical knowledge gained from the samples with other relevant aspects of the problem to make the best decision. These other aspects are:

  1. Knowledge of possible consequences (quantified in the loss function)
  2. Prior information

The Bayesian expected loss of taking an action $a$ is the expectation of the loss under the posterior $\pi^\star$,

$$\rho(\pi^\star, a) = \mathbb{E}_{\pi^\star}\left[\ell(\theta, a)\right]$$
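To make the expectation concrete, here is a rough sketch that estimates $\rho(\pi^\star, a)$ for the vaccine loss by Monte Carlo. It assumes, purely for illustration, that the posterior over the efficacy $w$ (playing the role of $\theta$) is a Beta(8, 2) distribution; the posterior, the `"WAIT"` label, and all function names are assumptions rather than anything fixed by the text.

```python
import random


def loss(w, action):
    """Vaccine loss: 10 * (1 - w) when buying, 100 otherwise."""
    return 10.0 * (1.0 - w) if action == "BUY" else 100.0


def expected_loss_mc(action, alpha, beta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the expected loss under a Beta(alpha, beta) posterior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w = rng.betavariate(alpha, beta)  # one draw from the assumed posterior
        total += loss(w, action)
    return total / n_samples


alpha, beta = 8.0, 2.0  # assumed posterior parameters
# Closed form for comparison: E[10 * (1 - w)] = 10 * beta / (alpha + beta).
rho_buy_exact = 10.0 * beta / (alpha + beta)
rho_buy_mc = expected_loss_mc("BUY", alpha, beta)
rho_wait_mc = expected_loss_mc("WAIT", alpha, beta)
print(rho_buy_exact, rho_buy_mc, rho_wait_mc)
```

The Monte Carlo estimate is only needed when no closed form is available; here the two should agree up to sampling noise.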

A frequentist decision theorist instead evaluates the risk of a decision rule $\delta(x)$ for every $\theta$ (in the no-data case, $\delta$ directly gives us an action) as

$$R(\theta,\delta) = \mathbb{E}_{X} \left[\ell(\theta, \delta(x))\right]$$

So for a problem with no data, $R(\theta, \delta) = \ell(\theta, \delta)$. The Bayes risk is then just the expectation of the risk under the prior $\pi$,

$$r(\pi,\delta) = \mathbb{E}_\pi\left[ R(\theta, \delta) \right]$$
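The two quantities above can be sketched numerically. The example below assumes a hypothetical data model that is not in the text: a trial in which $X \sim \text{Binomial}(n, w)$ people are protected, with the efficacy $w$ playing the role of $\theta$. The rules `always_buy` and `threshold_rule`, the discrete prior, and the `"WAIT"` label are all made up for illustration.

```python
from math import comb


def loss(w, action):
    """Vaccine loss: 10 * (1 - w) when buying, 100 otherwise."""
    return 10.0 * (1.0 - w) if action == "BUY" else 100.0


def binom_pmf(x, n, w):
    """P(X = x) for X ~ Binomial(n, w)."""
    return comb(n, x) * w**x * (1.0 - w) ** (n - x)


def risk(w, rule, n):
    """Frequentist risk R(w, delta): expected loss over X at a fixed w."""
    return sum(binom_pmf(x, n, w) * loss(w, rule(x, n)) for x in range(n + 1))


def bayes_risk(rule, n, prior):
    """Bayes risk r(pi, delta): the risk averaged over a discrete prior on w."""
    return sum(p * risk(w, rule, n) for w, p in prior.items())


def always_buy(x, n):
    """Ignores the data, so its risk reduces to the loss itself."""
    return "BUY"


def threshold_rule(x, n):
    """Buys only if at least half of the trial participants were protected."""
    return "BUY" if 2 * x >= n else "WAIT"


prior = {0.3: 0.2, 0.6: 0.3, 0.9: 0.5}  # assumed prior: efficacy -> probability
n = 10
for name, rule in (("always_buy", always_buy), ("threshold_rule", threshold_rule)):
    risks = [round(risk(w, rule, n), 3) for w in prior]
    print(name, risks, round(bayes_risk(rule, n, prior), 3))
```

Because `always_buy` ignores the data, its risk at each $w$ equals $\ell(w, \textrm{BUY})$, which matches the no-data identity $R(\theta, \delta) = \ell(\theta, \delta)$.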

Regarding randomized decision functions: leaving decisions up to chance seems ridiculous in practice, and we will rarely use a randomized rule. But it is often a useful tool for analysis.

Decision Principles

The Conditional Bayes Principle: Pick a Bayes action $a^\star$ which minimizes $\rho$.

$$a^\star = \textrm{arg}\min_{a} \left\{\rho(\pi^\star, a) = \mathbb{E}_{\pi^\star}\left[\ell(\theta, a)\right]\right\}$$
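For a finite set of actions the argmin is a direct search. The sketch below applies it to the vaccine loss with an assumed discrete posterior over the efficacy $w$; the posterior values, the `"WAIT"` label, and the helper names are hypothetical.

```python
def loss(w, action):
    """Vaccine loss: 10 * (1 - w) when buying, 100 otherwise."""
    return 10.0 * (1.0 - w) if action == "BUY" else 100.0


def posterior_expected_loss(posterior, action):
    """rho(pi*, a) for a discrete posterior given as {w: probability}."""
    return sum(p * loss(w, action) for w, p in posterior.items())


def bayes_action(posterior, actions=("BUY", "WAIT")):
    """Return the action with the smallest posterior expected loss."""
    return min(actions, key=lambda a: posterior_expected_loss(posterior, a))


posterior = {0.5: 0.1, 0.7: 0.3, 0.9: 0.6}  # assumed posterior over efficacy
best = bayes_action(posterior)
print(best, posterior_expected_loss(posterior, best))
```

With continuous action spaces the same idea applies, but the search would typically be replaced by a closed-form minimizer or a numerical optimizer.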

Frequentist Decision Principles:[1] These are harder to reason about because there can be many decision rules, none of which dominates the others. Comparing whole risk functions to pick a decision rule is hard in practice, so there are further principles to guide the choice.

  1. Bayes Risk: This is a single number, so we just pick the decision rule that minimizes it. $$\delta^\star_\pi = \textrm{arg}\min_{\delta} \left\{ r(\pi,\delta) = \mathbb{E}_\pi\left[ R(\theta, \delta) \right] \right\}$$
  2. Minimax: Pick the rule $\delta^\star$ with the smallest worst-case risk $\sup_{\theta \in \Theta} R(\theta, \delta^\star)$, possibly through a randomized decision rule. $$\inf_{\delta} \sup_{\theta} \left\{ R(\theta,\delta) = \mathbb{E}_{X} \left[\ell(\theta, \delta(x))\right] \right\}$$
  3. Invariance
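The first two principles in the list above can be compared on a toy problem. The sketch assumes a finite parameter space with three values of $\theta$ and three non-randomized candidate rules whose risk functions are simply made-up numbers, chosen so that no rule dominates another; the prior is also an assumption.

```python
# Hypothetical risk functions: risk_table[rule][i] = R(theta_i, rule).
risk_table = {
    "delta_1": (1.0, 4.0, 9.0),
    "delta_2": (5.0, 5.0, 5.0),
    "delta_3": (2.0, 6.0, 3.0),
}
prior = (0.6, 0.3, 0.1)  # assumed prior over (theta_1, theta_2, theta_3)


def bayes_risk(risks, prior):
    """Average the risk function over the prior: a single number per rule."""
    return sum(p * r for p, r in zip(prior, risks))


def worst_case_risk(risks):
    """Supremum of the risk over the (finite) parameter space."""
    return max(risks)


bayes_rule = min(risk_table, key=lambda d: bayes_risk(risk_table[d], prior))
minimax_rule = min(risk_table, key=lambda d: worst_case_risk(risk_table[d]))
print("Bayes rule:", bayes_rule)
print("Minimax rule:", minimax_rule)
```

Here the two principles disagree: the Bayes rule accepts a large risk at an unlikely $\theta$, while the minimax rule guards against the worst case regardless of the prior. Randomized rules are left out of this sketch.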

This is similar to other frequentist principles for inference, such as maximum likelihood, unbiasedness, minimum variance, and least squares.

Use points from 4.1 of Berger.[1]

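Bayesian hypothesis testing is straightforward. Given two hypotheses, simply compute the posterior odds ratio; the Bayes factor is this ratio divided by the prior odds, so the two coincide when the hypotheses are equally likely a priori.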
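A small sketch of the computation, assuming two simple (point) hypotheses about a binomial success probability, $H_0: \theta = 0.5$ and $H_1: \theta = 0.8$, and made-up data of 7 successes in 10 trials. For simple hypotheses the Bayes factor is just the likelihood ratio, and the posterior odds are the Bayes factor times the prior odds.

```python
def binomial_kernel(x, n, theta):
    """Binomial likelihood up to the binomial coefficient, which cancels in ratios."""
    return theta**x * (1.0 - theta) ** (n - x)


def bayes_factor(x, n, theta0, theta1):
    """Bayes factor in favour of H1: theta = theta1 over H0: theta = theta0."""
    return binomial_kernel(x, n, theta1) / binomial_kernel(x, n, theta0)


def posterior_odds(bf, prior_odds=1.0):
    """Posterior odds of H1 versus H0 given the Bayes factor and the prior odds."""
    return bf * prior_odds


x, n = 7, 10  # assumed data: 7 successes in 10 trials
bf = bayes_factor(x, n, theta0=0.5, theta1=0.8)
print(bf, posterior_odds(bf), posterior_odds(bf, prior_odds=0.25))
```

For composite hypotheses the likelihoods would be replaced by marginal likelihoods, integrated over the prior within each hypothesis.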

One-sided hypothesis testing: p-values sometimes have a Bayesian interpretation. Consider testing $H_0: \theta \leq \theta_0$ against $H_1: \theta > \theta_0$.

$$\begin{aligned} \widetilde{w}(\theta \mid x^\star,\mathbf{x},\mathbf{y}) &= \frac{\widetilde{p}(\theta \mid x^\star,\mathbf{x},\mathbf{y})}{\widetilde{p}(\theta \mid \mathbf{x},\mathbf{y})} = \frac{p(\mathbf{y} \mid \mathbf{x}, \theta)p(\theta \mid \mathbf{x},x^\star)}{p(\mathbf{y}\mid \mathbf{x})p(\theta)} = \frac{p(\theta \mid \mathbf{x},x^\star)}{p(\theta)} \end{aligned}$$
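One standard case where the one-sided p-value has a Bayesian reading can be checked numerically. The sketch assumes a single observation $x \sim \mathcal{N}(\theta, \sigma^2)$ with known $\sigma$ and an improper flat prior on $\theta$, so that the posterior is $\mathcal{N}(x, \sigma^2)$; under these assumptions the p-value for the one-sided test of $\theta \leq \theta_0$ equals the posterior probability of $H_0$. The numbers and function names are illustrative.

```python
from statistics import NormalDist


def p_value_one_sided(x, theta0, sigma):
    """P(X >= x) when theta = theta0, the boundary of the null hypothesis."""
    return 1.0 - NormalDist(mu=theta0, sigma=sigma).cdf(x)


def posterior_prob_h0(x, theta0, sigma):
    """P(theta <= theta0 | x) under a flat prior, where theta | x ~ N(x, sigma^2)."""
    return NormalDist(mu=x, sigma=sigma).cdf(theta0)


x, theta0, sigma = 1.2, 0.0, 1.0  # assumed observation and null boundary
p_val = p_value_one_sided(x, theta0, sigma)
post_h0 = posterior_prob_h0(x, theta0, sigma)
print(p_val, post_h0)
```

The agreement relies on the flat prior and the symmetry of the normal distribution; with an informative prior the two numbers generally differ.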

Footnotes

  1. James O. Berger. “Statistical Decision Theory and Bayesian Analysis.” (1988). https://www.jstor.org/stable/2288950