Bayesian inference — Stats for Finance Module 8 | LeadAfrik Public Economics Hub

Bayesian inference treats parameters as random variables and combines prior beliefs with observed data via Bayes' rule. For finance, the Bayesian framework formalises something every portfolio manager (PM) does intuitively: start with views, then update them as new evidence arrives. Black-Litterman portfolio construction is Bayesian; many modern ML risk models have a Bayesian formulation; the credibility-weighted credit spreads used in actuarial work are Bayesian.

Bayes' rule for parameters

p(θ | x) = p(x | θ) p(θ) / p(x)
         ∝ likelihood × prior

p(θ): prior — what you believed before seeing data.
p(x | θ): likelihood — same object MLE maximises.
p(θ | x): posterior — updated belief after data.
p(x): marginal likelihood / evidence — a normalising constant for parameter inference, but central for model comparison.
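The four pieces above can be seen working together in a grid approximation. This is a minimal sketch, not a production method: the data (3 defaults in 50 loans) and the Beta(2, 38) prior are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical data: 3 defaults observed among 50 loans.
defaults, loans = 3, 50

# Grid of candidate default probabilities theta.
theta = np.linspace(0.001, 0.999, 999)

# Prior p(theta): Beta(2, 38) kernel, left unnormalised (prior mean 5%).
prior = theta ** (2 - 1) * (1 - theta) ** (38 - 1)

# Likelihood p(x | theta): binomial kernel, constant factor dropped.
likelihood = theta ** defaults * (1 - theta) ** (loans - defaults)

# Posterior is proportional to likelihood x prior.
# Dividing by the sum plays the role of the evidence p(x).
unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()

posterior_mean = float((theta * posterior).sum())
print(f"posterior mean default rate: {posterior_mean:.4f}")
```

Grid approximation only scales to one or two parameters, but it makes the role of each term in Bayes' rule explicit.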

Conjugate priors — the magic shortcut

For certain likelihood-prior pairings the posterior has the same functional form as the prior, with updated parameters. Closed-form, no MCMC required.

Normal mean (variance known) + Normal prior → Normal posterior. The classic Bayesian-mean update.
Normal variance (mean known) + inverse-gamma prior → inverse-gamma posterior.
Bernoulli/binomial + Beta prior → Beta posterior. The default-rate updating used in credit.
Poisson + Gamma prior → Gamma posterior. The actuarial credibility model.
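Two of the pairings listed above reduce to simple parameter arithmetic. The sketch below assumes a Beta(a, b) prior on a default probability and a Gamma prior in the shape/rate parameterisation on a Poisson intensity; the function names and all numbers are hypothetical.

```python
def beta_binomial_update(a, b, defaults, loans):
    """Beta(a, b) prior + binomial data -> Beta posterior parameters."""
    return a + defaults, b + (loans - defaults)


def gamma_poisson_update(shape, rate, total_claims, periods):
    """Gamma(shape, rate) prior + Poisson counts -> Gamma posterior parameters."""
    return shape + total_claims, rate + periods


# Default-rate update: prior mean 2 / 40 = 5%, then 3 defaults in 50 loans.
a_post, b_post = beta_binomial_update(2.0, 38.0, defaults=3, loans=50)
default_rate_mean = a_post / (a_post + b_post)

# Claim-intensity update: prior mean 4 / 2 = 2 claims per period,
# then 9 claims observed over 3 periods.
shape_post, rate_post = gamma_poisson_update(4.0, 2.0, total_claims=9, periods=3)
intensity_mean = shape_post / rate_post

# The same posterior mean written as a credibility blend of the
# observed rate and the prior mean, with weight z on the data.
z = 3 / (3 + 2.0)
credibility_mean = z * (9 / 3) + (1 - z) * (4.0 / 2.0)

print(default_rate_mean, intensity_mean, credibility_mean)
```

The last three lines show why the Gamma-Poisson pair is associated with credibility: the posterior mean is a weighted average whose data weight grows with the number of observed periods.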

Normal-normal update

Observe X₁, ..., Xₙ ~ N(μ, σ²) with σ² known. Prior μ ~ N(μ₀, τ²). Posterior:

μ | x ~ N(μ_post, τ²_post)
μ_post = (μ₀/τ² + n X̄/σ²) / (1/τ² + n/σ²)
1/τ²_post = 1/τ² + n/σ²

The posterior mean is a precision-weighted average of the prior mean and the sample mean. Precisions add. As n grows the prior is washed out; as τ² → ∞ (flat prior) the posterior mean converges to the MLE.
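The update formulas translate directly into a few lines of code. This is a minimal sketch; the function name `normal_normal_update` and the example inputs are illustrative assumptions, not part of the module.

```python
import math


def normal_normal_update(prior_mean, prior_sd, sample_mean, sigma, n):
    """Posterior mean and sd for a normal mean with known sigma."""
    prior_precision = 1.0 / prior_sd ** 2   # 1 / tau^2
    data_precision = n / sigma ** 2         # n / sigma^2
    post_precision = prior_precision + data_precision  # precisions add
    post_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Hypothetical inputs: prior N(5%, 4%^2), 10 observations averaging 9%,
# known standard deviation 15%.
post_mean, post_sd = normal_normal_update(0.05, 0.04, 0.09, 0.15, 10)
print(f"posterior mean {post_mean:.4f}, posterior sd {post_sd:.4f}")

# A very diffuse prior leaves the sample mean essentially untouched.
flat_mean, _ = normal_normal_update(0.05, 1e6, 0.09, 0.15, 10)
print(f"near-flat prior posterior mean {flat_mean:.4f}")
```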

Shrinkage as a Bayesian operation

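Many shrinkage estimators can be read as Bayesian posterior means, or empirical-Bayes approximations to them, under a particular prior. Ridge regression is exactly the posterior mean under a zero-mean Gaussian prior on the coefficients; James-Stein is an empirical-Bayes estimator; Ledoit-Wolf shrinks the sample covariance toward a structured target in the same spirit. Bayesian thinking provides the theoretical justification for shrinkage rules that can otherwise look ad hoc.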
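The ridge case can be checked numerically. The sketch below uses simulated data and assumes a known noise variance sigma² and a prior β ~ N(0, τ²I) with τ² = sigma²/λ; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data (hypothetical coefficients).
n, p = 60, 4
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -0.2, 0.0, 0.1])
sigma = 1.0
y = X @ beta_true + sigma * rng.normal(size=n)

# Ridge estimate with penalty lam.
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Bayesian posterior mean under beta ~ N(0, tau2 * I), tau2 = sigma^2 / lam.
tau2 = sigma ** 2 / lam
post_cov = np.linalg.inv(X.T @ X / sigma ** 2 + np.eye(p) / tau2)
beta_post = post_cov @ (X.T @ y) / sigma ** 2

# Unpenalised least squares for comparison.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("ridge equals posterior mean:", np.allclose(beta_ridge, beta_post))
print("norms (OLS, ridge):", np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

A tighter prior (smaller τ²) corresponds to a larger penalty λ and pulls the coefficients harder toward zero.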

MCMC — when no conjugate prior helps

Markov Chain Monte Carlo (Metropolis-Hastings, Gibbs sampling, Hamiltonian Monte Carlo) draws samples from intractable posteriors. The modern workhorses are Stan, PyMC and NumPyro, all of which default to the No-U-Turn Sampler (NUTS), an adaptive HMC variant, for continuous parameters. For a quant: MCMC is overkill for routine problems but essential when you have a structural model with non-standard priors (e.g., hierarchical credit models).
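To make the idea concrete, here is a minimal random-walk Metropolis sampler written from scratch. It is a teaching sketch, far simpler than the HMC-based samplers in the libraries named above. The target is a hypothetical default-rate posterior (Beta(2, 38) prior, 3 defaults in 50 loans) chosen because its exact mean, 5/90, is known and can be used as a check.

```python
import numpy as np


def log_posterior(theta, defaults=3, loans=50, a=2.0, b=38.0):
    """Unnormalised log posterior for a default probability theta."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf  # zero posterior mass outside (0, 1)
    return (a - 1 + defaults) * np.log(theta) + (
        b - 1 + loans - defaults
    ) * np.log(1 - theta)


def metropolis(log_target, start, step, n_draws, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    current, current_lp = start, log_target(start)
    for i in range(n_draws):
        proposal = current + step * rng.normal()
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws


draws = metropolis(log_posterior, start=0.05, step=0.03, n_draws=20000)
kept = draws[2000:]  # discard burn-in
print(f"MCMC mean {kept.mean():.4f} vs exact {5 / 90:.4f}")
```

The step size and burn-in length here are untuned guesses; in practice they need checking with convergence diagnostics.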

Posterior predictive

p(x_new | x) = ∫ p(x_new | θ) p(θ | x) dθ

The right way to predict future data: integrate over the posterior, not just plug in θ̂. The predictive distribution is wider than the plug-in distribution p(x_new | θ̂), correctly reflecting parameter uncertainty. Bayesian VaR is typically larger (more conservative) than MLE-plug-in VaR for the same reason.
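The integral can be approximated by simulation: draw θ from the posterior, then draw x_new given that θ. The sketch below does this for a normal return with known volatility and compares the result with a plug-in simulation. The posterior N(7%, 5%²) and the 20% volatility are hypothetical inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.20                      # known return volatility (assumed)
post_mean, post_sd = 0.07, 0.05   # hypothetical posterior for the mean

n_sims = 500_000

# Posterior predictive: sample the mean first, then the return.
mu_draws = rng.normal(post_mean, post_sd, size=n_sims)
x_pred = rng.normal(mu_draws, sigma)

# Plug-in: treat the posterior mean as if it were the true mean.
x_plug = rng.normal(post_mean, sigma, size=n_sims)

# 99% VaR reported as a positive loss number.
var_pred = -np.quantile(x_pred, 0.01)
var_plug = -np.quantile(x_plug, 0.01)

print(f"predictive sd {x_pred.std():.4f}, plug-in sd {x_plug.std():.4f}")
print(f"predictive VaR {var_pred:.4f}, plug-in VaR {var_plug:.4f}")
```

In this normal case the predictive variance is σ² plus the posterior variance of the mean, so the simulation can be checked against sqrt(0.20² + 0.05²).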

Black-Litterman in one slide

BL is Bayesian portfolio construction: start with an equilibrium prior π (CAPM-implied returns), specify subjective views Q with uncertainty covariance Ω (smaller Ω means more confident views), and combine via Bayes' rule. The posterior mean Π* is the precision-weighted blend of π and the views; the posterior covariance updates Σ. Plug into mean-variance optimisation and you get the BL portfolio. Module 8 of Portfolio Theory walks through every step.
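A compact sketch of the blend, using the standard textbook form of the BL posterior. The pick matrix P (which assets each view refers to) and the prior-scaling scalar tau are not defined in this module and are assumptions here, as are the function name and all numbers. Note that tau in this sketch is the BL scaling constant, not the prior standard deviation from the normal-normal update.

```python
import numpy as np


def black_litterman_posterior(pi, Sigma, P, Q, Omega, tau=0.05):
    """Posterior mean and covariance of expected returns.

    Prior: mu ~ N(pi, tau * Sigma). Views: P @ mu = Q + noise, noise ~ N(0, Omega).
    """
    prior_precision = np.linalg.inv(tau * Sigma)
    omega_inv = np.linalg.inv(Omega)
    post_cov = np.linalg.inv(prior_precision + P.T @ omega_inv @ P)
    post_mean = post_cov @ (prior_precision @ pi + P.T @ omega_inv @ Q)
    return post_mean, post_cov


# Hypothetical two-asset example with one absolute view on asset 1.
pi = np.array([0.05, 0.07])                  # equilibrium returns
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])             # return covariance
P = np.array([[1.0, 0.0]])                   # the view is about asset 1 only
Q = np.array([0.08])                         # view: asset 1 returns 8%
Omega = np.array([[0.001]])                  # view uncertainty

post_mean, post_cov = black_litterman_posterior(pi, Sigma, P, Q, Omega)
print("posterior expected returns:", post_mean)

# A very uncertain view leaves the equilibrium prior almost unchanged.
vague_mean, _ = black_litterman_posterior(pi, Sigma, P, Q, np.array([[1e6]]))
print("with a vague view:", vague_mean)
```

Asset 2 moves even though the view is only about asset 1, because the prior covariance links the two expected returns.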

Exercise

Your prior on a stock's annual expected return is μ ~ N(8%, 5%²). You observe 3 years of returns averaging 12% per year, with known annual standard deviation σ = 20%. (1) Compute the posterior mean and standard deviation. (2) Compare to the prior and to the pure-MLE estimate. (3) Comment.