Bayesian inference treats parameters as random variables and combines prior beliefs with observed data via Bayes' rule. In finance, the Bayesian framework formalises what every PM does intuitively: start with views and update them as new evidence arrives. Black-Litterman portfolio construction is Bayesian; so are many modern ML risk models, and so are the credibility-weighted claim-rate and premium estimates used in actuarial work.
Bayes' rule for parameters
p(θ | x) = p(x | θ) p(θ) / p(x) ∝ likelihood × prior
- p(θ): the prior, what you believed before seeing data.
- p(x | θ): the likelihood, the same function of θ that MLE maximises.
- p(θ | x): the posterior, your updated belief after seeing data.
- p(x): the marginal likelihood (evidence). It is only a normalising constant for parameter inference, but it is central to model comparison (e.g. Bayes factors).
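Bayes' rule can be applied literally on a grid of θ values. Multiply likelihood by prior at each point, then divide by the sum. That sum is a numerical approximation of p(x), which is why the evidence works as a normalising constant. This is a minimal sketch, not a production method. The function name `grid_posterior` is hypothetical, and it assumes an evenly spaced grid. The data (3 defaults in 20 loans, uniform prior) are illustrative only.

```python
import math

def grid_posterior(log_lik, log_prior, grid):
    """Evaluate Bayes' rule numerically on an evenly spaced grid of theta values.

    Returns (posterior probabilities on the grid, log of the evidence p(x)).
    """
    dx = grid[1] - grid[0]                                  # assumes even spacing
    log_unnorm = [log_lik(t) + log_prior(t) for t in grid]  # log(likelihood x prior)
    m = max(log_unnorm)                                     # shift by the max to avoid underflow
    weights = [math.exp(v - m) for v in log_unnorm]
    total = sum(weights)
    log_evidence = m + math.log(total * dx)                 # Riemann-sum approximation of p(x)
    return [w / total for w in weights], log_evidence

# Hypothetical data: k defaults in n loans, Uniform(0, 1) prior on the default rate.
k, n = 3, 20
N = 10_000
grid = [(i + 0.5) / N for i in range(N)]                    # midpoints of (0, 1)
post, log_ev = grid_posterior(
    lambda t: k * math.log(t) + (n - k) * math.log(1 - t),  # Bernoulli log-likelihood
    lambda t: 0.0,                                          # log of the uniform density
    grid,
)
post_mean = sum(t * p for t, p in zip(grid, post))
```

Because the uniform prior is a Beta(1, 1), this case has a closed form to check against. The posterior is Beta(k + 1, n − k + 1), with mean (k + 1)/(n + 2).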
Conjugate priors — the magic shortcut
For certain likelihood-prior pairings the posterior has the same functional form as the prior, with updated parameters. Closed-form, no MCMC required.
- Normal mean (variance known) + Normal prior → Normal posterior. The classic Bayesian-mean update.
- Normal variance (mean known) + inverse-gamma prior → inverse-gamma posterior.
- Bernoulli/binomial + Beta prior → Beta posterior. The default-rate updating used in credit.
- Poisson + Gamma prior → Gamma posterior. The actuarial credibility model.
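Each conjugate update above reduces to parameter arithmetic. The sketch below covers three of the pairings; the function names are hypothetical. It assumes the shape/rate parameterisation for the Gamma and a known mean for the inverse-gamma case. For Poisson counts it assumes unit exposure per observation unless a total exposure is passed.

```python
def beta_binomial_update(a, b, defaults, trials):
    # Beta(a, b) prior on a default probability; observe `defaults` out of `trials`.
    # Posterior is Beta(a + defaults, b + non-defaults); posterior mean = a / (a + b).
    return a + defaults, b + (trials - defaults)

def gamma_poisson_update(shape, rate, counts, exposure=None):
    # Gamma(shape, rate) prior on a Poisson rate; `exposure` is total exposure
    # (defaults to one unit per observation). Posterior mean = shape / rate.
    exposure = len(counts) if exposure is None else exposure
    return shape + sum(counts), rate + exposure

def inv_gamma_normal_update(alpha, beta, data, mu):
    # InvGamma(alpha, beta) prior on sigma^2 for Normal data with KNOWN mean mu.
    # Posterior mean of sigma^2 = beta / (alpha - 1), valid for alpha > 1.
    ss = sum((x - mu) ** 2 for x in data)
    return alpha + len(data) / 2, beta + ss / 2
```

No sampling or numerical integration is involved, which is what "closed-form" means in practice.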
Normal-normal update
Observe X₁, ..., Xₙ i.i.d. N(μ, σ²) with σ² known. Prior μ ~ N(μ₀, τ²). Posterior:
μ | x ~ N(μ_post, τ²_post)
μ_post = (μ₀/τ² + n X̄/σ²) / (1/τ² + n/σ²)
1/τ²_post = 1/τ² + n/σ²
The posterior mean is a precision-weighted average of the prior mean and the sample mean, and precisions add. As n grows the prior is washed out; as τ² → ∞ (a flat prior) the posterior mean converges to the MLE, X̄.
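The update formulas translate directly into a few lines of code in precision form. This is a minimal sketch; the function name is hypothetical. It takes the data only through the sample mean X̄ and the count n, which suffice here because σ² is known. It can be used to check part (1) of the Exercise.

```python
def normal_normal_update(mu0, tau2, xbar, n, sigma2):
    """Posterior N(mu_post, tau2_post) for a normal mean with known variance sigma2."""
    prior_prec = 1.0 / tau2              # precision of the prior
    data_prec = n / sigma2               # precision of the sample mean
    post_prec = prior_prec + data_prec   # precisions add
    mu_post = (prior_prec * mu0 + data_prec * xbar) / post_prec
    return mu_post, 1.0 / post_prec      # (posterior mean, posterior variance)
```

Setting `tau2` very large reproduces the flat-prior limit: `mu_post` approaches `xbar` and the posterior variance approaches `sigma2 / n`.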
Shrinkage as a Bayesian operation
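Common shrinkage estimators can be read as Bayesian (or empirical-Bayes) posterior means under a particular prior. Ridge regression is exactly the posterior mean under a zero-mean Gaussian prior on the coefficients. James-Stein is an empirical-Bayes estimator of normal means. Ledoit-Wolf's blend of the sample covariance with a structured target has the same form as a posterior-mean covariance under a conjugate prior. The Bayesian reading supplies a principled justification, and a way to choose the shrinkage target, for rules that otherwise look ad hoc.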
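The ridge case can be checked numerically. Assume Gaussian noise with variance σ² and a prior β ~ N(0, τ²). The posterior mean of β then equals the ridge estimate with penalty λ = σ²/τ². The sketch assumes a single predictor and no intercept to stay dependency-free. The function names and the small dataset are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    # Ridge estimate for y = beta * x + noise (single predictor, no intercept):
    # argmin sum (y - beta x)^2 + lam * beta^2  =>  sxy / (sxx + lam)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def posterior_mean_1d(xs, ys, sigma2, tau2):
    # Posterior mean of beta with prior beta ~ N(0, tau2) and noise variance sigma2;
    # precision-weighted form: (data precision x data estimate) / total precision.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    post_prec = sxx / sigma2 + 1.0 / tau2
    return (sxy / sigma2) / post_prec

xs, ys = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]    # hypothetical data
sigma2, tau2 = 0.5, 0.25
```

A tighter prior (smaller τ²) means a larger λ and more shrinkage toward zero. In this reading, the penalty strength is a statement about prior beliefs.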
MCMC — when no conjugate prior helps
Markov chain Monte Carlo (Metropolis-Hastings, Gibbs sampling, Hamiltonian Monte Carlo) draws samples from posteriors that have no closed form. The modern workhorses are Stan, PyMC and NumPyro, all built around HMC variants, typically the No-U-Turn Sampler (NUTS). For a quant, MCMC is overkill for routine problems. It becomes essential for structural models with non-standard priors (e.g. hierarchical credit models).
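To show the mechanics, here is a minimal random-walk Metropolis sampler (the simplest Metropolis-Hastings variant) in plain Python. It is deliberately not the HMC used by Stan, PyMC or NumPyro. It needs only the log posterior up to a constant, because p(x) cancels in the acceptance ratio. The sampler is sanity-checked on the normal-normal model, whose exact posterior here is N(1.5, 0.25). The data, step size and burn-in are hypothetical choices.

```python
import math
import random

def metropolis_hastings(log_post, theta0, n_samples, step=1.0, burn_in=1_000, seed=0):
    """Random-walk Metropolis sampler for a scalar parameter.

    log_post: unnormalised log posterior, log p(x | theta) + log p(theta).
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    draws = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        lp_prop = log_post(proposal)
        # Accept with probability min(1, post(proposal) / post(theta)); the Hastings
        # correction is 1 because the proposal is symmetric. 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        if i >= burn_in:
            draws.append(theta)
    return draws

# Sanity check on the normal-normal model, where the exact posterior is known.
data, sigma2 = [1.0, 2.0, 3.0], 1.0   # hypothetical data, known noise variance
mu0, tau2 = 0.0, 1.0                  # prior mu ~ N(0, 1)

def log_post(mu):
    log_prior = -0.5 * (mu - mu0) ** 2 / tau2
    log_lik = -0.5 * sum((x - mu) ** 2 for x in data) / sigma2
    return log_prior + log_lik

draws = metropolis_hastings(log_post, theta0=0.0, n_samples=20_000)
mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)
```

To break conjugacy, swap `log_prior` for a heavy-tailed one, for example a Student-t. That changes one line and leaves the sampler untouched, which is the point of MCMC.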
Posterior predictive
p(x_new | x) = ∫ p(x_new | θ) p(θ | x) dθ
The right way to predict future data is to integrate over the posterior rather than plug in a point estimate θ̂. The predictive distribution is generally wider than the plug-in distribution p(x_new | θ̂), because it carries parameter uncertainty as well as noise. For the same reason, a VaR computed from the posterior predictive is typically larger (more conservative) than an MLE plug-in VaR.
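In the normal-normal model the integral has a closed form: x_new | x ~ N(μ_post, σ² + τ²_post). The sketch below compares a 1% VaR from that predictive with the plug-in VaR. It makes several assumptions. It uses a flat prior, so that μ_post = X̄ and τ²_post = σ²/n. VaR is reported as a positive loss, and the return statistics are hypothetical. The function names are illustrative.

```python
from statistics import NormalDist

def posterior_predictive(mu_post, tau2_post, sigma2):
    # x_new | x ~ N(mu_post, sigma2 + tau2_post): noise variance plus parameter uncertainty
    return NormalDist(mu_post, (sigma2 + tau2_post) ** 0.5)

def var_loss(dist, alpha=0.01):
    # VaR as a positive loss number: minus the alpha-quantile of the return distribution
    return -dist.inv_cdf(alpha)

# Hypothetical daily return summary; flat prior => mu_post = xbar, tau2_post = sigma^2 / n
xbar, sigma, n = 0.0005, 0.02, 20
plug_in = NormalDist(xbar, sigma)                        # parameter treated as known
predictive = posterior_predictive(xbar, sigma ** 2 / n, sigma ** 2)
var_plug_in = var_loss(plug_in)
var_predictive = var_loss(predictive)
```

Under these assumptions the predictive standard deviation is σ√(1 + 1/n). The gap from the plug-in VaR shrinks as n grows, but it never becomes negative.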
Black-Litterman in one slide
BL is Bayesian portfolio construction. Start with an equilibrium prior Π (CAPM-implied returns). Express subjective views as expected returns Q on portfolios defined by a pick matrix P, with view uncertainty Ω (a smaller Ω means more confidence). Then combine via Bayes' rule. The posterior mean Π* is the precision-weighted blend of prior and views, and the posterior covariance adds the remaining parameter uncertainty to Σ. Plug into mean-variance optimisation and you get the BL portfolio. Module 8 of Portfolio Theory walks through every step.
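The Bayes step can be sketched in a few lines of NumPy using the standard Gaussian formulation. The prior is μ ~ N(Π, τΣ), and the views are P μ ~ N(Q, Ω). The posterior precision is the sum of the prior and view precisions. This is a sketch, not the Module 8 implementation. The function name is hypothetical, τ = 0.05 is an assumed modelling choice, and the two-asset inputs are made up.

```python
import numpy as np

def black_litterman(Pi, Sigma, P, Q, Omega, tau=0.05):
    """Posterior mean and covariance of expected returns mu.

    Prior: mu ~ N(Pi, tau * Sigma). Views: P @ mu ~ N(Q, Omega).
    tau = 0.05 is an assumed scaling of prior uncertainty, not a fixed constant.
    """
    prior_prec = np.linalg.inv(tau * Sigma)
    omega_inv = np.linalg.inv(Omega)
    post_cov = np.linalg.inv(prior_prec + P.T @ omega_inv @ P)   # precisions add
    post_mean = post_cov @ (prior_prec @ Pi + P.T @ omega_inv @ Q)
    return post_mean, post_cov

# Hypothetical two-asset example: "asset 1 outperforms asset 2 by 2%".
Pi = np.array([0.05, 0.07])
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
P = np.array([[1.0, -1.0]])
Q = np.array([0.02])
Omega = np.array([[0.001]])
mu_bl, M = black_litterman(Pi, Sigma, P, Q, Omega)
Sigma_bl = Sigma + M   # one common choice: return covariance plus posterior uncertainty in mu
```

With one asset and one absolute view, this reduces exactly to the normal-normal update earlier in the section, with τΣ as the prior variance and Ω as the data variance.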
Exercise
Your prior on a stock's annual expected return is μ ~ N(8%, 5%²). You observe 3 years of returns averaging 12% per year, with known annual standard deviation σ = 20%. (1) Compute the posterior mean and standard deviation. (2) Compare them to the prior and to the pure-MLE estimate. (3) Comment on how much weight the data receive relative to the prior, and why.