Module 05 of 11 · 55 min read · Advanced

Forecasting and the Wold decomposition

Point forecasts, forecast errors, prediction intervals. Wold's theorem and why every stationary series is essentially an MA(∞).

Forecasting is the deliverable of most time-series modelling. Every forecasting framework must produce three things: the forecast itself, the uncertainty around it, and the diagnostics that justify both. For stationary series, two facts dominate: forecast error variance grows with the horizon, and long-horizon point forecasts converge to the unconditional mean.

Optimal point forecast

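Under squared-error loss, the optimal h-step-ahead forecast is the conditional expectation: X̂_{t+h|t} = E[X_{t+h} | F_t], where F_t is the information available at time t. Under absolute-error loss, it is the conditional median. Under asymmetric (quantile) loss, it is the corresponding conditional quantile; more exotic loss functions lead to other functionals of the conditional distribution.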
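The link between loss function and optimal forecast can be checked numerically. The sketch below is only an illustration: the sample values and the candidate grid are hypothetical, and a brute-force grid search stands in for a proper optimiser. It looks for the constant forecast that minimises average loss over a small right-skewed sample.

```python
import statistics


def best_constant_forecast(sample, loss, grid):
    """Return the grid value that minimises average loss over the sample."""
    return min(grid, key=lambda f: sum(loss(x - f) for x in sample) / len(sample))


# Hypothetical right-skewed outcomes (illustrative numbers only).
sample = [1.0, 2.0, 2.0, 3.0, 12.0]
# Candidate forecasts 0.00, 0.01, ..., 15.00.
grid = [i / 100 for i in range(0, 1501)]

squared_opt = best_constant_forecast(sample, lambda e: e * e, grid)
absolute_opt = best_constant_forecast(sample, abs, grid)

print("squared-error optimum:", squared_opt, "| sample mean:", statistics.mean(sample))
print("absolute-error optimum:", absolute_opt, "| sample median:", statistics.median(sample))
```

With a skewed sample the two optima differ noticeably, which is why the choice of loss function matters before any model is fitted.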

AR(1) forecasts

For a stationary AR(1), X_t = c + φ X_{t-1} + ε_t with |φ| < 1: X̂_{t+h|t} = c(1 + φ + ... + φ^(h-1)) + φ^h X_t. As h → ∞, X̂_{t+h|t} → μ = c / (1 - φ), the unconditional mean. Equivalently, X̂_{t+h|t} = μ + φ^h (X_t - μ): the forecast closes the gap to the mean geometrically at rate φ.
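The AR(1) forecast formula can be sketched in two equivalent ways: by iterating the one-step recursion h times, or with the closed form around the unconditional mean. The function names and the parameter values below are illustrative assumptions, not part of the module.

```python
def ar1_forecast(c, phi, x_t, h):
    """h-step-ahead AR(1) point forecast by iterating the one-step recursion."""
    forecast = x_t
    for _ in range(h):
        forecast = c + phi * forecast
    return forecast


def ar1_forecast_closed_form(c, phi, x_t, h):
    """Same forecast written as mean reversion: mu + phi^h * (x_t - mu)."""
    mu = c / (1 - phi)  # unconditional mean; requires |phi| < 1
    return mu + phi ** h * (x_t - mu)


# Hypothetical parameters: unconditional mean is 1.0 / (1 - 0.5) = 2.0.
c, phi, x_t = 1.0, 0.5, 4.0
for h in (1, 2, 3, 10, 50):
    print(h, ar1_forecast(c, phi, x_t, h), ar1_forecast_closed_form(c, phi, x_t, h))
```

The printed forecasts shrink toward 2.0 as h grows, halving the remaining gap at each step because φ = 0.5 in this example.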

Forecast errors

The h-step forecast error is e_{t+h|t} = X_{t+h} - X̂_{t+h|t}. For ARMA(p, q) with Wold MA(∞) coefficients ψ_j, the h-step forecast error variance is:

Var(e_{t+h|t}) = σ²(ψ_0² + ψ_1² + ... + ψ_{h-1}²), where ψ_0 = 1 and σ² = Var(ε_t).

The variance is non-decreasing in h and, for stationary processes, plateaus at the unconditional variance σ² Σ ψ_j² as h → ∞. For a random walk, ψ_j = 1 for every j, so the variance is hσ²: it grows linearly and never converges.
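The variance formula can be evaluated directly once the ψ_j are known. The sketch below assumes the AR(1) case, where ψ_j = φ^j, and treats the random walk as the limiting case φ = 1; the helper names and numbers are hypothetical.

```python
def ar1_psi(phi, n):
    """First n MA(infinity) coefficients of an AR(1): psi_j = phi**j."""
    return [phi ** j for j in range(n)]


def forecast_error_variance(psi, sigma2, h):
    """sigma^2 * (psi_0^2 + ... + psi_{h-1}^2) for an h-step forecast."""
    if h > len(psi):
        raise ValueError("need at least h psi coefficients")
    return sigma2 * sum(p * p for p in psi[:h])


sigma2 = 1.0
stationary = ar1_psi(0.5, 50)   # hypothetical stationary AR(1)
random_walk = ar1_psi(1.0, 50)  # phi = 1: every psi_j equals 1

for h in (1, 2, 5, 20, 50):
    print(
        h,
        round(forecast_error_variance(stationary, sigma2, h), 4),
        forecast_error_variance(random_walk, sigma2, h),
    )
```

For the stationary case the variance levels off near σ² / (1 - φ²), while the random-walk column keeps increasing by σ² per step.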

Prediction intervals

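Assuming Gaussian errors, the (1 - α) prediction interval is X̂_{t+h|t} ± z_{α/2} · √Var(e_{t+h|t}), where z_{α/2} is the upper α/2 quantile of the standard normal (about 1.96 for a 95% interval). This interval ignores parameter-estimation uncertainty, so it understates the true uncertainty in small samples. For non-Gaussian errors, bootstrap or simulation-based intervals are more honest. Empirical financial data are heavy-tailed, so Gaussian intervals are often too narrow.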
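A minimal sketch of the Gaussian interval, taking the multiplier to be the two-sided standard normal quantile (about 1.96 for 95% coverage). The function name and the example inputs are assumptions for illustration; the error variance would come from the forecast error variance formula given earlier.

```python
import math
from statistics import NormalDist


def gaussian_prediction_interval(point_forecast, error_variance, coverage=0.95):
    """Symmetric prediction interval under Gaussian forecast errors."""
    # Two-sided quantile: 0.975 for 95% coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * math.sqrt(error_variance)
    return point_forecast - half_width, point_forecast + half_width


# Hypothetical forecast of 2.0 with forecast error variance 4.0.
lower, upper = gaussian_prediction_interval(2.0, 4.0)
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```

This covers only innovation uncertainty under normality; it does not account for estimation error in the model parameters or for heavy tails.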

Forecast evaluation

  • MAE — mean absolute error.
  • RMSE — root mean squared error.
  • MAPE — mean absolute percentage error; problematic when actual is near zero.
  • Directional accuracy — fraction of correct sign predictions; relevant for trading.
  • Diebold-Mariano test — formal comparison of two forecasts' loss differentials.
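The measures in the list above can be sketched in a few lines. Everything here is illustrative: the function names and data are hypothetical, directional accuracy is taken to mean "forecast and actual have the same sign", and the Diebold-Mariano statistic is the simplest version (squared-error loss, one-step forecasts, no autocorrelation correction), which is an assumption rather than the general test.

```python
import math


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Percent; raises ZeroDivisionError if any actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def sign(x):
    return (x > 0) - (x < 0)


def directional_accuracy(actual, forecast):
    """Fraction of periods where forecast and actual share the same sign."""
    hits = sum(sign(a) == sign(f) for a, f in zip(actual, forecast))
    return hits / len(actual)


def dm_statistic(errors_a, errors_b):
    """Simplified Diebold-Mariano statistic under squared-error loss.

    Assumes one-step forecasts, so the loss differential is treated as
    serially uncorrelated. Approximately N(0, 1) under equal accuracy.
    """
    d = [ea * ea - eb * eb for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    gamma0 = sum((x - d_bar) ** 2 for x in d) / n
    return d_bar / math.sqrt(gamma0 / n)


# Hypothetical returns and forecasts.
actual = [1.0, -2.0, 4.0, -1.0]
forecast = [0.5, -1.0, 5.0, 1.0]
print(mae(actual, forecast), rmse(actual, forecast))
print(mape(actual, forecast), directional_accuracy(actual, forecast))
```

A positive `dm_statistic` means the first forecast has the larger average squared error; for multi-step forecasts the variance term would need an autocorrelation-robust estimate.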

Wold decomposition revisited

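Every covariance-stationary, purely nondeterministic process can be written as an MA(∞) in its own innovations (one-step-ahead linear forecast errors): X_t = μ + Σ_{j≥0} ψ_j ε_{t-j}, with ψ_0 = 1, Σ ψ_j² < ∞, and ε_t white noise (uncorrelated, not necessarily independent). The Wold theorem guarantees that this representation exists; it does not guarantee that the infinitely many ψ_j can be estimated from finite data. The practical content: ARMA models are flexible enough to approximate any such process with a handful of parameters, but with finite samples only the first few ψ_j can be estimated precisely.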
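To make the MA(∞) representation concrete, the ψ_j of an ARMA model can be generated by the standard recursion ψ_0 = 1, ψ_j = θ_j + Σ_{i ≤ min(j, p)} φ_i ψ_{j-i}, with θ_j = 0 for j > q. The sketch below assumes the sign convention X_t = Σ φ_i X_{t-i} + ε_t + Σ θ_j ε_{t-j}; the function name and coefficients are hypothetical.

```python
def arma_psi_weights(ar, ma, n):
    """First n MA(infinity) weights of an ARMA(p, q) model.

    ar = [phi_1, ..., phi_p], ma = [theta_1, ..., theta_q].
    """
    psi = []
    for j in range(n):
        if j == 0:
            psi.append(1.0)
            continue
        # theta_j is zero beyond the MA order q.
        value = ma[j - 1] if j <= len(ma) else 0.0
        for i, phi_i in enumerate(ar, start=1):
            if i <= j:
                value += phi_i * psi[j - i]
        psi.append(value)
    return psi


# Hypothetical ARMA(1, 1) with phi = 0.5 and theta = 0.4.
print(arma_psi_weights([0.5], [0.4], 6))
# Pure AR(1): weights reduce to phi**j.
print(arma_psi_weights([0.5], [], 6))
```

Two parameters generate the whole sequence here, which is the sense in which a low-order ARMA model is a parsimonious approximation to an MA(∞).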

Combining forecasts

Equal-weighted combinations of multiple forecasts often beat individual forecasts, even sophisticated ones. The Bates-Granger (1969) finding has held up across decades and domains. Bayesian model averaging is the principled generalisation; simple averaging is the practitioner's default.
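One way to see why combinations help is to compute the error variance of a weighted average of two forecasts. The sketch below assumes both forecasts are unbiased, with error standard deviations s1 and s2 and error correlation rho; all numbers are hypothetical, and the "optimal" weight is the textbook minimum-variance weight, which in practice must be estimated and is often noisy.

```python
def combined_error_variance(w, s1, s2, rho):
    """Error variance of w * forecast_1 + (1 - w) * forecast_2."""
    return w * w * s1 * s1 + (1 - w) ** 2 * s2 * s2 + 2 * w * (1 - w) * rho * s1 * s2


def minimum_variance_weight(s1, s2, rho):
    """Weight on forecast_1 that minimises the combined error variance."""
    return (s2 * s2 - rho * s1 * s2) / (s1 * s1 + s2 * s2 - 2 * rho * s1 * s2)


# Hypothetical error standard deviations and correlation.
s1, s2, rho = 1.0, 1.2, 0.3
equal = combined_error_variance(0.5, s1, s2, rho)
w_star = minimum_variance_weight(s1, s2, rho)
optimal = combined_error_variance(w_star, s1, s2, rho)

print("best individual variance:", min(s1 * s1, s2 * s2))
print("equal-weight variance:   ", round(equal, 4))
print("optimal weight, variance:", round(w_star, 4), round(optimal, 4))
```

In this example equal weights already beat the better individual forecast and sit close to the optimum. The gain shrinks as the errors become more correlated or as one forecast becomes much more accurate than the other.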

Forecasting returns is hard for fundamental reasons

If you could forecast next month's stock return to, say, RMSE 1.5% (vs. unconditional vol of ~5%), you'd be running the world's most profitable hedge fund. The fact that the best forecast models barely beat 'the unconditional mean is zero' on most return series is not a failure of the methodology — it's the absence of forecastable signal in efficient prices.

Exercise

Fitting an AR(1) to monthly Kenya inflation gives c = 0.5%, φ = 0.7, σ_ε = 0.4%. Current inflation X_t = 6.0%. (1) Forecast inflation 1, 3, 6, 12 months ahead. (2) Compute the 95% prediction interval at h = 12. (3) Compare to the unconditional mean.

LeadAfrik Public Economics Hub