Once you decide to fit an ARMA model, choosing the orders (p, q) is part statistics, part craft. Information criteria, ACF/PACF inspection, residual diagnostics, and out-of-sample performance all play a role. The temptation to over-fit is strong, and in forecasting the cost is real: extra parameters fit in-sample noise and degrade out-of-sample accuracy.
Box-Jenkins methodology
- Identification: examine ACF and PACF, decide tentative p and q.
- Estimation: fit by conditional or exact maximum likelihood.
- Diagnostic checking: examine residuals for whiteness and absence of structure.
- Forecasting: produce point and interval forecasts, evaluate out-of-sample.
ACF and PACF cheat sheet
- AR(p): PACF cuts off at lag p; ACF decays gradually.
- MA(q): ACF cuts off at lag q; PACF decays gradually.
- ARMA(p, q): both decay gradually; identification is harder.
- Non-stationary: ACF decays very slowly; first-difference and re-inspect.
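To see these cut-off patterns numerically, here is a minimal stdlib-only sketch that computes the sample ACF and, through the Durbin-Levinson recursion on those autocorrelations, the sample PACF. The function names and the simulated AR(1) with φ = 0.7 are illustrative assumptions; in practice a library such as statsmodels provides both functions.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (divides by the full-sample sum of squares)."""
    T = len(x)
    mean = sum(x) / T
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, T)) / c0 for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    out = [1.0]
    phi = []  # phi[j-1] holds phi_{k-1, j}, the AR(k-1) coefficients
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # AR(k) update: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        out.append(phi_kk)
    return out


def simulate_ar1(phi, T, seed=0, burn_in=200):
    """Simulate x_t = phi * x_{t-1} + e_t with N(0, 1) shocks, discarding a burn-in."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for t in range(T + burn_in):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            x.append(prev)
    return x


if __name__ == "__main__":
    x = simulate_ar1(0.7, 2000)
    band = 1.96 / math.sqrt(len(x))  # approximate 95% band under white noise
    acf, pacf = sample_acf(x, 5), sample_pacf(x, 5)
    for k in range(1, 6):
        flag = "*" if abs(pacf[k]) > band else " "
        print(f"lag {k}: ACF {acf[k]:+.3f}  PACF {pacf[k]:+.3f} {flag}")
```

For an AR(1) the ACF should shrink roughly like 0.7^k while only the lag-1 PACF falls outside the band, matching the first line of the cheat sheet.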
Information criteria
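AIC = -2 ln L + 2k
BIC = -2 ln L + k ln T (heavier penalty on k than AIC once T ≥ 8)
HQIC = -2 ln L + 2k ln ln T (between AIC and BIC once T ≥ 16)
Here L is the maximised likelihood, k the number of estimated parameters, and T the number of observations.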
Pick the model with the smallest criterion value. AIC selects more liberally: asymptotically it picks a model at least as large as the true one and keeps a non-vanishing chance of over-fitting. BIC selects more parsimoniously and is consistent: it finds the true model with probability approaching one when that model is among the candidates. Practitioner default: report both, and prefer BIC for forecasting.
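The three criteria are one-liners once you have the maximised log-likelihood. This sketch assumes you supply ln L, k and T yourself; whether k counts the mean and the innovation variance is a convention that differs between packages, and what matters is counting the same way for every candidate. The function name is illustrative.

```python
import math


def information_criteria(loglik, k, T):
    """AIC, BIC and HQIC from a maximised log-likelihood.

    loglik: maximised log-likelihood ln L
    k:      number of estimated parameters (count them identically for every candidate)
    T:      number of observations used in the likelihood
    """
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(T),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(T)),
    }


if __name__ == "__main__":
    # Penalty charged per extra parameter at T = 1260 (five years of daily data).
    per_param = information_criteria(loglik=0.0, k=1, T=1260)
    for name, penalty in per_param.items():
        print(f"{name}: {penalty:.2f}")  # AIC: 2.00, BIC: 7.14, HQIC: 3.93
```

Because each criterion is -2 ln L plus a penalty, an extra parameter lowers AIC only if it raises ln L by more than 1, and lowers BIC only if it raises ln L by more than ln(T)/2, about 3.57 at T = 1260.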
Don't grid-search blindly
Searching over (p, q) ∈ {0,...,5}² gives 36 candidates. Selecting the minimum-AIC across them is a textbook recipe for over-fitting. Use parsimony: start with (1,0), (0,1), (1,1) and only expand if diagnostics demand. The 'best' model in-sample is rarely the best out-of-sample.
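One way to make the parsimony rule mechanical is to walk a fixed ladder of candidate orders from simple to complex and stop at the first model whose diagnostics pass. The sketch assumes a hypothetical callback `passes(p, q)` that fits ARMA(p, q) and returns True when its residual checks succeed; the ladder beyond the three seed models (growing p + q one step at a time) is our own choice, not a standard procedure.

```python
from typing import Callable, Iterator, Optional, Tuple

Order = Tuple[int, int]


def candidate_ladder(max_order: int = 3) -> Iterator[Order]:
    """Yield (p, q) from simplest to most complex, seed models first."""
    seen = set()
    for order in [(1, 0), (0, 1), (1, 1)]:  # the three starting models
        seen.add(order)
        yield order
    for total in range(2, 2 * max_order + 1):  # then grow p + q one step at a time
        for p in range(total, -1, -1):
            q = total - p
            if p <= max_order and q <= max_order and (p, q) not in seen:
                seen.add((p, q))
                yield (p, q)


def first_adequate(passes: Callable[[int, int], bool], max_order: int = 3) -> Optional[Order]:
    """Return the first order on the ladder whose diagnostics pass, or None."""
    for p, q in candidate_ladder(max_order):
        if passes(p, q):
            return (p, q)
    return None
```

A real `passes` would fit the model and require, for example, a Ljung-Box p-value above 0.05. Stopping at the first adequate model, rather than taking the minimum AIC over all 36 candidates, is what keeps the search parsimonious.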
Residual diagnostics
- Ljung-Box on residuals: a well-specified model leaves no significant autocorrelation (e.g. p-value > 0.05 at the chosen lag).
- Ljung-Box on squared residuals (the McLeod-Li test): a rejection signals remaining conditional heteroskedasticity and motivates a GARCH model.
- QQ-plot vs normal: check distributional fit; expect deviations in the tails for financial residuals.
- Residual time plot: look for trends, outliers, and structural breaks the model missed.
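The Ljung-Box statistic is Q = T(T + 2) · sum over k = 1..h of r_k² / (T - k), compared with a chi-square distribution; for residuals of a fitted ARMA(p, q) the degrees of freedom are commonly reduced to h - p - q. The stdlib-only sketch below computes Q and its p-value, taking the chi-square tail from the regularised incomplete gamma function (series for small arguments, continued fraction otherwise). Feeding it squared residuals gives the McLeod-Li check. Function names are illustrative; statsmodels' `acorr_ljungbox` is the usual library route.

```python
import math
import random


def _upper_reg_gamma(s, x):
    """Regularised upper incomplete gamma Q(s, x) for s > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + s * math.log(x) - math.lgamma(s)
    if x < s + 1.0:
        # Series for the lower function P(s, x); then Q = 1 - P.
        term = total = 1.0 / s
        n = 0
        while term > 1e-16 * total:
            n += 1
            term *= x / (s + n)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(s, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - s
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(q, df):
    """P(chi-square with df degrees of freedom > q)."""
    return _upper_reg_gamma(df / 2.0, q / 2.0)


def ljung_box(resid, h, fitted_params=0):
    """Ljung-Box Q at lag h and its p-value, using df = h - fitted_params."""
    df = h - fitted_params
    if df <= 0:
        raise ValueError("need h > p + q")
    T = len(resid)
    mean = sum(resid) / T
    d = [v - mean for v in resid]
    c0 = sum(v * v for v in d)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(d[t] * d[t - k] for t in range(k, T)) / c0
        q += r_k * r_k / (T - k)
    q *= T * (T + 2)
    return q, chi2_sf(q, df)


if __name__ == "__main__":
    rng = random.Random(1)
    e = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for ARMA(1,1) residuals
    print("Ljung-Box on e:   Q=%.2f p=%.3f" % ljung_box(e, h=10, fitted_params=2))
    print("McLeod-Li on e^2: Q=%.2f p=%.3f" % ljung_box([v * v for v in e], h=10))
```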
Cross-validation for time series
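Standard k-fold CV trains on folds that include observations later than the ones being predicted, so future information leaks into the fit; for time series this is generally invalid. Use rolling-origin (walk-forward) cross-validation instead: fit on [1, t], predict t+1, expand the window by one observation, and repeat. Aggregate the resulting out-of-sample forecast errors (e.g. as RMSE or MAE). This is the closest analogue of holding out a future test set.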
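The walk-forward loop is short enough to write out. In this sketch `forecast_one` is a hypothetical callback that refits the model on the history it receives and returns a one-step forecast; the mean and last-value forecasters are placeholders so the example runs without a fitting library.

```python
import math


def rolling_origin_errors(y, forecast_one, min_train):
    """One-step-ahead errors from an expanding-window (walk-forward) evaluation.

    forecast_one(history) must forecast the next value using only `history`.
    """
    errors = []
    for t in range(min_train, len(y)):
        history = y[:t]  # observations before t only; y[t] is never seen by the model
        errors.append(y[t] - forecast_one(history))
    return errors


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Two deliberately simple stand-in forecasters; an ARMA fit would be refit inside forecast_one.
def mean_forecast(history):
    return sum(history) / len(history)


def last_value_forecast(history):
    return history[-1]


if __name__ == "__main__":
    y = [0.5 * t + (-1) ** t for t in range(60)]  # toy trending series
    for name, f in [("mean", mean_forecast), ("last value", last_value_forecast)]:
        print(name, round(rmse(rolling_origin_errors(y, f, min_train=20)), 3))
```

Refitting at every origin is the most faithful version but also the most expensive; refitting every m steps while still forecasting one step ahead is a common compromise.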
Structural breaks
The Chow test (known break date) and the sup-Wald / sup-LM tests (unknown break date) detect parameter changes. Financial series are riddled with breaks: regime shifts in monetary policy, currency-board adoptions, the post-2008 zero-rate regime. Modelling these explicitly is critical; ignoring them yields parameter estimates that are weighted averages over incompatible regimes.
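A minimal sketch of both ideas for a regression with an intercept and one regressor, for example an AR(1) written as y_t on y_{t-1}. `chow_f` computes the Chow F statistic for a known split; `sup_f` scans the trimmed middle of the sample (15% trimming is an assumption) and keeps the largest F, in the spirit of the sup-Wald test. The Chow F is compared with F(k, T - 2k) critical values, while the sup statistic has a non-standard null distribution whose critical values come from Andrews (1993), not the F table; with a lagged dependent variable as regressor, even the Chow F reference distribution is only approximate. All names are illustrative.

```python
import random


def _ols_rss(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split, k=2):
    """Chow F for a break before index `split`; k = 2 coefficients (intercept, slope)."""
    rss_pooled = _ols_rss(x, y)
    rss_split = _ols_rss(x[:split], y[:split]) + _ols_rss(x[split:], y[split:])
    df_den = len(x) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / df_den)


def sup_f(x, y, trim=0.15):
    """Largest Chow F (and its split index) over candidate breaks in the trimmed sample."""
    n = len(x)
    lo, hi = int(trim * n), int((1 - trim) * n)
    return max((chow_f(x, y, s), s) for s in range(lo, hi + 1))


if __name__ == "__main__":
    rng = random.Random(2)
    # Toy AR(1) whose intercept shifts at t = 200 (a hypothetical regime change).
    series, prev = [], 0.0
    for t in range(400):
        c = 0.0 if t < 200 else 1.0
        prev = c + 0.5 * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    lagged, current = series[:-1], series[1:]
    print("Chow F at known break:", round(chow_f(lagged, current, 199), 2))
    f_max, where = sup_f(lagged, current)
    print("sup-F:", round(f_max, 2), "at index", where)
```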
Exercise
You fit ARMA(1,1), ARMA(2,1), and ARMA(1,2) to log returns of a 5-year daily series (1260 obs). The log-likelihoods are -2150, -2148.5, -2149. (1) Compute AIC and BIC for each. (2) Which model would AIC pick? Which would BIC pick? (3) Discuss the discrepancy.