Regression as a statistical estimator — Stats for Finance Module 9

Regression is the statistical workhorse: fitting linear relationships between a target and its predictors, then quantifying the uncertainty in the fit. We covered OLS as a projection in Linear Algebra Modules 5-6; here we treat it statistically, covering the sampling distribution of β̂, the t- and F-test machinery, and the caveats around R².

Setup


y = Xβ + u,    u ~ N(0, σ² I),    X is n × k (k columns, intercept included)
β̂ = (XᵀX)⁻¹ Xᵀy
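As a concrete check on the estimator formula, here is a minimal pure-Python sketch (standard library only, no NumPy assumed). It solves the normal equations (XᵀX)β̂ = Xᵀy by Gaussian elimination instead of forming the inverse. The helper names `gram`, `xt_y`, `solve` and `ols` are hypothetical, and the data are made up so that y = 1 + 2x holds exactly.

```python
def gram(X):
    """Return XᵀX for X stored as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]


def xt_y(X, y):
    """Return Xᵀy."""
    k = len(X[0])
    return [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """β̂ = (XᵀX)⁻¹Xᵀy, computed by solving the normal equations."""
    return solve(gram(X), xt_y(X, y))


# Illustrative data: intercept column plus one regressor, with y = 1 + 2x exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x] for x in xs]
y = [1.0 + 2.0 * x for x in xs]
beta_hat = ols(X, y)
print(beta_hat)
```

Solving the linear system is numerically preferable to inverting XᵀX. Production libraries generally go further and work with a QR or SVD factorisation of X.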

Sampling distribution of β̂

Conditional on X, β̂ is a linear function of y. With Gaussian errors it is therefore exactly normally distributed:


β̂ ~ N(β, σ² (XᵀX)⁻¹)
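The sampling distribution can be checked by simulation. The sketch below holds a simple-regression design fixed, redraws u ~ N(0, σ²) many times and re-estimates the slope each time. The variance of those estimates should match the slope entry of σ²(XᵀX)⁻¹. The sample size, σ, true coefficients and seed are arbitrary illustrative choices, and `ols_simple` is a hypothetical helper.

```python
import random
import statistics


def ols_simple(xs, ys):
    """Return (intercept, slope) for a regression of ys on [1, xs]."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


rng = random.Random(42)            # fixed seed for reproducibility
n, sigma = 20, 0.5                 # illustrative sample size and error SD
beta0, beta1 = 1.0, 2.0            # illustrative true coefficients
xs = [i / 2 for i in range(n)]     # design held fixed across replications

# Theory: Var(β̂) = σ²(XᵀX)⁻¹. For X = [1, x], XᵀX = [[n, Σx], [Σx, Σx²]],
# so the slope entry of (XᵀX)⁻¹ is n / (n·Σx² - (Σx)²).
sum_x = sum(xs)
sum_x2 = sum(x * x for x in xs)
var_slope_theory = sigma ** 2 * n / (n * sum_x2 - sum_x ** 2)

# Monte Carlo: redraw the errors many times and re-estimate the slope.
slopes = []
for _ in range(20_000):
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    slopes.append(ols_simple(xs, ys)[1])

print("mean of slope estimates:", statistics.mean(slopes))
print("simulated variance:     ", statistics.variance(slopes))
print("theoretical variance:   ", var_slope_theory)
```

The mean of the estimates should sit close to the true slope, which illustrates unbiasedness. The simulated variance should agree with the formula up to Monte Carlo noise.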

The standard error of an individual coefficient is the square root of the corresponding diagonal element of σ²(XᵀX)⁻¹. Because σ² is unknown, we replace it with the unbiased estimator s² = ûᵀû / (n - k), so SE(β̂_j) = s · √[(XᵀX)⁻¹]_jj.
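A minimal pure-Python sketch of the classical standard errors follows. It estimates s² = ûᵀû / (n - k), inverts XᵀX by Gauss-Jordan elimination and takes s times the square root of each diagonal entry. The helpers `gram`, `xt_y`, `inverse` and `ols_with_se` are hypothetical names, and the data are invented for illustration.

```python
import math


def gram(X):
    """XᵀX for X stored as a list of rows."""
    k = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]


def xt_y(X, y):
    """Xᵀy."""
    return [sum(r[i] * v for r, v in zip(X, y)) for i in range(len(X[0]))]


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def ols_with_se(X, y):
    """Return β̂, classical SE(β̂) and s² = ûᵀû / (n - k)."""
    n, k = len(X), len(X[0])
    A_inv = inverse(gram(X))
    b = xt_y(X, y)
    beta = [sum(A_inv[i][j] * b[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(c * x for c, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * A_inv[j][j]) for j in range(k)]
    return beta, se, s2


# Illustrative data: intercept plus one regressor.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
X = [[1.0, x] for x in xs]
beta, se, s2 = ols_with_se(X, ys)
print("beta:", beta, "SE:", se, "s2:", s2)
```

For the slope in a simple regression this reduces to the familiar s / √Σ(xᵢ - x̄)².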

t-statistic and F-statistic


t_j = β̂_j / SE(β̂_j) ~ t_{n-k}    (under H₀: β_j = 0)
F = ((SSR_R - SSR_U) / q) / (SSR_U / (n - k)) ~ F_{q, n-k}    (joint H₀)

The t-statistic tests one coefficient at a time; the F-statistic tests q joint restrictions (e.g., 'all slopes equal zero'). In large samples t_{n-k} converges to the standard normal, and q·F converges to χ²_q (equivalently, F behaves like χ²_q / q).
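Here is a pure-Python sketch of both statistics for a regression with one slope. The joint null "all slopes equal zero" has an intercept-only restricted model, whose SSR is the total sum of squares around the mean. With a single slope q = 1, and the F-statistic equals the square of the slope's t-statistic. The helper names and data are hypothetical.

```python
import math


def gram(X):
    k = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]


def xt_y(X, y):
    return [sum(r[i] * v for r, v in zip(X, y)) for i in range(len(X[0]))]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit(X, y):
    """Return β̂, classical SE(β̂) and the residual sum of squares SSR."""
    n, k = len(X), len(X[0])
    A_inv = inverse(gram(X))
    b = xt_y(X, y)
    beta = [sum(A_inv[i][j] * b[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(c * x for c, x in zip(beta, r)) for r, v in zip(X, y)]
    ssr = sum(e * e for e in resid)
    se = [math.sqrt(ssr / (n - k) * A_inv[j][j]) for j in range(k)]
    return beta, se, ssr


xs = [0.5, 1.1, 1.9, 3.2, 4.0, 4.8, 6.1, 7.0]
ys = [1.2, 1.0, 2.3, 2.9, 2.6, 4.1, 4.0, 5.2]
X = [[1.0, x] for x in xs]
n, k = len(X), len(X[0])

beta, se, ssr_u = fit(X, ys)
t_slope = beta[1] / se[1]          # H0: β_1 = 0; compare with t_{n-k}

# Joint H0: all slopes are zero. The restricted model is intercept-only.
ybar = sum(ys) / n
ssr_r = sum((v - ybar) ** 2 for v in ys)
q = k - 1                          # number of restrictions
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))   # compare with F_{q, n-k}

print("t:", t_slope, "F:", F, "t squared:", t_slope ** 2)
```

If SciPy is available, two-sided p-values follow from `2 * scipy.stats.t.sf(abs(t_slope), n - k)` and `scipy.stats.f.sf(F, q, n - k)`.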

R² and adjusted R²


R² = 1 - SSR / SST
Adj R² = 1 - (1 - R²)(n - 1)/(n - k)    (k counts all columns of X, including the intercept)
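Here is a small sketch of both formulas. It assumes, as in s² = ûᵀû / (n - k), that k counts every column of X including the intercept, so the adjusted-R² penalty factor is (n - 1)/(n - k). The helpers `simple_fit` and `r_squared` and the data are hypothetical. The fit is the closed-form regression of y on [1, x].

```python
def simple_fit(xs, ys):
    """Fitted values from a regression of ys on [1, xs] (closed form)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * x for x in xs]


def r_squared(y, fitted, k):
    """R² = 1 - SSR/SST, and adjusted R² with k columns of X (intercept included)."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    ssr = sum((v - f) ** 2 for v, f in zip(y, fitted))
    r2 = 1.0 - ssr / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return r2, adj


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.5, 2.1, 2.0, 3.4, 3.1, 4.2, 4.0, 5.1]
r2, adj = r_squared(ys, simple_fit(xs, ys), k=2)
print("R2:", r2, "adjusted R2:", adj)
```

In a simple regression with an intercept, R² equals the squared sample correlation between x and y, which makes a convenient sanity check.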

R² inflates with predictors

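Adding any regressor, even pure noise, never decreases R². The larger model can always reproduce the smaller one by setting the new coefficient to zero, so the minimised SSR cannot rise. Adjusted R² penalises each extra regressor through the degrees-of-freedom factor, so it can fall when a useless variable is added. But neither is a measure of model quality. Regressing a variable on itself (or on an accounting identity) gives R² = 1 and explains nothing. A regression of daily returns on lagged returns typically gives R² ≈ 0.01, which is excellent by finance standards. Judge R² relative to the domain, not against absolute thresholds.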
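The "never decreases" claim can be seen directly. The sketch below fits y on [1, x], then adds a regressor that is pure noise, unrelated to y by construction, and compares R² and adjusted R² before and after. The seed, sample size and coefficients are arbitrary, and the helper names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r2_of_fit(X, y):
    """Fit OLS via the normal equations and return R² = 1 - SSR/SST."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    beta = solve(A, rhs)
    ybar = sum(y) / len(y)
    ssr = sum((v - sum(c * x for c, x in zip(beta, r))) ** 2 for r, v in zip(X, y))
    sst = sum((v - ybar) ** 2 for v in y)
    return 1.0 - ssr / sst


def adjusted(r2, n, k):
    """Adjusted R² with k columns of X (intercept included)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


rng = random.Random(7)
n = 60
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.3 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
noise = [rng.gauss(0, 1) for _ in range(n)]   # unrelated to y by construction

r2_small = r2_of_fit([[1.0, xi] for xi in x], y)
r2_big = r2_of_fit([[1.0, xi, zi] for xi, zi in zip(x, noise)], y)

print("R2:     ", r2_small, "->", r2_big)
print("adj R2: ", adjusted(r2_small, n, 2), "->", adjusted(r2_big, n, 3))
```

R² can only go up (or stay equal) when the noise column is added. Whether adjusted R² falls depends on the draw, which is why the test below checks only the first claim.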

Robust standard errors

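Under heteroskedasticity (Var(uᵢ | X) varies across observations), the classical SE formula is wrong. White's heteroskedasticity-consistent estimator (HC0) replaces σ²(XᵀX)⁻¹ with the sandwich (XᵀX)⁻¹ Xᵀ diag(û²) X (XᵀX)⁻¹. Its small-sample variants adjust the squared residuals. HC1 scales every û²ᵢ by n/(n - k). HC2 and HC3 divide each û²ᵢ by (1 - hᵢᵢ) and (1 - hᵢᵢ)² respectively, where hᵢᵢ is the leverage of observation i. Use HC1 by default; use HC3 if you have a small sample or high-leverage observations.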
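Here is a pure-Python sketch of the sandwich and its HC variants, using these conventions: HC1 scales by n/(n - k), while HC2 and HC3 divide by (1 - hᵢᵢ) and (1 - hᵢᵢ)². The leverage is hᵢᵢ = xᵢᵀ(XᵀX)⁻¹xᵢ. The function `robust_se`, the helpers and the simulated data, whose error SD grows with x, are hypothetical illustrations rather than any library's API.

```python
import math
import random


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def robust_se(X, y, kind="HC1"):
    """OLS coefficients and heteroskedasticity-consistent SEs (HC0 to HC3)."""
    n, k = len(X), len(X[0])
    A_inv = inverse([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    beta = [sum(A_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(c * x for c, x in zip(beta, r)) for r, v in zip(X, y)]
    # Leverage h_ii = x_iᵀ (XᵀX)⁻¹ x_i
    lev = [sum(r[i] * A_inv[i][j] * r[j] for i in range(k) for j in range(k)) for r in X]
    if kind in ("HC0", "HC1"):
        w = [e * e for e in resid]
    elif kind == "HC2":
        w = [e * e / (1.0 - h) for e, h in zip(resid, lev)]
    elif kind == "HC3":
        w = [e * e / (1.0 - h) ** 2 for e, h in zip(resid, lev)]
    else:
        raise ValueError(f"unknown kind: {kind}")
    if kind == "HC1":
        w = [wi * n / (n - k) for wi in w]
    # Meat of the sandwich: Xᵀ diag(w) X
    meat = [[sum(wi * r[i] * r[j] for wi, r in zip(w, X)) for j in range(k)] for i in range(k)]
    V = matmul(matmul(A_inv, meat), A_inv)
    return beta, [math.sqrt(V[j][j]) for j in range(k)]


rng = random.Random(1)
n = 50
x = [rng.uniform(0, 3) for _ in range(n)]
# Error SD grows with x, so Var(u_i | X) is not constant.
y = [1.0 + 0.8 * xi + rng.gauss(0, 0.2 + 0.5 * xi) for xi in x]
X = [[1.0, xi] for xi in x]
for kind in ("HC0", "HC1", "HC2", "HC3"):
    print(kind, robust_se(X, y, kind)[1])
```

HC3 can never be smaller than HC0 on the diagonal, because its weights 1/(1 - hᵢᵢ)² are at least 1.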

Clustered standard errors

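When observations are grouped (firm-year panels, household survey blocks), residuals within a cluster are correlated. Cluster-robust SEs (Liang-Zeger 1986, Cameron-Miller 2015) replace the middle term Xᵀ diag(û²) X of the sandwich with Σ_g X_gᵀ û_g û_gᵀ X_g, the sum over clusters g of within-cluster outer products. They are vital for panel data. When both the regressor and the error are correlated within clusters, failing to cluster understates the standard errors, often substantially (halving them is not unusual).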
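Here is a pure-Python sketch of cluster-robust SEs. Each cluster contributes a score s_g = X_gᵀû_g, and the meat is Σ_g s_g s_gᵀ. The optional factor G/(G - 1) · (n - 1)/(n - k) is a common small-sample default in statistical packages; treat it as an assumption and check what your own package applies. The function name `cluster_se` and the simulated firm panel, with firm-level shocks in both x and u, are hypothetical.

```python
import math
import random
from collections import defaultdict


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def cluster_se(X, y, groups, small_sample=True):
    """OLS coefficients and cluster-robust SEs; groups[i] labels observation i's cluster."""
    n, k = len(X), len(X[0])
    A_inv = inverse([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    beta = [sum(A_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(c * x for c, x in zip(beta, r)) for r, v in zip(X, y)]
    # Cluster score s_g = Σ_{i in g} x_i û_i
    scores = defaultdict(lambda: [0.0] * k)
    for r, e, g in zip(X, resid, groups):
        s = scores[g]
        for j in range(k):
            s[j] += r[j] * e
    # Meat: Σ_g s_g s_gᵀ
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(k)] for i in range(k)]
    V = matmul(matmul(A_inv, meat), A_inv)
    if small_sample:
        G = len(scores)
        c = G / (G - 1) * (n - 1) / (n - k)
        V = [[c * v for v in row] for row in V]
    return beta, [math.sqrt(V[j][j]) for j in range(k)]


rng = random.Random(3)
firms, years = 12, 8
X, y, groups = [], [], []
for f in range(firms):
    x_f = rng.gauss(0, 1)      # firm-level component of the regressor
    u_f = rng.gauss(0, 1)      # firm-level component of the error
    for _ in range(years):
        xi = x_f + 0.5 * rng.gauss(0, 1)
        X.append([1.0, xi])
        y.append(0.5 + 1.0 * xi + u_f + 0.5 * rng.gauss(0, 1))
        groups.append(f)

print("clustered by firm:   ", cluster_se(X, y, groups)[1])
print("singleton clusters:  ", cluster_se(X, y, list(range(len(y))), small_sample=False)[1])
```

When every observation is its own cluster and the small-sample factor is off, the estimator collapses to White's HC0. That is a useful correctness check.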

OLS as MLE

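Under the Gaussian-errors assumption, the OLS β̂ is also the maximum-likelihood estimator: for any fixed σ², maximising the Gaussian log-likelihood over β is equivalent to minimising the sum of squared residuals. Moreover, Var(β̂ | X) = σ²(XᵀX)⁻¹ equals the inverse Fisher information for β. So OLS attains the Cramér-Rao bound (it is the minimum-variance unbiased estimator) and is asymptotically efficient under correct specification.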
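The likelihood argument takes a few lines, conditional on X and with u ~ N(0, σ²I) as in the setup. It also shows that the MLE of σ² divides by n rather than n - k.

```latex
\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,(y - X\beta)^\top (y - X\beta)

% For any fixed \sigma^2, \beta enters only through the SSR term:
\arg\max_{\beta} \ell(\beta, \sigma^2) = \arg\min_{\beta} (y - X\beta)^\top (y - X\beta) = (X^\top X)^{-1} X^\top y = \hat\beta

% Setting \partial\ell / \partial\sigma^2 = -\frac{n}{2\sigma^2} + \frac{\hat u^\top \hat u}{2\sigma^4} = 0:
\hat\sigma^2_{\mathrm{MLE}} = \frac{\hat u^\top \hat u}{n} \qquad (\text{biased; } s^2 \text{ divides by } n - k)

% Fisher information for \beta is X^\top X / \sigma^2, so the Cramér-Rao bound is
\mathcal{I}(\beta)^{-1} = \sigma^2 (X^\top X)^{-1} = \operatorname{Var}(\hat\beta \mid X)
```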

Misspecification

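Omitted variable: if the omitted variable affects y and is correlated with the included regressors, β̂ for those regressors is biased and inconsistent.
Wrong functional form: fitting a linear model when E[y | X] is nonlinear leaves systematic patterns in the residuals.
Heteroskedasticity: classical SEs wrong (use robust SEs), β̂ still unbiased.
Autocorrelation: classical SEs wrong (Newey-West fix), β̂ still unbiased provided the regressors are strictly exogenous (this fails, e.g., when a lagged dependent variable is a regressor).
Endogeneity: β̂ biased and inconsistent. The big one, covered in Econometrics Module 6.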
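The omitted-variable case has an exact in-sample form. The slope from the short regression of y on [1, x1] equals the long-regression slope on x1 plus the long-regression slope on x2 times δ. Here δ is the slope from regressing x2 on [1, x1]. The bias vanishes only when x2 has no effect or is uncorrelated with x1. The sketch below verifies the identity on simulated data; the coefficients, the correlation of 0.6, the seed and the helper names are illustrative assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """OLS coefficients via the normal equations."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(A, rhs)


rng = random.Random(11)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]        # correlated with x1 by construction
y = [0.5 + 1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

long_fit = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)   # y on [1, x1, x2]
short_fit = ols([[1.0, a] for a in x1], y)                  # y on [1, x1]; x2 omitted
aux_fit = ols([[1.0, a] for a in x1], x2)                   # x2 on [1, x1]

# In-sample OVB identity: short slope = long slope on x1 + (long slope on x2) * delta
implied = long_fit[1] + long_fit[2] * aux_fit[1]
print("short slope:", short_fit[1], "long slope:", long_fit[1], "implied:", implied)
```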

Exercise

You regress monthly stock returns on the market return (24 months). β̂ = 1.3, SE(β̂) = 0.2. (1) Is β̂ statistically different from 1 (the market beta)? (2) Compute a 95% CI for β. (3) The R² is 0.45. Interpret.
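To check the arithmetic for parts (1) and (2), here is a short sketch. It assumes the regression includes an intercept, so n - k = 24 - 2 = 22 degrees of freedom. It takes the two-sided 5% critical value of t_22 as about 2.074, read from a t table. Part (3) is left as interpretation.

```python
n, k = 24, 2                 # 24 monthly observations; intercept + market return (assumed)
beta_hat, se = 1.3, 0.2
df = n - k                   # residual degrees of freedom

# (1) H0: β = 1, i.e. the stock moves one-for-one with the market.
t_stat = (beta_hat - 1.0) / se          # ≈ 1.5

# Two-sided 5% critical value of t_22, taken from a t table.
t_crit = 2.074
reject = abs(t_stat) > t_crit

# (2) 95% confidence interval: β̂ ± t_crit · SE(β̂)
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
print("t =", t_stat, "reject H0:", reject, "95% CI:", ci)
```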