Mathematical linear algebra and computational linear algebra are two different subjects. The matrix that is invertible on paper may be effectively singular in floating-point. The factorisation that is fastest on a small example may scale catastrophically. This module bridges the gap — the practical knowledge a quant developer accumulates over years of debugging covariance matrices at 4 a.m.
Condition number — the single most important diagnostic
For an invertible matrix A, the condition number is κ(A) = ‖A‖ · ‖A⁻¹‖, typically computed as σ_max / σ_min (ratio of largest to smallest singular value). It bounds the worst-case sensitivity of solutions x = A⁻¹b to perturbations in b: a relative error δb in b can produce a relative error up to κ(A) · δb in x.
- κ ≤ 100: well-conditioned, no issues
- κ in [100, 10⁴]: mild ill-conditioning; double precision still safe
- κ in [10⁴, 10¹²]: serious; results have lost half their digits
- κ > 10¹²: hopeless in double precision; solutions are pure noise
Sample covariances are routinely ill-conditioned
For 500 stocks and 252 days, the sample covariance has 248 zero eigenvalues (rank-deficient) and the non-zero ones span typically 6-8 orders of magnitude. Inverting it is a fantasy. Always shrink, factor-structure, or regularise before any operation that requires Σ⁻¹.
Choosing the right factorisation
- Cholesky (A = LLᵀ) — for symmetric positive-definite A. Fastest and most stable choice. Use for covariance simulation and for solving normal equations.
- LU with partial pivoting — for general square A. The default in numpy.linalg.solve. O(n³) work.
- QR (A = QR) — for least squares on overdetermined systems. Numerically stable; preferred over normal equations.
- SVD (A = UΣVᵀ) — for rank-deficient or near-singular A. The most expensive but most robust. Used by numpy.linalg.lstsq and pseudo-inverse.
Forming AᵀA or A⁻¹ unnecessarily
Two of the most common quant-developer mistakes:
- Forming AᵀA and then solving AᵀA x = Aᵀb. Squares the condition number. Use QR or SVD on A directly instead.
- Computing A⁻¹ then multiplying A⁻¹b. Slower, less stable, and accumulates more error than scipy.linalg.solve(A, b).
Nearest PSD projection
When a covariance comes out non-PSD (a handful of negative eigenvalues from numerical drift or from blending estimators), the standard fix is the Higham (1988) nearest-PSD projection: eigendecompose, clip negative eigenvalues to zero, reconstruct. A cleaner alternative for correlation matrices is the Higham (2002) alternating-projections algorithm.
import numpy as npdef nearest_psd(A, eps=0.0):A_sym = (A + A.T) / 2w, V = np.linalg.eigh(A_sym)w = np.maximum(w, eps)return V @ np.diag(w) @ V.T
Shrinkage
Ledoit-Wolf (2003) shrinkage is the standard production fix for ill-conditioned sample covariances. Form Σ_shrink = δ · Target + (1 - δ) · Σ̂, where Target is a low-parameter structured matrix (identity, constant-correlation, single-factor) and δ ∈ [0,1] is the shrinkage intensity, often chosen by an analytic formula minimising expected Frobenius distance to the true Σ.
Empirical default
If you have nothing better to do, set Target = (tr(Σ̂)/n)·I and δ ≈ 0.1-0.3. This single line fixes 80% of the production pathologies you'll see with raw sample covariance.
Sparsity and structure
For very large problems (10⁵+ assets, factor returns, country-pair matrices), exploit structure: banded matrices use band solvers, block-diagonal matrices solve in blocks, sparse matrices use specialised storage and iterative solvers (conjugate gradient, GMRES). Knowing when scipy.sparse is appropriate distinguishes a quant who scales from one who doesn't.
Exercise
A junior quant computes a 200-asset portfolio's marginal risk by forming Σ⁻¹ explicitly, multiplying through, and then computing risk attributions. The function takes 8 seconds. (1) What three numerical issues might be present? (2) Rewrite the computation properly. (3) How fast should it run on a modern laptop?