Skip to content
Module 12 of 1255 min readIntermediate

Numerical linear algebra in practice

Conditioning, stability, Cholesky for portfolio simulation, when to use NumPy vs LAPACK directly, the traps that wreck production code.

100%

Listen along

Read “Numerical linear algebra in practice” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

Mathematical linear algebra and computational linear algebra are two different subjects. The matrix that is invertible on paper may be effectively singular in floating-point. The factorisation that is fastest on a small example may scale catastrophically. This module bridges the gap — the practical knowledge a quant developer accumulates over years of debugging covariance matrices at 4 a.m.

Condition number — the single most important diagnostic

For an invertible matrix A, the condition number is κ(A) = ‖A‖ · ‖A⁻¹‖, typically computed as σ_max / σ_min (ratio of largest to smallest singular value). It bounds the worst-case sensitivity of solutions x = A⁻¹b to perturbations in b: a relative error δb in b can produce a relative error up to κ(A) · δb in x.

  • κ ≤ 100: well-conditioned, no issues
  • κ in [100, 10⁴]: mild ill-conditioning; double precision still safe
  • κ in [10⁴, 10¹²]: serious; results have lost half their digits
  • κ > 10¹²: hopeless in double precision; solutions are pure noise

Sample covariances are routinely ill-conditioned

For 500 stocks and 252 days, the sample covariance has 248 zero eigenvalues (rank-deficient) and the non-zero ones span typically 6-8 orders of magnitude. Inverting it is a fantasy. Always shrink, factor-structure, or regularise before any operation that requires Σ⁻¹.

Choosing the right factorisation

  • Cholesky (A = LLᵀ) — for symmetric positive-definite A. Fastest and most stable choice. Use for covariance simulation and for solving normal equations.
  • LU with partial pivoting — for general square A. The default in numpy.linalg.solve. O(n³) work.
  • QR (A = QR) — for least squares on overdetermined systems. Numerically stable; preferred over normal equations.
  • SVD (A = UΣVᵀ) — for rank-deficient or near-singular A. The most expensive but most robust. Used by numpy.linalg.lstsq and pseudo-inverse.

Forming AᵀA or A⁻¹ unnecessarily

Two of the most common quant-developer mistakes:

  1. Forming AᵀA and then solving AᵀA x = Aᵀb. Squares the condition number. Use QR or SVD on A directly instead.
  2. Computing A⁻¹ then multiplying A⁻¹b. Slower, less stable, and accumulates more error than scipy.linalg.solve(A, b).

Nearest PSD projection

When a covariance comes out non-PSD (a handful of negative eigenvalues from numerical drift or from blending estimators), the standard fix is the Higham (1988) nearest-PSD projection: eigendecompose, clip negative eigenvalues to zero, reconstruct. A cleaner alternative for correlation matrices is the Higham (2002) alternating-projections algorithm.

python
import numpy as np
def nearest_psd(A, eps=0.0):
A_sym = (A + A.T) / 2
w, V = np.linalg.eigh(A_sym)
w = np.maximum(w, eps)
return V @ np.diag(w) @ V.T

Shrinkage

Ledoit-Wolf (2003) shrinkage is the standard production fix for ill-conditioned sample covariances. Form Σ_shrink = δ · Target + (1 - δ) · Σ̂, where Target is a low-parameter structured matrix (identity, constant-correlation, single-factor) and δ ∈ [0,1] is the shrinkage intensity, often chosen by an analytic formula minimising expected Frobenius distance to the true Σ.

Empirical default

If you have nothing better to do, set Target = (tr(Σ̂)/n)·I and δ ≈ 0.1-0.3. This single line fixes 80% of the production pathologies you'll see with raw sample covariance.

Sparsity and structure

For very large problems (10⁵+ assets, factor returns, country-pair matrices), exploit structure: banded matrices use band solvers, block-diagonal matrices solve in blocks, sparse matrices use specialised storage and iterative solvers (conjugate gradient, GMRES). Knowing when scipy.sparse is appropriate distinguishes a quant who scales from one who doesn't.

Exercise

A junior quant computes a 200-asset portfolio's marginal risk by forming Σ⁻¹ explicitly, multiplying through, and then computing risk attributions. The function takes 8 seconds. (1) What three numerical issues might be present? (2) Rewrite the computation properly. (3) How fast should it run on a modern laptop?

Loading progress…
LeadAfrikPublic Economics Hub