Numerical linear algebra in practice — Linear Algebra Module 12

Mathematical linear algebra and computational linear algebra are two different subjects. The matrix that is invertible on paper may be effectively singular in floating-point. The factorisation that is fastest on a small example may scale catastrophically. This module bridges the gap — the practical knowledge a quant developer accumulates over years of debugging covariance matrices at 4 a.m.

Condition number — the single most important diagnostic

For an invertible matrix A, the condition number is κ(A) = ‖A‖ · ‖A⁻¹‖, typically computed as σ_max / σ_min (ratio of largest to smallest singular value). It bounds the worst-case sensitivity of solutions x = A⁻¹b to perturbations in b: a relative error δb in b can produce a relative error up to κ(A) · δb in x.

κ ≤ 100: well-conditioned, no issues
κ in [100, 10⁴]: mild ill-conditioning; double precision still safe
κ in [10⁴, 10¹²]: serious; results have lost half their digits
κ > 10¹²: hopeless in double precision; solutions are pure noise

Sample covariances are routinely ill-conditioned

For 500 stocks and 252 days, the sample covariance has 248 zero eigenvalues (rank-deficient) and the non-zero ones span typically 6-8 orders of magnitude. Inverting it is a fantasy. Always shrink, factor-structure, or regularise before any operation that requires Σ⁻¹.

Choosing the right factorisation

Cholesky (A = LLᵀ) — for symmetric positive-definite A. Fastest and most stable choice. Use for covariance simulation and for solving normal equations.
LU with partial pivoting — for general square A. The default in numpy.linalg.solve. O(n³) work.
QR (A = QR) — for least squares on overdetermined systems. Numerically stable; preferred over normal equations.
SVD (A = UΣVᵀ) — for rank-deficient or near-singular A. The most expensive but most robust. Used by numpy.linalg.lstsq and pseudo-inverse.

Forming AᵀA or A⁻¹ unnecessarily

Two of the most common quant-developer mistakes:

Forming AᵀA and then solving AᵀA x = Aᵀb. Squares the condition number. Use QR or SVD on A directly instead.
Computing A⁻¹ then multiplying A⁻¹b. Slower, less stable, and accumulates more error than scipy.linalg.solve(A, b).

Nearest PSD projection

When a covariance comes out non-PSD (a handful of negative eigenvalues from numerical drift or from blending estimators), the standard fix is the Higham (1988) nearest-PSD projection: eigendecompose, clip negative eigenvalues to zero, reconstruct. A cleaner alternative for correlation matrices is the Higham (2002) alternating-projections algorithm.

python

import numpy as np

def nearest_psd(A, eps=0.0):
    A_sym = (A + A.T) / 2
    w, V = np.linalg.eigh(A_sym)
    w = np.maximum(w, eps)
    return V @ np.diag(w) @ V.T

Shrinkage

Ledoit-Wolf (2003) shrinkage is the standard production fix for ill-conditioned sample covariances. Form Σ_shrink = δ · Target + (1 - δ) · Σ̂, where Target is a low-parameter structured matrix (identity, constant-correlation, single-factor) and δ ∈ [0,1] is the shrinkage intensity, often chosen by an analytic formula minimising expected Frobenius distance to the true Σ.

Empirical default

If you have nothing better to do, set Target = (tr(Σ̂)/n)·I and δ ≈ 0.1-0.3. This single line fixes 80% of the production pathologies you'll see with raw sample covariance.

Sparsity and structure

For very large problems (10⁵+ assets, factor returns, country-pair matrices), exploit structure: banded matrices use band solvers, block-diagonal matrices solve in blocks, sparse matrices use specialised storage and iterative solvers (conjugate gradient, GMRES). Knowing when scipy.sparse is appropriate distinguishes a quant who scales from one who doesn't.

Exercise

A junior quant computes a 200-asset portfolio's marginal risk by forming Σ⁻¹ explicitly, multiplying through, and then computing risk attributions. The function takes 8 seconds. (1) What three numerical issues might be present? (2) Rewrite the computation properly. (3) How fast should it run on a modern laptop?