SVD — the master decomposition — Linear Algebra Module 9

The singular value decomposition is the master factorisation of linear algebra. Every matrix — square or rectangular, real or complex, full-rank or rank-deficient — has an SVD. It generalises eigendecomposition, it underlies PCA, it gives the cleanest characterisation of rank, and it produces the optimal low-rank approximation of any matrix.

Statement

math

A = U Σ Vᵀ
A: m×n
U: m×m orthogonal (left singular vectors)
Σ: m×n diagonal with non-negative entries σ₁ ≥ σ₂ ≥ ... ≥ 0 (singular values)
V: n×n orthogonal (right singular vectors)

The σᵢ are non-negative; the number of non-zero σᵢ equals the rank of A. The singular vectors are orthonormal. SVD always exists, for any matrix.

Connection to eigendecomposition

The singular values of A are the square roots of the non-zero eigenvalues of AᵀA (equivalently AAᵀ). The right singular vectors are eigenvectors of AᵀA; the left singular vectors are eigenvectors of AAᵀ. SVD is therefore the symmetrised, generalised eigendecomposition.

Geometric picture

Any linear map decomposes into three stages: rotate (Vᵀ), then stretch (Σ), then rotate (U). The unit ball in Rⁿ maps to an ellipsoid in Rᵐ whose semi-axes are along the columns of U and have lengths σᵢ. This is the cleanest geometric view of a linear map there is.

Low-rank approximation: the Eckart-Young theorem

The best rank-k approximation of A in either the Frobenius or operator (spectral) norm is obtained by zeroing out the smallest singular values:

math

A ≈ A_k = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ

Eckart-Young — why SVD rules data analysis

No other rank-k matrix gets closer to A. Period. This is the theorem behind PCA, image compression, latent semantic indexing, recommender-system matrix factorisation, and yield-curve dimensionality reduction. The k largest σᵢ are exactly the right things to keep.

Pseudo-inverse

For A = UΣVᵀ, the Moore-Penrose pseudo-inverse is A⁺ = VΣ⁺Uᵀ, where Σ⁺ inverts the non-zero singular values (1/σᵢ) and leaves zeros as zeros. A⁺b gives the minimum-norm least-squares solution to Ax = b, even when A is rank-deficient. This is how numerical libraries (numpy.linalg.lstsq) silently produce a sensible answer when XᵀX is singular.

SVD in production

Computing rank robustly: count singular values above a tolerance (e.g., σ_max · max(m,n) · machine_eps).
PCA: SVD of the mean-centred data matrix is more numerically stable than eigendecomposition of the covariance.
Solving ill-conditioned least squares: drop the smallest singular values before inverting — this is truncated SVD regularisation.
Image and data compression: keep only the top-k singular values for a k:n compression ratio.

SVD is O(min(m,n)·max(m,n)²)

Full SVD is expensive. For very large matrices (millions of rows), use randomised SVD (Halko-Martinsson-Tropp 2011) or partial SVD that only computes the top-k singular triplets. Most production data-science stacks default to randomised SVD when k ≪ min(m, n).

Exercise

You have a 252×500 matrix of daily returns R for a 500-stock universe. (1) What does the SVD R = UΣVᵀ give you in plain terms? (2) How does it relate to PCA on returns? (3) What is the maximum possible rank?