The singular value decomposition is the master factorisation of linear algebra. Every matrix — square or rectangular, real or complex, full-rank or rank-deficient — has an SVD. It generalises eigendecomposition, it underlies PCA, it gives the cleanest characterisation of rank, and it produces the optimal low-rank approximation of any matrix.
Statement
A = U Σ VᵀA: m×nU: m×m orthogonal (left singular vectors)Σ: m×n diagonal with non-negative entries σ₁ ≥ σ₂ ≥ ... ≥ 0 (singular values)V: n×n orthogonal (right singular vectors)
The σᵢ are non-negative; the number of non-zero σᵢ equals the rank of A. The singular vectors are orthonormal. SVD always exists, for any matrix.
Connection to eigendecomposition
The singular values of A are the square roots of the non-zero eigenvalues of AᵀA (equivalently AAᵀ). The right singular vectors are eigenvectors of AᵀA; the left singular vectors are eigenvectors of AAᵀ. SVD is therefore the symmetrised, generalised eigendecomposition.
Geometric picture
Any linear map decomposes into three stages: rotate (Vᵀ), then stretch (Σ), then rotate (U). The unit ball in Rⁿ maps to an ellipsoid in Rᵐ whose semi-axes are along the columns of U and have lengths σᵢ. This is the cleanest geometric view of a linear map there is.
Low-rank approximation: the Eckart-Young theorem
The best rank-k approximation of A in either the Frobenius or operator (spectral) norm is obtained by zeroing out the smallest singular values:
A ≈ A_k = Σᵢ₌₁ᵏ σᵢ uᵢ vᵢᵀ
Eckart-Young — why SVD rules data analysis
No other rank-k matrix gets closer to A. Period. This is the theorem behind PCA, image compression, latent semantic indexing, recommender-system matrix factorisation, and yield-curve dimensionality reduction. The k largest σᵢ are exactly the right things to keep.
Pseudo-inverse
For A = UΣVᵀ, the Moore-Penrose pseudo-inverse is A⁺ = VΣ⁺Uᵀ, where Σ⁺ inverts the non-zero singular values (1/σᵢ) and leaves zeros as zeros. A⁺b gives the minimum-norm least-squares solution to Ax = b, even when A is rank-deficient. This is how numerical libraries (numpy.linalg.lstsq) silently produce a sensible answer when XᵀX is singular.
SVD in production
- Computing rank robustly: count singular values above a tolerance (e.g., σ_max · max(m,n) · machine_eps).
- PCA: SVD of the mean-centred data matrix is more numerically stable than eigendecomposition of the covariance.
- Solving ill-conditioned least squares: drop the smallest singular values before inverting — this is truncated SVD regularisation.
- Image and data compression: keep only the top-k singular values for a k:n compression ratio.
SVD is O(min(m,n)·max(m,n)²)
Full SVD is expensive. For very large matrices (millions of rows), use randomised SVD (Halko-Martinsson-Tropp 2011) or partial SVD that only computes the top-k singular triplets. Most production data-science stacks default to randomised SVD when k ≪ min(m, n).
Exercise
You have a 252×500 matrix of daily returns R for a 500-stock universe. (1) What does the SVD R = UΣVᵀ give you in plain terms? (2) How does it relate to PCA on returns? (3) What is the maximum possible rank?