Every matrix A defines four fundamental subspaces. Understanding them is the single most useful piece of linear-algebraic geometry for an applied analyst — it tells you when systems are solvable, what regression really computes, and why some covariance matrices simply cannot be inverted.
The four fundamental subspaces
For an m×n matrix A:
- Column space C(A) ⊆ Rᵐ: span of the columns. The set of all b such that Ax = b has a solution.
- Null space N(A) ⊆ Rⁿ: vectors x such that Ax = 0. The 'redundancy' in A.
- Row space C(Aᵀ) ⊆ Rⁿ: span of the rows. Orthogonal complement of N(A) inside Rⁿ.
- Left null space N(Aᵀ) ⊆ Rᵐ: vectors y such that Aᵀy = 0. Orthogonal complement of C(A) inside Rᵐ.
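One common way to obtain orthonormal bases for all four subspaces at once is the singular value decomposition. The sketch below assumes NumPy; the helper name `four_subspaces`, the tolerance `tol`, and the example matrix are illustrative choices, not part of the text above.

```python
import numpy as np

def four_subspaces(A, tol=1e-10):
    """Orthonormal bases (as columns) for C(A), N(A), C(A^T), N(A^T) via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    r = int(np.sum(s > tol))       # numerical rank: singular values above tol
    col_space = U[:, :r]           # first r left singular vectors span C(A)
    left_null = U[:, r:]           # remaining m - r span N(A^T)
    row_space = Vt[:r, :].T        # first r right singular vectors span C(A^T)
    null_space = Vt[r:, :].T       # remaining n - r span N(A)
    return col_space, null_space, row_space, left_null

# Hypothetical 4x3 example: the third column is the sum of the first two.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

col_space, null_space, row_space, left_null = four_subspaces(A)

print("dim C(A), N(A), C(A^T), N(A^T):",
      col_space.shape[1], null_space.shape[1],
      row_space.shape[1], left_null.shape[1])
print("A @ N(A) is zero:      ", np.allclose(A @ null_space, 0.0))
print("row space _|_ N(A):    ", np.allclose(row_space.T @ null_space, 0.0))
print("col space _|_ N(A^T):  ", np.allclose(col_space.T @ left_null, 0.0))
```

The absolute tolerance is adequate here because the entries of A are of order one; for badly scaled matrices a tolerance relative to the largest singular value would be the safer choice.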
Rank and the rank-nullity theorem
The rank of A is the dimension of its column space, which equals the dimension of its row space. The rank-nullity theorem says:
rank(A) + dim N(A) = n (n = number of columns)
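A quick numerical check of the theorem, as a sketch assuming NumPy. The matrix is a hypothetical 5×8 example built as a product of Gaussian factors so that its rank is 3 (with probability one); the nullity is counted separately as the number of numerically zero eigenvalues of AᵀA, using an illustrative relative tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 8, 3

# Product of an m x r and an r x n Gaussian matrix: rank r almost surely.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

rank_cols = np.linalg.matrix_rank(A)      # dimension of the column space
rank_rows = np.linalg.matrix_rank(A.T)    # dimension of the row space

# Ax = 0 exactly when (A^T A) x = 0, so the nullity of A is the number of
# zero eigenvalues of the n x n matrix A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)
nullity = int(np.sum(eigvals < 1e-8 * eigvals.max()))

print("rank:", rank_cols, " row rank:", rank_rows, " nullity:", nullity, " n:", n)
```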
Why this matters for covariance matrices
If you estimate a sample covariance Σ̂ = (1/T) RᶜᵀRᶜ for n assets from T < n days, where Rᶜ is the T×n matrix of demeaned returns, then rank(Σ̂) = rank(Rᶜ) ≤ T - 1 < n (demeaning removes one further dimension). The matrix is singular: there exist non-zero portfolios w with wᵀΣ̂w = 0. These are apparent free lunches that are pure sampling artefacts. Shrinkage, factor covariance models and regularisation can all be read as responses to this rank-deficiency.
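The effect is easy to reproduce on simulated data. The sketch below assumes NumPy and uses hypothetical i.i.d. Gaussian returns with a true covariance of 1e-4 times the identity, so every unit-norm portfolio has a true variance of 1e-4. The weights are normalised to unit length rather than to sum to one, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 20, 50                                # fewer days than assets
true_cov = 1e-4 * np.eye(n)                  # assumed population covariance

R = 0.01 * rng.standard_normal((T, n))       # simulated daily returns
Rc = R - R.mean(axis=0)                      # demean each asset's return series
Sigma_hat = Rc.T @ Rc / T

# rank(Sigma_hat) equals rank(Rc); computing it on Rc is better conditioned.
rank = np.linalg.matrix_rank(Rc)

# The eigenvector of the smallest eigenvalue lies in the null space of Sigma_hat.
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)  # eigenvalues in ascending order
w = eigvecs[:, 0]

in_sample_var = w @ Sigma_hat @ w            # numerically zero
true_var = w @ true_cov @ w                  # the risk the portfolio really carries

print("rank:", rank, "of", n)
print("in-sample variance:", in_sample_var)
print("true variance:     ", true_var)
```

The portfolio w looks riskless in sample only because the data contain no information about the directions outside the row space of Rᶜ.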
Orthogonal projections
Given a subspace V ⊆ Rᵐ, the orthogonal projection P_V is the linear map that sends every vector y to its closest point in V. Algebraically:
P_V = A (AᵀA)⁻¹ Aᵀ when V = C(A) and columns of A are linearly independent
Properties of any orthogonal projection P: (a) P² = P (projecting twice does nothing more than projecting once); (b) P = Pᵀ (symmetric); (c) eigenvalues are 0 or 1. The complement projection is I - P.
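These properties can be checked numerically. The following sketch assumes NumPy and a hypothetical random 6×3 matrix A, whose columns are linearly independent with probability one. It forms P with a linear solve rather than an explicit inverse, which is the usual numerical practice.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))              # columns span a 3-dim subspace of R^6

# P = A (A^T A)^{-1} A^T, computed by solving (A^T A) Z = A^T for Z.
P = A @ np.linalg.solve(A.T @ A, A.T)
M = np.eye(6) - P                            # complement projection I - P

idempotent = np.allclose(P @ P, P)           # (a) P^2 = P
symmetric = np.allclose(P, P.T)              # (b) P = P^T
eigvals = np.linalg.eigvalsh(P)              # (c) ascending: three 0s, three 1s
complementary = np.allclose(P @ M, 0.0)      # P and I - P annihilate each other

print(idempotent, symmetric, complementary)
print(np.round(eigvals, 10))
print("trace(P) =", np.trace(P))             # equals the dimension of V
```

Because the eigenvalues are only 0 or 1, the trace of P counts the ones and so equals the dimension of the subspace.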
The projection theorem
For any vector y ∈ Rᵐ and any subspace V ⊆ Rᵐ, there is a unique decomposition y = P_V(y) + (I - P_V)(y) where the first piece lies in V and the second piece is orthogonal to V. The first piece minimises the Euclidean distance ‖y - v‖ over all v ∈ V.
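A small numerical illustration of the decomposition, assuming NumPy. Here V is the column space of a hypothetical random 5×2 matrix, the projection is computed from an orthonormal basis obtained by reduced QR, and the minimising property is checked against 1000 randomly drawn competitors in V. This is a spot check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))              # columns span V, a plane in R^5
y = rng.standard_normal(5)

Q, _ = np.linalg.qr(A)                       # orthonormal basis for V (reduced QR)
y_V = Q @ (Q.T @ y)                          # P_V y, the piece inside V
y_perp = y - y_V                             # (I - P_V) y, the piece orthogonal to V

orthogonal = np.allclose(A.T @ y_perp, 0.0)  # y_perp is orthogonal to every column of A
pythagoras = np.isclose(y @ y, y_V @ y_V + y_perp @ y_perp)

# Random competitors v = A c in V: none should be closer to y than y_V.
C = rng.standard_normal((2, 1000))
dists = np.linalg.norm(y[:, None] - A @ C, axis=0)
closest = bool(np.all(dists >= np.linalg.norm(y_perp) - 1e-12))

print(orthogonal, pythagoras, closest)
```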
OLS is a projection
In the regression y = Xβ + u with X of full column rank, the OLS fitted values ŷ = X(XᵀX)⁻¹Xᵀy = Py are the orthogonal projection of y onto the column space of X. The residuals û = y - ŷ = (I - P)y are orthogonal to every regressor: Xᵀû = 0. The Frisch-Waugh-Lovell theorem, the geometry of partialling-out regressions, and the algebra of the influence function all follow from this one observation.
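The sketch below checks these claims on simulated data, assuming NumPy. The design matrix, coefficients and noise level are hypothetical. The last few lines illustrate the Frisch-Waugh-Lovell result: the coefficient on the last regressor can be recovered by first projecting the other regressors out of both y and that regressor.

```python
import numpy as np

rng = np.random.default_rng(7)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])       # assumed coefficients for the simulation
y = X @ beta_true + 0.1 * rng.standard_normal(T)

# OLS via least squares, and the same fit via the projection ("hat") matrix.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
P = X @ np.linalg.solve(X.T @ X, X.T)
resid = y - y_hat

fitted_is_projection = np.allclose(P @ y, y_hat)
resid_orthogonal = np.allclose(X.T @ resid, 0.0, atol=1e-8)

# Frisch-Waugh-Lovell: partial the first two regressors out of y and of x2,
# then run a one-variable regression on what is left.
X1, x2 = X[:, :2], X[:, 2]
M1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
beta_fwl = (M1 @ x2) @ (M1 @ y) / ((M1 @ x2) @ (M1 @ x2))

print(fitted_is_projection, resid_orthogonal)
print("full regression:", beta_hat[2], " partialled out:", beta_fwl)
```

Forming the T×T matrices P and M1 explicitly is fine at this size but wasteful for large T; in practice one would work with the least-squares solution or a QR factorisation instead.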
Multicollinearity in vector language
When regressors are highly correlated, the columns of X are nearly linearly dependent. XᵀX becomes nearly singular (its smallest eigenvalue is near zero) and (XᵀX)⁻¹ has huge entries, so small data perturbations produce large coefficient changes. The condition number κ(XᵀX), the ratio of the largest to the smallest eigenvalue of XᵀX, is the diagnostic; values above 1000 are red flags. Since κ(XᵀX) = κ(X)², this corresponds to κ(X) above roughly 30.
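To make the diagnostic concrete, the sketch below (assuming NumPy, with hypothetical simulated regressors) builds two nearly identical columns, computes the condition numbers, and refits after a small perturbation of y. The individual coefficients on the two collinear regressors are poorly determined, while their sum, which the data do pin down, barely moves.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
x1 = rng.standard_normal(T)
x2 = x1 + 1e-3 * rng.standard_normal(T)      # almost a copy of x1
X = np.column_stack([np.ones(T), x1, x2])

kappa_X = np.linalg.cond(X)                  # 2-norm condition number of X
kappa_XtX = np.linalg.cond(X.T @ X)          # equals kappa_X squared

y = 1.0 + x1 + x2 + 0.1 * rng.standard_normal(T)
y_perturbed = y + 0.01 * rng.standard_normal(T)   # small change in the data

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_p, *_ = np.linalg.lstsq(X, y_perturbed, rcond=None)

shift_individual = np.abs(beta_p[1:] - beta[1:])  # change in each slope
shift_sum = abs((beta_p[1] + beta_p[2]) - (beta[1] + beta[2]))

print("kappa(X):", kappa_X, " kappa(X^T X):", kappa_XtX)
print("shift in individual slopes:", shift_individual)
print("shift in their sum:        ", shift_sum)
```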
Exercise
Suppose X is a T×k matrix of regressors. (1) Under what algebraic condition is the OLS coefficient β̂ = (XᵀX)⁻¹Xᵀy well defined and unique? (2) If T < k, can β̂ exist? (3) If two columns of X are exactly proportional, what happens?