Rank, the four fundamental subspaces, and projections — Linear Algebra Module 5

Every matrix A defines four fundamental subspaces. Understanding them is the single most useful piece of linear-algebraic geometry for an applied analyst — it tells you when systems are solvable, what regression really computes, and why some covariance matrices simply cannot be inverted.

The four fundamental subspaces

For an m×n matrix A:

Column space C(A) ⊆ Rᵐ: span of the columns. The set of all b such that Ax = b has a solution.
Null space N(A) ⊆ Rⁿ: vectors x such that Ax = 0. The 'redundancy' in A.
Row space C(Aᵀ) ⊆ Rⁿ: span of the rows. Orthogonal complement of N(A) inside Rⁿ.
Left null space N(Aᵀ) ⊆ Rᵐ: vectors y such that Aᵀy = 0. Orthogonal complement of C(A) inside Rᵐ.
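
One way to make the four definitions concrete is to read orthonormal bases for all of them off a singular value decomposition. The sketch below is illustrative only: the helper name four_subspaces, the tolerance, and the 3×4 example matrix are assumptions, not part of the module.

```python
import numpy as np

def four_subspaces(A, tol=1e-10):
    """Return orthonormal bases (as columns) for C(A), N(A), C(A^T), N(A^T)."""
    U, s, Vt = np.linalg.svd(A)      # full SVD: U is m x m, Vt is n x n
    r = int(np.sum(s > tol))         # numerical rank = number of non-zero singular values
    col_space = U[:, :r]             # first r left singular vectors span C(A)
    left_null = U[:, r:]             # remaining m - r span N(A^T)
    row_space = Vt[:r, :].T          # first r right singular vectors span C(A^T)
    null_space = Vt[r:, :].T         # remaining n - r span N(A)
    return col_space, null_space, row_space, left_null

# Illustrative 3 x 4 matrix: row 3 = row 1 + row 2, so the rank is 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])

C, N, R, LN = four_subspaces(A)
print(C.shape[1], N.shape[1], R.shape[1], LN.shape[1])   # 2 2 2 1

print(np.allclose(A @ N, 0))       # null-space vectors are sent to zero
print(np.allclose(R.T @ N, 0))     # row space is orthogonal to the null space
print(np.allclose(C.T @ LN, 0))    # column space is orthogonal to the left null space
```

The SVD is used here because it is numerically stable and delivers all four bases at once; Gaussian elimination gives the same subspaces but with bases that are not orthonormal.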

Rank and the rank-nullity theorem

The rank of A is the dimension of its column space, which equals the dimension of its row space. The rank-nullity theorem says:

rank(A) + dim N(A) = n      (n = number of columns)
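
A small numerical check of the identity, assuming NumPy and an illustrative 3×4 matrix whose second row is twice the first. The rank and the nullity are computed by two different routes so that the check is not circular.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],    # twice row 1, so it adds nothing to the rank
              [1.0, 0.0, 1.0, 0.0]])
m, n = A.shape

# Rank from the singular values of A.
rank = np.linalg.matrix_rank(A)

# Nullity counted separately: zero eigenvalues of the n x n matrix A^T A.
# The 1e-8 cutoff is an assumed tolerance suited to this small example.
eigvals = np.linalg.eigvalsh(A.T @ A)
nullity = int(np.sum(eigvals < 1e-8))

print(rank, nullity, n)   # 2 2 4
```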

Why this matters for covariance matrices

If you estimate a sample covariance Σ̂ = (1/T) RᶜᵀRᶜ for n assets from T < n days, where Rᶜ is the T×n matrix of centred returns, then rank(Σ̂) = rank(Rᶜ) ≤ T < n; in fact rank(Σ̂) ≤ T − 1, because the rows of a demeaned Rᶜ sum to zero. The matrix is singular: there exist non-zero portfolios w with wᵀΣ̂w = 0, apparent free lunches that are pure sampling artefacts. Shrinkage, factor covariance models, and regularisation are all, in large part, responses to this rank deficiency.
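
The rank deficiency is easy to reproduce on simulated data. In the sketch below the sizes T = 20 and n = 50, the seed, and the Gaussian returns are all assumptions chosen for illustration; no real market data is involved.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 20, 50                              # fewer days than assets (assumed sizes)
R = rng.normal(0.0, 0.01, size=(T, n))     # simulated daily returns, not real data
Rc = R - R.mean(axis=0)                    # demean each asset's column
Sigma_hat = Rc.T @ Rc / T                  # n x n sample covariance

# rank(Sigma_hat) equals rank(Rc); computing it from Rc is numerically cleaner.
rank = np.linalg.matrix_rank(Rc)
print(rank, "<", n)                        # the rank cannot exceed T - 1

# Any w in the null space of Rc is a portfolio with zero in-sample variance.
_, _, Vt = np.linalg.svd(Rc)               # full SVD: Vt is n x n
w = Vt[-1]                                 # a unit vector with Rc @ w ~ 0
print(w @ Sigma_hat @ w)                   # zero up to floating-point error
```

The portfolio w is not riskless in any economic sense; it only looks riskless because the sample contains too few days to see its variance.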

Orthogonal projections

Given a subspace V ⊆ Rᵐ, the orthogonal projection P_V is the linear map that sends every vector y to its closest point in V. Algebraically:

P_V = A (AᵀA)⁻¹ Aᵀ   when V = C(A) and columns of A are linearly independent

Properties of any orthogonal projection P: (a) P² = P (projecting twice does nothing more than projecting once); (b) P = Pᵀ (symmetric); (c) eigenvalues are 0 or 1. The complement projection is I - P.
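
Properties (a) to (c) can be checked numerically on a small example. This is a minimal sketch assuming NumPy; the 3×2 matrix A is an arbitrary illustrative choice with linearly independent columns.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                   # independent columns, so A^T A is invertible

# P = A (A^T A)^{-1} A^T, computed with a linear solve instead of an explicit inverse.
P = A @ np.linalg.solve(A.T @ A, A.T)

print(np.allclose(P @ P, P))                 # (a) idempotent
print(np.allclose(P, P.T))                   # (b) symmetric
eig = np.linalg.eigvalsh(P)                  # (c) eigenvalues, in ascending order
print(np.allclose(eig, [0.0, 1.0, 1.0]))     # one 0 and two 1s: dim C(A) = 2

M = np.eye(3) - P                            # complement projection
print(np.allclose(M @ A, 0))                 # it annihilates everything in C(A)
```

The number of unit eigenvalues equals the trace of P, which in turn equals the dimension of the subspace being projected onto.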

The projection theorem

For any vector y ∈ Rᵐ and any subspace V ⊆ Rᵐ, there is a unique decomposition y = P_V(y) + (I - P_V)(y) where the first piece lies in V and the second piece is orthogonal to V. The first piece is the unique minimiser of ‖y - v‖ over all v ∈ V.
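
The decomposition and the minimising property can both be illustrated numerically. The sketch below assumes NumPy and uses a randomly generated two-dimensional subspace of R⁵. It builds the projection from an orthonormal basis obtained by QR rather than from the formula with (AᵀA)⁻¹, which is an implementation choice made here, not something the theorem requires.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2))        # columns span a 2-dimensional subspace V of R^5
y = rng.normal(size=5)

Q, _ = np.linalg.qr(A)             # reduced QR: columns of Q are an orthonormal basis of V
y_V = Q @ (Q.T @ y)                # the piece of y that lies in V
y_perp = y - y_V                   # the piece orthogonal to V

print(np.allclose(y, y_V + y_perp))        # the two pieces add back to y
print(np.allclose(A.T @ y_perp, 0))        # y_perp is orthogonal to every column of A

# No other point of V is closer to y: compare against random candidates v = A c.
best = np.linalg.norm(y - y_V)
others = [np.linalg.norm(y - A @ rng.normal(size=2)) for _ in range(1000)]
print(best <= min(others))
```

The random search is only a sanity check, not a proof; the proof is the Pythagorean identity ‖y - v‖² = ‖y - P_V(y)‖² + ‖P_V(y) - v‖² for every v in V.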

OLS is a projection

In the regression y = Xβ + u, with X of full column rank, the OLS fitted values ŷ = X(XᵀX)⁻¹Xᵀy = Py are the orthogonal projection of y onto the column space of X. The residuals û = y - ŷ = (I - P)y are orthogonal to every regressor, that is, Xᵀû = 0. The Frisch-Waugh-Lovell theorem, the geometry of partialling-out regressions, and the algebra of the influence function all follow from this one observation.
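
The following sketch checks these claims on simulated data, assuming NumPy. The sample size, the coefficient vector beta_true, and the noise level are made up for illustration. The last lines verify the Frisch-Waugh-Lovell result for a single coefficient.

```python
import numpy as np

rng = np.random.default_rng(7)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])   # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])                            # assumed, for simulation only
y = X @ beta_true + rng.normal(scale=0.1, size=T)

# OLS via least squares, then the same fitted values via the projection matrix.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
P = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(y_hat, P @ y))           # fitted values are the projection of y onto C(X)

u_hat = y - y_hat
print(np.allclose(X.T @ u_hat, 0))         # residuals are orthogonal to every regressor

# Frisch-Waugh-Lovell: partial the first two columns out of y and of the third
# column, then regress residual on residual.
X1, x2 = X[:, :2], X[:, 2]
M1 = np.eye(T) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
b2 = (M1 @ x2) @ (M1 @ y) / ((M1 @ x2) @ (M1 @ x2))
print(np.isclose(b2, beta_hat[2]))         # matches the full-regression coefficient
```

Forming the T×T matrix P explicitly is fine for a demonstration but wasteful in practice; least-squares solvers obtain the same fitted values without it.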

Multicollinearity in vector language

When regressors are highly correlated, the columns of X are nearly linearly dependent. XᵀX becomes nearly singular (its smallest eigenvalue is close to zero relative to its largest) and (XᵀX)⁻¹ has huge entries. Small data perturbations then produce large coefficient changes. The condition number κ(XᵀX) = κ(X)² is the diagnostic; as a rule of thumb, values above roughly 1000 are red flags.
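
A short simulated comparison shows how the diagnostic behaves. The sketch assumes NumPy; the helper name cond_gram, the sample size, and the noise scale of 1e-3 that makes the second regressor almost a copy of the first are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
x1 = rng.normal(size=T)
x2_indep = rng.normal(size=T)                    # unrelated to x1
x2_coll = x1 + 1e-3 * rng.normal(size=T)         # almost an exact copy of x1

def cond_gram(X):
    """2-norm condition number of X^T X (hypothetical helper for this example)."""
    return np.linalg.cond(X.T @ X)

X_ok = np.column_stack([x1, x2_indep])
X_bad = np.column_stack([x1, x2_coll])

print(cond_gram(X_ok))     # small: the columns point in clearly different directions
print(cond_gram(X_bad))    # far above 1000: the columns are nearly collinear
```

Because κ(XᵀX) is the square of κ(X), forming XᵀX squares whatever ill-conditioning X already has, which is one reason numerical least-squares routines work with X directly through QR or SVD.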

Exercise

Suppose X is a T×k matrix of regressors. (1) Under what algebraic condition does the OLS coefficient β̂ = (XᵀX)⁻¹Xᵀy exist, and is it then unique? (2) If T < k, can β̂ exist? (3) If two columns of X are exactly proportional, what happens?