A matrix is an m×n grid of numbers, but the deeper definition is that it represents a linear map from Rⁿ to Rᵐ. Every theorem in this course flows more naturally from the linear-map view than from the grid view.
Matrix multiplication is function composition
If A is m×n and B is n×p, then AB is m×p. The product represents 'apply B, then apply A'. The (i, j) entry of AB is the inner product of row i of A with column j of B.
(AB)ᵢⱼ = Σₖ Aᵢₖ Bₖⱼ
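The entry formula and the "apply B, then apply A" reading can be checked on a small example. The following is a minimal sketch in plain Python; the helper name matmul and the specific matrices are illustrative assumptions, not part of the course material.

```python
def matmul(A, B):
    """Product of A (m×n) and B (n×p), both given as lists of rows."""
    n = len(B)
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    p = len(B[0])
    # (AB)[i][j] = sum over k of A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2, 0],
     [0, 1, 3]]          # 2×3
B = [[1, 0],
     [2, 1],
     [0, 4]]             # 3×2
x = [[5], [7]]           # column vector in R²

AB = matmul(A, B)                      # 2×2 product
two_steps = matmul(A, matmul(B, x))    # apply B, then apply A
one_step = matmul(AB, x)               # apply the product directly
print(AB, two_steps, one_step)
```

Both routes give the same vector, which is what "the product represents the composition" means in practice.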
Matrix multiplication is not commutative
AB ≠ BA in general. AB might not even be defined when BA is. This is not a bug; it reflects the fact that function composition isn't commutative either — 'apply rotation then projection' is not the same as 'apply projection then rotation'.
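The rotation and projection example can be made concrete. This is a small sketch with illustrative matrices chosen here (a 90° rotation and the projection onto the x-axis); the helper matmul is a hypothetical name and assumes compatible shapes.

```python
def matmul(A, B):
    """Product of A and B as lists of rows (assumes compatible shapes)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

R = [[0, -1],
     [1,  0]]    # rotate the plane by 90° counter-clockwise
P = [[1, 0],
     [0, 0]]     # project onto the x-axis

rotate_then_project = matmul(P, R)   # PR: apply R first, then P
project_then_rotate = matmul(R, P)   # RP: apply P first, then R

v = [[1], [1]]
print(matmul(rotate_then_project, v))   # lands on the x-axis
print(matmul(project_then_rotate, v))   # lands on the y-axis
```

The two products send the same vector to different axes, so they cannot be the same matrix.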
Special matrices
- Identity I: leaves every vector unchanged. IA = A = AI.
- Diagonal: zero off the diagonal. Acts by stretching each axis independently.
- Symmetric: A = Aᵀ. Covariance matrices, Hessians, kernel matrices — all symmetric.
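- Orthogonal: square Q with QᵀQ = I, so Q⁻¹ = Qᵀ. Rotations and reflections; preserves lengths and angles.
- Triangular (upper or lower): zero below the diagonal (upper) or above it (lower). Easy to invert and to use for solving systems, since the unknowns can be found one at a time by substitution.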
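A few of these properties can be verified directly. The sketch below uses small integer matrices picked for illustration (a diagonal D and a 90° rotation Q); all helper names are assumptions made for this example.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of A and B as lists of rows (assumes compatible shapes)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def apply(A, x):
    """Matrix-vector product Ax, with x given as a flat list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

I = [[1, 0], [0, 1]]
D = [[2, 0], [0, 3]]     # diagonal: stretch axis 1 by 2 and axis 2 by 3
Q = [[0, -1], [1, 0]]    # rotation by 90°, an orthogonal matrix
x, y = [3, 4], [1, -2]

stretched = apply(D, x)                                  # each axis scaled independently
is_orthogonal = matmul(transpose(Q), Q) == I             # QᵀQ = I
lengths_kept = inner(apply(Q, x), apply(Q, x)) == inner(x, x)
angles_kept = inner(apply(Q, x), apply(Q, y)) == inner(x, y)
print(stretched, is_orthogonal, lengths_kept, angles_kept)
```

Preserving every inner product is the precise sense in which an orthogonal matrix preserves both lengths and angles.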
Transpose
The transpose Aᵀ swaps rows and columns. Critical identities: (AB)ᵀ = BᵀAᵀ, (Aᵀ)ᵀ = A, (A + B)ᵀ = Aᵀ + Bᵀ. The transpose is dual to the original map in a precise sense: ⟨Ax, y⟩ = ⟨x, Aᵀy⟩.
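The order-reversing identity and the duality relation are easy to confirm numerically. The following sketch uses arbitrary small integer matrices chosen for illustration; the helper names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of A and B as lists of rows (assumes compatible shapes)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def apply(A, x):
    """Matrix-vector product Ax, with x given as a flat list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2, 0],
     [0, 1, 3]]      # 2×3: maps R³ to R²
B = [[1, 0],
     [2, 1],
     [1, 4]]         # 3×2
x = [1, -1, 2]       # vector in R³
y = [3, 5]           # vector in R²

lhs = transpose(matmul(A, B))                  # (AB)ᵀ
rhs = matmul(transpose(B), transpose(A))       # BᵀAᵀ
pairing_left = inner(apply(A, x), y)           # ⟨Ax, y⟩, computed in R²
pairing_right = inner(x, apply(transpose(A), y))   # ⟨x, Aᵀy⟩, computed in R³
print(lhs == rhs, pairing_left, pairing_right)
```

Note that the two inner products live in different spaces (R² and R³) yet give the same number.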
Trace and determinant
tr(A) = Σᵢ Aᵢᵢ (sum of diagonal)
det(A) = signed volume scaling factor
- Trace is invariant under similarity: tr(B⁻¹AB) = tr(A). It equals the sum of the eigenvalues, counted with multiplicity.
- Determinant equals the product of the eigenvalues, counted with multiplicity. Zero iff A is singular (non-invertible).
- tr(AB) = tr(BA) whenever both products are defined, even though AB ≠ BA in general. Used constantly in matrix calculus.
- det(AB) = det(A)·det(B). det(Aᵀ) = det(A). det(αA) = αⁿ det(A) for n×n A.
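These identities can be checked on 2×2 matrices, where the determinant is ad − bc and the eigenvalues are the roots of λ² − tr(A)·λ + det(A) = 0. The sketch below uses illustrative matrices chosen so that the eigenvalues of A are real; the helper names are assumptions for this example.

```python
import math

def matmul(A, B):
    """Product of A and B as lists of rows (assumes compatible shapes)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det2(M):
    """Determinant of a 2×2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[4, 1],
     [2, 3]]
B = [[1, 2],
     [3, 1]]
alpha, n = 3, 2

AB, BA = matmul(A, B), matmul(B, A)
scaled = [[alpha * a for a in row] for row in A]   # αA

# Eigenvalues of A from its characteristic polynomial (real for this A).
root = math.sqrt(trace(A) ** 2 - 4 * det2(A))
eigenvalues = ((trace(A) + root) / 2, (trace(A) - root) / 2)

print(trace(AB), trace(BA), AB == BA)              # equal traces, unequal products
print(det2(AB), det2(A) * det2(B))                 # multiplicativity
print(det2(scaled), alpha ** n * det2(A))          # det(αA) = αⁿ det(A)
print(eigenvalues)
```

The sum of the two eigenvalues reproduces tr(A) and their product reproduces det(A).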
Why trace shows up in portfolio analytics
If R is the T×n matrix of returns (T days, n assets), the sample covariance is Σ̂ = (1/T) RᶜᵀRᶜ, where Rᶜ is R with each column's mean subtracted. (The unbiased estimator divides by T − 1 instead; explained-variance ratios are unaffected by the choice.) The total variance, the sum of variances across all n assets, is tr(Σ̂). When we do PCA, the fraction of variance explained by the first k components is the sum of the k largest eigenvalues of Σ̂ divided by tr(Σ̂). Trace is the budget; eigenvalues are the line items.
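The budget picture can be illustrated on a toy example. The sketch below uses made-up returns for two assets over four days (purely hypothetical numbers) and the 2×2 eigenvalue formula, so it needs nothing beyond the standard library.

```python
import math

# Hypothetical daily returns: T = 4 days (rows), n = 2 assets (columns).
R = [[ 0.01,  0.02],
     [-0.02,  0.00],
     [ 0.03,  0.01],
     [-0.02, -0.03]]
T, n = len(R), len(R[0])

# Mean-centre each column.
means = [sum(row[j] for row in R) / T for j in range(n)]
Rc = [[row[j] - means[j] for j in range(n)] for row in R]

# Sample covariance with the 1/T convention used in the text.
cov = [[sum(Rc[t][i] * Rc[t][j] for t in range(T)) / T for j in range(n)]
       for i in range(n)]

total_variance = sum(cov[i][i] for i in range(n))   # tr(cov)

# Eigenvalues of the symmetric 2×2 covariance from its trace and determinant.
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
root = math.sqrt(total_variance ** 2 - 4 * det)
lam_max = (total_variance + root) / 2
lam_min = (total_variance - root) / 2

explained_by_first = lam_max / total_variance
print(total_variance, lam_max + lam_min, explained_by_first)
```

The two eigenvalues add up to the trace, and the largest one's share of that total is the variance explained by the first principal component.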
Inverse
When A is square and non-singular, A⁻¹ is the unique matrix such that A·A⁻¹ = A⁻¹·A = I. Properties, for invertible A and B of the same size and any scalar α ≠ 0: (AB)⁻¹ = B⁻¹A⁻¹, (Aᵀ)⁻¹ = (A⁻¹)ᵀ, (αA)⁻¹ = (1/α)A⁻¹.
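For 2×2 matrices the inverse has a closed form: swap the diagonal entries, negate the off-diagonal ones, and divide by the determinant. The sketch below uses that formula with exact rational arithmetic to check the listed properties; the matrices and helper names are illustrative assumptions.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of A and B as lists of rows (assumes compatible shapes)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] as (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[4, 1],
     [2, 3]]
B = [[1, 2],
     [3, 1]]
I = [[1, 0], [0, 1]]

A_inv = inverse_2x2(A)
print(matmul(A, A_inv) == I, matmul(A_inv, A) == I)
print(inverse_2x2(matmul(A, B)) == matmul(inverse_2x2(B), A_inv))   # order reverses
print(inverse_2x2(transpose(A)) == transpose(A_inv))
```

Using Fraction keeps every comparison exact, so the equalities are not blurred by floating-point rounding.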
Never compute an inverse you don't need
In production code, you almost never compute A⁻¹ explicitly. To solve Ax = b you call a linear-system solver based on a factorization (LU for general square A, Cholesky for symmetric positive-definite A, QR when extra stability is needed), which is faster and numerically more stable than forming A⁻¹ and then multiplying by b. Forming A⁻¹ explicitly should be your last resort, reserved for cases where you genuinely need the inverse matrix itself.
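To show what "solve without inverting" looks like, here is a minimal teaching sketch of Gaussian elimination with partial pivoting (the idea underlying LU-based solvers) in plain Python. It is not production code: in practice one would call a library routine, for example numpy.linalg.solve if NumPy is available, which is an assumption about the reader's toolchain rather than something these notes specify. The function name solve and the test system are illustrative.

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are left untouched.
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

A = [[ 2,  1, -1],
     [-3, -1,  2],
     [-2,  1,  2]]
b = [8, -11, -3]

x = solve(A, b)
residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
print(x, residual)
```

At no point is A⁻¹ formed: the work goes into reducing the system to triangular form and substituting back, and the residual Ax − b gives a direct check on the answer.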
Exercise
Let A = [[2, 1], [1, 3]]. (1) Compute tr(A) and det(A). (2) Compute A⁻¹ by the 2×2 formula. (3) Verify A·A⁻¹ = I.