Matrices and the four operations — Linear Algebra Module 3

A matrix is an m×n grid of numbers, but the deeper definition is that it represents a linear map from Rⁿ to Rᵐ. Every theorem in this course flows more naturally from the linear-map view than from the grid view.

Matrix multiplication is function composition

If A is m×n and B is n×p, then AB is m×p. The product represents 'apply B, then apply A'. The (i, j) entry of AB is the inner product of row i of A with column j of B.

(AB)ᵢⱼ = Σₖ Aᵢₖ Bₖⱼ
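
The entry formula and the 'apply B, then apply A' reading can be checked numerically. The sketch below assumes NumPy; the helper name matmul_by_definition and the example matrices are illustrative and not part of the course material.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute AB entry by entry from (AB)_ij = sum_k A_ik B_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("inner dimensions must match")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            # Inner product of row i of A with column j of B.
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # 2x3: a map from R^3 to R^2
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])        # 3x2: a map from R^2 to R^3
x = np.array([1.0, -1.0])

C = matmul_by_definition(A, B)    # 2x2 product
composed = A @ (B @ x)            # apply B first, then A
direct = C @ x                    # apply the single matrix AB

print(C)
print(composed, direct)
```

The explicit loops are for exposition only; in practice the built-in A @ B computes the same product far faster.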

Matrix multiplication is not commutative

AB ≠ BA in general. AB might not even be defined when BA is. This is not a bug; it reflects the fact that function composition isn't commutative either — 'apply rotation then projection' is not the same as 'apply projection then rotation'.
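
A small numerical illustration, assuming NumPy: R is a 90° rotation of the plane and P projects onto the x-axis (both chosen here purely as examples). The second half shows a shape mismatch where one product exists and the other does not.

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate by 90 degrees
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])                       # project onto the x-axis
x = np.array([1.0, 0.0])

rotate_then_project = P @ (R @ x)  # R acts first: x -> (0, 1) -> (0, 0)
project_then_rotate = R @ (P @ x)  # P acts first: x -> (1, 0) -> (0, 1)
print(rotate_then_project, project_then_rotate)

# Shapes can make one product undefined: a 2x3 times a 3x4 works,
# but the reversed order (3x4 times 2x3) does not.
A = np.ones((2, 3))
B = np.ones((3, 4))
AB = A @ B
try:
    B @ A
    ba_defined = True
except ValueError:
    ba_defined = False
print(AB.shape, ba_defined)
```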

Special matrices

Identity I: leaves every vector unchanged. IA = A = AI.
Diagonal: zero off the diagonal. Acts by stretching each axis independently.
Symmetric: A = Aᵀ. Covariance matrices, Hessians, kernel matrices — all symmetric.
Orthogonal: square Q with QᵀQ = QQᵀ = I, so Q⁻¹ = Qᵀ. Pure rotations and reflections; they preserve lengths and angles.
Triangular (upper or lower): zero below the diagonal (upper) or above it (lower). Systems with a triangular matrix are solved cheaply by back or forward substitution.
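
Three of these behaviours are easy to see numerically. The following is a minimal sketch assuming NumPy; the matrices and the helper back_substitute are illustrative choices, not part of the course material.

```python
import numpy as np

x = np.array([3.0, 4.0])          # length 5

# Diagonal: each axis is scaled independently.
D = np.diag([2.0, 0.5])
stretched = D @ x                 # first coordinate doubled, second halved

# Orthogonal: a rotation leaves lengths unchanged.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
length_before = np.linalg.norm(x)
length_after = np.linalg.norm(Q @ x)

# Upper triangular: solve U y = b from the last row upwards.
def back_substitute(U, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-solved unknowns, then divide by the pivot.
        y[i] = (b[i] - U[i, i + 1:] @ y[i + 1:]) / U[i, i]
    return y

U = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([5.0, 6.0])
y = back_substitute(U, b)

print(stretched, length_before, length_after, y)
```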

Transpose

The transpose Aᵀ swaps rows and columns. Critical identities: (AB)ᵀ = BᵀAᵀ, (Aᵀ)ᵀ = A, (A + B)ᵀ = Aᵀ + Bᵀ. The transpose is dual to the original map in a precise sense: ⟨Ax, y⟩ = ⟨x, Aᵀy⟩.
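
These identities can be spot-checked on random matrices. The sketch assumes NumPy; the shapes and the seed are arbitrary, and a numerical check on one example is a sanity test rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
C = rng.standard_normal((3, 4))   # same shape as A, for the sum rule
B = rng.standard_normal((4, 2))
x = rng.standard_normal(4)
y = rng.standard_normal(3)

product_rule = np.allclose((A @ B).T, B.T @ A.T)   # (AB)^T = B^T A^T
involution = np.allclose(A.T.T, A)                 # (A^T)^T = A
additivity = np.allclose((A + C).T, A.T + C.T)     # (A + C)^T = A^T + C^T

# Duality: <Ax, y> = <x, A^T y>
lhs = (A @ x) @ y
rhs = x @ (A.T @ y)

print(product_rule, involution, additivity, np.isclose(lhs, rhs))
```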

Trace and determinant

tr(A) = Σᵢ Aᵢᵢ      (sum of diagonal)
det(A) = signed volume scaling factor

Trace is invariant under similarity: tr(B⁻¹AB) = tr(A). It equals the sum of the eigenvalues, counted with algebraic multiplicity.
Determinant equals the product of the eigenvalues, counted the same way. It is zero iff A is singular (non-invertible).
tr(AB) = tr(BA) whenever both products are defined, even when AB ≠ BA. Used constantly in matrix calculus.
det(AB) = det(A)·det(B). det(Aᵀ) = det(A). det(αA) = αⁿ det(A) for n×n A.
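
The properties above can be verified on a random example. This is a sketch assuming NumPy; the seed and sizes are arbitrary. Eigenvalues of a real matrix may be complex, so only the real part of their sum and product is compared.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))   # invertible with probability 1
alpha = 2.0

eigvals = np.linalg.eigvals(A)    # may contain complex-conjugate pairs

# B^{-1} A B computed with a solver rather than an explicit inverse.
similar = np.linalg.solve(B, A @ B)

checks = {
    "tr(A) = sum of eigenvalues": np.isclose(np.trace(A), eigvals.sum().real),
    "det(A) = product of eigenvalues": np.isclose(np.linalg.det(A), eigvals.prod().real),
    "tr(B^-1 A B) = tr(A)": np.isclose(np.trace(similar), np.trace(A)),
    "tr(AB) = tr(BA)": np.isclose(np.trace(A @ B), np.trace(B @ A)),
    "det(AB) = det(A) det(B)": np.isclose(np.linalg.det(A @ B),
                                          np.linalg.det(A) * np.linalg.det(B)),
    "det(A^T) = det(A)": np.isclose(np.linalg.det(A.T), np.linalg.det(A)),
    "det(alpha A) = alpha^n det(A)": np.isclose(np.linalg.det(alpha * A),
                                                alpha**n * np.linalg.det(A)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```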

Why trace shows up in portfolio analytics

If R is the T×n matrix of returns (T days, n assets), the sample covariance is Σ̂ = (1/T) RᶜᵀRᶜ, where Rᶜ is R with each column's mean subtracted. (Use 1/(T−1) instead of 1/T for the unbiased estimator; the ratios below are unaffected.) The total variance, meaning the sum of the variances of all n assets, is tr(Σ̂). When we do PCA, the fraction of variance explained by the first k components is the sum of the k largest eigenvalues of Σ̂ divided by tr(Σ̂). Trace is the budget; eigenvalues are the line items.
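
The budget analogy can be sketched in a few lines, assuming NumPy. The returns here are synthetic random numbers generated only for illustration (they are not real market data), and the 1/T normalisation follows the formula in the text.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 500, 4
# Synthetic, independent "returns" with different volatilities per asset.
R = rng.standard_normal((T, n)) * np.array([0.02, 0.015, 0.01, 0.005])

Rc = R - R.mean(axis=0)              # mean-centre each asset's column
Sigma_hat = (Rc.T @ Rc) / T          # n x n sample covariance
total_variance = np.trace(Sigma_hat) # sum of the n asset variances

# eigvalsh is for symmetric matrices and returns ascending eigenvalues,
# so reverse to put the largest first.
eigvals = np.linalg.eigvalsh(Sigma_hat)[::-1]

# explained[k-1] = fraction of variance captured by the first k components.
explained = np.cumsum(eigvals) / total_variance
print(total_variance)
print(explained)
```

The last entry of explained is 1 up to rounding, because the eigenvalues sum to the trace.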

Inverse

When A is square and non-singular, A⁻¹ is the unique matrix such that A·A⁻¹ = A⁻¹·A = I. Properties, for invertible A and B of the same size and a scalar α ≠ 0: (AB)⁻¹ = B⁻¹A⁻¹, (Aᵀ)⁻¹ = (A⁻¹)ᵀ, (αA)⁻¹ = (1/α)A⁻¹.
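
A quick numerical check of the definition and these properties, assuming NumPy. The matrices are arbitrary invertible examples; np.linalg.inv is used here because the inverse itself is the object being studied.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # det = 10, so invertible
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # det = 1, so invertible
alpha = 5.0

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)
I = np.eye(2)

two_sided = np.allclose(A @ A_inv, I) and np.allclose(A_inv @ A, I)
product_rule = np.allclose(np.linalg.inv(A @ B), B_inv @ A_inv)   # order reverses
transpose_rule = np.allclose(np.linalg.inv(A.T), A_inv.T)
scalar_rule = np.allclose(np.linalg.inv(alpha * A), A_inv / alpha)

print(two_sided, product_rule, transpose_rule, scalar_rule)
```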

Never compute an inverse you don't need

In production code, you almost never compute A⁻¹ explicitly. To solve Ax = b you call a linear-system solver built on a factorization (LU, Cholesky, QR), which is faster and numerically more stable than forming A⁻¹ and then multiplying by b. Forming A⁻¹ explicitly should be a last resort, reserved for cases where you genuinely need the inverse matrix itself.
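
As a sketch of the two routes, assuming NumPy: np.linalg.solve factorizes A (LU with partial pivoting, via LAPACK) and solves directly, while the second route forms the inverse first. The test system is synthetic and deliberately well-conditioned, so both answers agree here; the gap in accuracy and cost tends to show up on larger or ill-conditioned problems.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
# Synthetic, diagonally dominated matrix so the system is well-conditioned.
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

# Preferred: solve the system without ever forming the inverse.
x_solve = np.linalg.solve(A, b)

# Discouraged: form the inverse, then multiply.
x_inv = np.linalg.inv(A) @ b

residual_solve = np.linalg.norm(A @ x_solve - b)
residual_inv = np.linalg.norm(A @ x_inv - b)
print(residual_solve, residual_inv)
```

When A is symmetric positive definite, as covariance matrices typically are, a Cholesky-based solve is the usual choice instead of general LU.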

Exercise

Let A = [[2, 1], [1, 3]]. (1) Compute tr(A) and det(A). (2) Compute A⁻¹ by the 2×2 formula. (3) Verify A·A⁻¹ = I.