Principal Component Analysis is what happens when you let the SVD loose on a centred data matrix. It is the single most-used dimension-reduction tool in finance, applied to extract yield-curve factors, build covariance estimators, isolate sector exposures from a return panel, and identify common shocks across countries.
The PCA recipe
- Start with data matrix X (T observations × n features). Centre: Xᶜ = X - 1·x̄ᵀ.
- Compute the sample covariance Σ̂ = (1/(T-1)) XᶜᵀXᶜ. (Or take the SVD Xᶜ = USVᵀ directly, which is numerically preferred: then Q = V and Λ = S²/(T-1).)
- Eigendecompose Σ̂ = QΛQᵀ. Sort eigenvalues in descending order and reorder the columns of Q to match.
- The k-th principal component is Xᶜ · qₖ, where qₖ is the k-th column of Q: a length-T time series of scores for the k-th PC.
- Variance explained by PC k: λₖ / tr(Λ). Cumulative through PC k: Σᵢ≤ₖ λᵢ / tr(Λ).
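The recipe above can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not a production routine: the function name `pca_eig` and the random test matrix are assumptions made for the example. It follows the eigendecomposition route step by step, then recomputes the eigenvalues from the SVD of the centred matrix as a cross-check.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    Xc = X - X.mean(axis=0)               # centre each column
    cov = Xc.T @ Xc / (T - 1)             # sample covariance, n x n
    eigvals, Q = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # indices for descending order
    eigvals, Q = eigvals[order], Q[:, order]
    scores = Xc @ Q                       # column k is the score series of PC k
    frac = eigvals / eigvals.sum()        # variance explained by each PC
    return eigvals, Q, scores, frac

# Synthetic correlated data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

eigvals, Q, scores, frac = pca_eig(X)
cum_frac = np.cumsum(frac)                # cumulative variance explained

# Cross-check: squared singular values of the centred matrix, scaled by 1/(T-1),
# should match the covariance eigenvalues up to floating-point error.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
svd_eigvals = s**2 / (X.shape[0] - 1)
```

The sample covariance of `scores` is diagonal with the eigenvalues on the diagonal, which is another way of saying the PCs are uncorrelated in sample.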
Yield-curve PCA — the classical application
Run PCA on daily changes in Kenyan Treasury yields across maturities 3M, 6M, 1Y, 2Y, 5Y, 10Y, 15Y, 20Y. The first three principal components, almost universally, capture:
- Level: first PC, ~80-90% of variance. All maturities move together. This is the central-bank policy-news factor (Fed for US Treasuries, CBK for Kenya).
- Slope: second PC, ~5-10% of variance. Short rates move opposite to long rates. This is the steepening/flattening factor.
- Curvature: third PC, ~1-3% of variance. Middle maturities move opposite to short and long ends. The 'belly' factor.
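One practical way to recognise the three factors in the loadings is to count sign changes along the maturity axis: none for level, one for slope, two for curvature. The sketch below illustrates this on synthetic yield changes. The factor shapes, the factor volatilities, and the noise level are all assumptions picked for the example, so the variance split is there by construction and says nothing about any real curve. `sign_changes` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(42)
n, T = 8, 2000                         # 8 maturities, 2000 synthetic "days"
x = np.linspace(-1.0, 1.0, n)          # stylised maturity axis, not real tenors

# Hypothetical factor shapes, constructed to be mutually orthogonal
level = np.ones(n)
slope = x
curve = 1.0 - 2.0 * np.abs(x)          # positive in the belly, negative at the ends
curve = curve - curve.mean()

# Assumed factor volatilities (in basis points) and idiosyncratic noise
shapes = np.column_stack([level, slope, curve])               # n x 3
factors = rng.normal(size=(T, 3)) * np.array([6.0, 3.0, 1.5]) # T x 3
dy = factors @ shapes.T + rng.normal(scale=0.5, size=(T, n))  # T x n yield changes

# PCA via SVD of the centred changes
dyc = dy - dy.mean(axis=0)
U, s, Vt = np.linalg.svd(dyc, full_matrices=False)
frac = s**2 / np.sum(s**2)             # variance explained per PC

def sign_changes(v):
    """Number of times consecutive loadings switch sign."""
    return int(np.sum(np.diff(np.sign(v)) != 0))

# Sign-change count for the loadings of PC1, PC2, PC3
pattern = [sign_changes(Vt[k]) for k in range(3)]
```

The count is unaffected by the arbitrary overall sign of each eigenvector, which makes it a convenient first check before interpreting the PCs on real data.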
Nelson-Siegel is PCA in disguise
The Nelson-Siegel parametrisation of the yield curve uses three latent factors with specific shape functions. Those shapes — level (constant), slope (decaying exponential), curvature (hump) — are remarkably close to the first three empirical PCs of most government curves. NS is not derived from PCA but is consistent with it. Fixed-income desks switch between the two depending on what they need.
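For comparison with empirical PC loadings, the three Nelson-Siegel shape functions can be written out directly. The sketch below uses the common three-factor form y(τ) = β₀ + β₁·(1 - e^(-λτ))/(λτ) + β₂·[(1 - e^(-λτ))/(λτ) - e^(-λτ)]. The decay parameter `lam = 0.6` (per year), the function name, and the β values are assumptions made for illustration, not calibrated to any market.

```python
import numpy as np

def nelson_siegel_loadings(tau, lam):
    """Nelson-Siegel loadings (level, slope, curvature) at maturities tau > 0."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    level = np.ones_like(tau)              # constant across maturities
    slope = (1.0 - np.exp(-x)) / x         # decays from 1 towards 0
    curvature = slope - np.exp(-x)         # hump at intermediate maturities
    return np.column_stack([level, slope, curvature])

maturities = np.array([0.25, 0.5, 1, 2, 5, 10, 15, 20])   # years
B = nelson_siegel_loadings(maturities, lam=0.6)           # lam is an assumed value

# With lam fixed, fitting the three factors to one curve is ordinary least squares.
# The yields here are hypothetical, generated from made-up betas (in percent).
beta_true = np.array([13.0, -3.0, 1.5])
yields = B @ beta_true
beta_hat, *_ = np.linalg.lstsq(B, yields, rcond=None)
```

Plotting the columns of `B` next to the first three PCA loading vectors of the same curve is a quick way to see how close the two descriptions are for a given dataset.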
PCA for risk factor extraction
On equity returns, the first PC of the NSE-20 index components is typically close to the equally-weighted market portfolio: that is the market factor. The next 2-5 PCs typically pick up sector clusters (banking, telecoms, manufacturing) and play the role of the factors in a statistical factor model. This is the basis for PCA-based covariance estimators used at quant equity shops.
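The link between the first PC and the equally-weighted portfolio can be checked numerically. The sketch below simulates a hypothetical 20-stock panel with one market factor, three sector factors, and idiosyncratic noise, then correlates the PC1 score series with the equal-weighted return. Every parameter (betas, volatilities, sector sizes) is an assumption chosen for illustration and not an estimate for any real index.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1000, 20
sector = np.repeat([0, 1, 2], [7, 7, 6])          # hypothetical sector labels

# Assumed return-generating process (all parameters illustrative)
beta = rng.uniform(0.8, 1.2, size=n)              # market betas
market = rng.normal(scale=0.010, size=T)          # market factor returns
sector_f = rng.normal(scale=0.005, size=(T, 3))   # sector factor returns
idio = rng.normal(scale=0.010, size=(T, n))       # stock-specific noise
R = market[:, None] * beta + sector_f[:, sector] + idio   # T x n returns

# PCA via SVD of centred returns
Rc = R - R.mean(axis=0)
U, s, Vt = np.linalg.svd(Rc, full_matrices=False)
frac = s**2 / np.sum(s**2)

pc1 = Rc @ Vt[0]                                  # PC1 score series
ew = R.mean(axis=1)                               # equal-weighted portfolio return

# The sign of an eigenvector is arbitrary, so compare on absolute correlation
corr = abs(np.corrcoef(pc1, ew)[0, 1])
```

On real data the same comparison is a useful sanity check: a low correlation suggests that PC1 is being driven by something other than a broad market move.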
PCA pitfalls in finance
- Scaling matters. PCA on raw prices vs returns vs standardised returns gives entirely different decompositions. Default to standardised returns unless you have a reason.
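- PCA finds variance, not signal. On unstandardised returns, the first PC may dominate purely because of the largest-vol stock, not because it represents shared risk.
- PCA is unstable under sample additions when eigenvalues are close. Eigenvector signs are arbitrary, so successive months can flip the sign of PC3, and when λ₂ ≈ λ₃ the pair PC2 and PC3 can rotate within the plane they span, by up to 90° (a full swap).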
- PCs are orthogonal in sample but their economic interpretation can rotate. Sparse PCA and constrained variants are designed to fix this.
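Two of these pitfalls are easy to reproduce. The sketch below runs PCA on five independent synthetic series, one of which is ten times more volatile than the rest, with and without standardisation. It also includes a small helper that aligns eigenvector signs to a reference estimate. The names `pca_loadings` and `align_signs` are hypothetical, and the data are simulated for illustration.

```python
import numpy as np

def pca_loadings(X, standardise=False):
    """Return (loadings, explained-variance fractions) for centred or standardised X."""
    Z = X - X.mean(axis=0)
    if standardise:
        Z = Z / Z.std(axis=0, ddof=1)     # unit sample variance per column
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt.T, s**2 / np.sum(s**2)

def align_signs(loadings, reference):
    """Flip columns of `loadings` so each has a non-negative dot product with `reference`."""
    signs = np.sign(np.sum(loadings * reference, axis=0))
    signs[signs == 0] = 1.0               # leave exactly orthogonal columns unchanged
    return loadings * signs

# Five independent synthetic series; the first is ten times more volatile
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])

Q_raw, frac_raw = pca_loadings(X)                    # PC1 is essentially series 0
Q_std, frac_std = pca_loadings(X, standardise=True)  # no dominant component remains

# Sign alignment undoes an arbitrary flip between two estimates
Q_flipped = align_signs(-Q_raw, Q_raw)
```

There is no common factor in this data at all, yet the unstandardised PC1 explains most of the variance. Note that `align_signs` only handles sign flips; it does nothing about rotation between PCs with nearly equal eigenvalues.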
Empirical rule of thumb
If the top PC explains > 50% of variance, you have a strong common-factor structure (typical of equities and developed-market sovereign curves). If the top three together explain < 30%, your data is dominated by idiosyncratic noise — PCA won't help, and you may need a different model entirely.
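The rule of thumb is simple enough to encode as a quick diagnostic. The sketch below is one possible reading of it: the function name, the returned labels, and the "intermediate" bucket for everything between the two thresholds are assumptions added for the example.

```python
import numpy as np

def factor_structure(frac):
    """Classify explained-variance fractions using the 50% / 30% rule of thumb."""
    frac = np.sort(np.asarray(frac, dtype=float))[::-1]   # ensure descending order
    top1 = frac[0]
    top3 = frac[:3].sum()
    if top1 > 0.50:
        return "strong common factor"
    if top3 < 0.30:
        return "mostly idiosyncratic"
    return "intermediate"                                 # label is an assumption

# Hypothetical explained-variance vectors
strong = factor_structure([0.62, 0.20, 0.10, 0.08])
noisy = factor_structure(np.full(20, 0.05))
middle = factor_structure([0.40, 0.30, 0.20, 0.10])
```

The thresholds depend on how many series are in the panel, so treat the output as a prompt for further inspection and not as a test.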
Implementation
import numpy as np
from numpy.linalg import svd

# X: T x n centred returns
U, s, Vt = svd(X, full_matrices=False)
# Principal components (scores)
PC = U * s  # T x n_components
# Loadings (eigenvectors)
loadings = Vt.T  # n x n_components
# Variance explained
var_explained = s**2 / (X.shape[0] - 1)
frac_explained = var_explained / var_explained.sum()
Exercise
You run PCA on standardised daily changes for 8 Kenyan Treasury yields and get explained-variance fractions [0.83, 0.10, 0.04, 0.015, 0.008, 0.004, 0.002, 0.001]. (1) Interpret. (2) Why does it usually make sense to retain only the first 3 PCs for risk attribution? (3) The loadings of PC2 are roughly (-0.5, -0.4, -0.2, 0, 0.2, 0.3, 0.4, 0.4). What shape factor is this?