Bootstrapping is the modern Swiss-army knife of statistical inference. Given almost any data sample and statistic, it yields standard errors, confidence intervals, and hypothesis tests with few distributional assumptions. Introduced by Efron (1979), it has become the practical default whenever closed-form asymptotic results are unavailable or untrusted.
The basic idea
We have a sample x = (x₁, ..., xₙ). We compute a statistic θ̂(x). We want to know the sampling distribution of θ̂. The bootstrap pretends the empirical distribution is the population: draw B bootstrap samples x*₁, ..., x*_B by resampling with replacement from x; compute θ̂(x*_b) for each; the empirical distribution of {θ̂(x*_b)} approximates the sampling distribution of θ̂.
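The resampling loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `bootstrap_distribution`, the choice of the median as the statistic, and the simulated exponential data are all assumptions made for the example.

```python
import numpy as np

def bootstrap_distribution(x, statistic, n_bootstrap=2000, seed=0):
    """Return n_bootstrap replicates of statistic(x*) under i.i.d. resampling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    reps = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
        reps[b] = statistic(x[idx])
    return reps

# Illustrative data only: 200 draws from a skewed distribution.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=200)

reps = bootstrap_distribution(x, np.median)
se_median = reps.std(ddof=1)  # bootstrap standard error of the median
print(f"median = {np.median(x):.3f}, bootstrap SE = {se_median:.3f}")
```

The spread of `reps` stands in for the unknown sampling distribution of the statistic, so its standard deviation serves as a standard-error estimate.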
Three flavours of bootstrap CI
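- Percentile: take the 2.5th and 97.5th percentiles of the bootstrap distribution. Simple, but coverage can be poor when the estimator is biased or its distribution is skewed.
- Basic (reflection): [2θ̂ - q_(0.975), 2θ̂ - q_(0.025)], where q_(p) is the p-quantile of the bootstrap distribution. Reflecting the quantiles about θ̂ corrects for the direction of the bias.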
- BCa (bias-corrected and accelerated, Efron 1987): the gold standard for general use. Adjusts for bias and skewness.
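As a rough sketch of the first two intervals, the helpers below take an array of bootstrap replicates and return the percentile and basic (reflection) limits. The names `percentile_ci` and `basic_ci` and the simulated data are assumptions for illustration. BCa is not implemented here because it needs an additional bias-correction and acceleration step; SciPy's `scipy.stats.bootstrap` exposes a `method='BCa'` option that may be a convenient starting point.

```python
import numpy as np

def percentile_ci(reps, alpha=0.05):
    """Percentile interval: quantiles of the bootstrap replicates."""
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def basic_ci(theta_hat, reps, alpha=0.05):
    """Basic interval: bootstrap quantiles reflected about theta_hat."""
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return 2 * theta_hat - hi, 2 * theta_hat - lo

# Illustrative data only: the mean of a skewed sample.
rng = np.random.default_rng(0)
x = rng.exponential(size=100)
theta_hat = x.mean()
reps = np.array([rng.choice(x, size=x.size, replace=True).mean()
                 for _ in range(2000)])

print("percentile:", percentile_ci(reps))
print("basic:     ", basic_ci(theta_hat, reps))
```

Both intervals have the same width; they differ only in how that width is placed around θ̂, which matters when the bootstrap distribution is asymmetric.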
Block bootstrap for time series
The i.i.d. bootstrap destroys the dependence structure of time-series data. The block bootstrap instead resamples contiguous blocks of length L, preserving short-range dependence within each block. Variants: non-overlapping blocks (Carlstein), overlapping (moving) blocks (Künsch), and the stationary bootstrap (Politis-Romano), which uses random block lengths.
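One way the overlapping-block variant might look in code is sketched below. The function name `moving_block_bootstrap`, the block length of 8, and the simulated AR(1) series are assumptions for illustration; the stationary bootstrap would additionally randomise the block lengths.

```python
import numpy as np

def moving_block_bootstrap(x, block_length, rng):
    """One resample built from overlapping contiguous blocks, trimmed to len(x)."""
    x = np.asarray(x)
    n = len(x)
    n_blocks = -(-n // block_length)  # ceil(n / block_length)
    # Any start position whose block fits inside the series is allowed.
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    # Each row of idx holds the indices of one contiguous block.
    idx = starts[:, None] + np.arange(block_length)[None, :]
    return x[idx.ravel()[:n]]

# Illustrative AR(1) series with coefficient 0.6.
rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + eps[t]

resample = moving_block_bootstrap(x, block_length=8, rng=rng)
```

Within each block the original ordering is kept, so serial dependence up to roughly the block length survives; dependence is still broken at the joins between blocks.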
Choosing the block length
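Rule of thumb: L ≈ n^(1/3) (read as the mean block length for the stationary bootstrap), or L set from an AR(1) approximation: L ≈ log(0.05) / log(|ρ̂|), the lag at which the implied autocorrelation |ρ̂|^L decays to 0.05, where ρ̂ is the sample lag-1 autocorrelation of the series of interest. Too large an L leaves few distinct blocks to resample, making the bootstrap distribution noisy; too small an L breaks up the persistence.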
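Both rules of thumb are simple arithmetic, sketched below. The helper names are assumptions for illustration, and the AR(1) rule is written as the lag at which |ρ̂|^L falls to the threshold, i.e. log(0.05) / log(|ρ̂|), rounded up.

```python
import math
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def block_length_cube_root(n):
    """Rule of thumb L ~ n^(1/3), at least 1."""
    return max(1, round(n ** (1 / 3)))

def block_length_ar1(rho, threshold=0.05):
    """Smallest integer L with |rho|**L <= threshold under an AR(1) approximation."""
    rho = abs(rho)
    if rho >= 1:
        raise ValueError("|rho| must be below 1 for the AR(1) rule")
    if rho == 0:
        return 1
    return max(1, math.ceil(math.log(threshold) / math.log(rho)))

print(block_length_cube_root(504))  # 504 daily returns
print(block_length_ar1(0.5))
```

For 504 observations the cube-root rule gives 8, and ρ̂ = 0.5 gives 5 under the AR(1) rule, since 0.5^5 ≈ 0.031 is the first power below 0.05.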
Parametric bootstrap
Fit a parametric model, draw simulated samples from the fitted model, compute the statistic on each. Useful when the model is plausible and the sample is small; combines distributional structure with simulation.
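A minimal sketch of the parametric bootstrap follows, assuming a normal model purely for illustration; the function name `parametric_bootstrap_normal`, the 5% quantile as the statistic, and the simulated sample of 30 points are all hypothetical choices.

```python
import numpy as np

def parametric_bootstrap_normal(x, statistic, n_bootstrap=2000, seed=0):
    """Replicates of `statistic` on samples simulated from a fitted normal model."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat, sigma_hat = x.mean(), x.std(ddof=1)  # fit the assumed model
    reps = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        sim = rng.normal(mu_hat, sigma_hat, size=n)  # simulate from the fit
        reps[b] = statistic(sim)
    return reps

# Illustrative small sample.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=2.0, size=30)

reps = parametric_bootstrap_normal(x, lambda s: np.quantile(s, 0.05))
ci = np.quantile(reps, [0.025, 0.975])  # percentile interval for the 5% quantile
print(ci)
```

Unlike the nonparametric version, the simulated samples can contain values outside the observed range, which is where the fitted model contributes information; the results are only as good as the model assumption.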
When bootstrap fails
- Heavy tails with infinite variance: bootstrap variance estimates can be misleading.
- Estimators of the maximum/minimum: a bootstrap maximum can never exceed the sample maximum (and frequently equals it), which biases the bootstrap distribution of the maximum.
- Boundary parameters: variance components near zero, restricted regressions.
- Strong dependence not captured by block size: long memory series need m-out-of-n bootstrap or subsampling.
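The maximum case is easy to check numerically. In the sketch below (simulated uniform data, chosen only for illustration), a resample contains the sample maximum with probability 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ 0.632, so the bootstrap distribution of the maximum piles up on a single point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, size=n)  # illustrative data; the true upper bound is 1

n_bootstrap = 5000
idx = rng.integers(0, n, size=(n_bootstrap, n))  # one row of indices per resample
boot_max = x[idx].max(axis=1)

frac_at_sample_max = np.mean(boot_max == x.max())
theory = 1 - (1 - 1 / n) ** n

print(f"share of resamples whose max equals the sample max: {frac_at_sample_max:.3f}")
print(f"theoretical probability: {theory:.3f}")
```

No bootstrap replicate lies above the sample maximum, so the bootstrap cannot represent the uncertainty about how far the true upper bound sits beyond it.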
Implementation
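    import numpy as np

    def bootstrap_sharpe(returns, n_bootstrap=10000):
        """I.i.d. bootstrap replicates of the annualised Sharpe ratio of daily returns."""
        returns = np.asarray(returns)
        n = len(returns)
        sharpes = np.empty(n_bootstrap)
        for b in range(n_bootstrap):
            # Resample n daily returns with replacement
            sample = np.random.choice(returns, size=n, replace=True)
            sharpes[b] = sample.mean() / sample.std(ddof=1) * np.sqrt(252)
        return sharpes

    # Percentile 95% CI; `returns` is the array of daily returns
    sharpes = bootstrap_sharpe(returns)
    ci_low, ci_high = np.percentile(sharpes, [2.5, 97.5])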
Subsampling — when bootstrap fails
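Subsampling (Politis-Romano-Wolf): draw subsamples of size m < n without replacement, compute the statistic on each, and rescale its deviations from the full-sample estimate by √(m/n) (for a √n-consistent estimator) to account for the smaller sample size. Valid under much weaker conditions than the bootstrap, particularly for non-smooth statistics and certain dependent data, provided m → ∞ and m/n → 0 as n grows.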
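A minimal sketch of a subsampling confidence interval is given below, under two stated assumptions: the data are i.i.d. (so subsamples are random subsets; for time series one would take contiguous blocks of length m instead) and the estimator converges at the √n rate. The function name `subsampling_ci` and the choice m = 50 are hypothetical.

```python
import numpy as np

def subsampling_ci(x, statistic, m, n_subsamples=2000, alpha=0.05, seed=0):
    """Subsampling CI for i.i.d. data, assuming a sqrt(n) convergence rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = statistic(x)
    roots = np.empty(n_subsamples)
    for b in range(n_subsamples):
        sub = rng.choice(x, size=m, replace=False)  # size-m subset, no replacement
        # sqrt(m) * (subsample estimate - full estimate) mimics
        # sqrt(n) * (full estimate - truth)
        roots[b] = np.sqrt(m) * (statistic(sub) - theta_hat)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return theta_hat - q_hi / np.sqrt(n), theta_hat - q_lo / np.sqrt(n)

# Illustrative data only.
rng = np.random.default_rng(1)
x = rng.exponential(size=1000)
lo, hi = subsampling_ci(x, np.median, m=50)
print(lo, hi)
```

Dividing the quantiles of the √m-scaled roots by √n is the same as shrinking the subsample deviations by √(m/n). The choice of m is a tuning decision that this sketch does not address.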
Exercise
A strategy has 504 daily returns (2 years). You bootstrap the Sharpe ratio with 10,000 i.i.d. resamples. (1) Why is the i.i.d. bootstrap potentially wrong for this application? (2) Outline a correct procedure. (3) Suppose the resulting 95% CI for annualised Sharpe is [0.4, 1.8]. Interpret.