Bootstrap and resampling — Stats for Finance Module 11

Bootstrapping is the modern Swiss-army knife of statistical inference. Given a data sample and almost any statistic, it gives standard errors, confidence intervals, and hypothesis tests without requiring a parametric model for the data, provided the resampling scheme mimics how the data were generated. Introduced by Efron (1979), it has become the practical default whenever closed-form asymptotic results are unavailable or untrusted.

The basic idea

We have a sample x = (x₁, ..., xₙ). We compute a statistic θ̂(x). We want to know the sampling distribution of θ̂. The bootstrap pretends the empirical distribution is the population: draw B bootstrap samples x*₁, ..., x*_B by resampling with replacement from x; compute θ̂(x*_b) for each; the empirical distribution of {θ̂(x*_b)} approximates the sampling distribution of θ̂.
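A minimal sketch of that resampling loop, assuming NumPy and i.i.d. data. `bootstrap_distribution` is a hypothetical helper name, and the sample median is only an illustrative choice of statistic:

```python
import numpy as np

def bootstrap_distribution(x, statistic, n_bootstrap=2000, seed=None):
    """Return n_bootstrap replicates of statistic(x*), with x* drawn i.i.d. from x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    # Row b of idx holds the indices of bootstrap sample x*_b (size n, with replacement)
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    return np.array([statistic(x[row]) for row in idx])

# Illustration: bootstrap standard error of the median of 200 simulated draws
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
reps = bootstrap_distribution(x, np.median, seed=1)
se_median = reps.std(ddof=1)
```

The standard deviation of the replicates is the bootstrap standard error; their quantiles feed the confidence intervals below.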

Three flavours of bootstrap CI

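Percentile: take the 2.5th and 97.5th percentiles of the bootstrap distribution as the interval endpoints. Simple and invariant to monotone transformations, but coverage can be poor when θ̂ is biased or its sampling distribution is skewed.
Basic (reflection): [2θ̂ - q_(0.975), 2θ̂ - q_(0.025)], where q_(p) is the p-quantile of the bootstrap distribution. It uses the bootstrap distribution of θ̂* - θ̂ as a stand-in for that of θ̂ - θ, so the bootstrap quantiles are reflected around θ̂.
BCa (bias-corrected and accelerated, Efron 1987): shifts the percentile levels using a bias-correction constant ẑ₀ and an acceleration constant â (usually estimated by the jackknife), which adjust for median bias and skewness. It is second-order accurate under smoothness conditions and a common default for general use.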
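A sketch computing all three intervals for i.i.d. data, with a hypothetical helper `bootstrap_cis`. BCa's acceleration is estimated by the jackknife, and the standard-library `statistics.NormalDist` supplies Φ and Φ⁻¹. For production use, SciPy's `scipy.stats.bootstrap` offers `method='percentile'`, `'basic'` and `'BCa'`; this version only makes the formulas visible:

```python
import numpy as np
from statistics import NormalDist

def bootstrap_cis(x, statistic, n_bootstrap=5000, alpha=0.05, seed=None):
    """Percentile, basic and BCa intervals for statistic(x), assuming i.i.d. data."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = statistic(x)
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    reps = np.array([statistic(x[row]) for row in idx])

    # Percentile: read the endpoints straight off the bootstrap distribution
    q_lo, q_hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    percentile = (q_lo, q_hi)
    # Basic: reflect the bootstrap quantiles around theta_hat
    basic = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)

    # BCa bias correction z0: normal quantile of the share of replicates below theta_hat
    nd = NormalDist()
    share = np.clip(np.mean(reps < theta_hat), 1 / (n_bootstrap + 1), n_bootstrap / (n_bootstrap + 1))
    z0 = nd.inv_cdf(share)
    # BCa acceleration a from jackknife (leave-one-out) estimates
    jack = np.array([statistic(np.delete(x, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * np.sum(d**2) ** 1.5
    a = np.sum(d**3) / denom if denom > 0 else 0.0

    def adjusted_level(p):
        # Map nominal level p to the BCa-adjusted percentile level
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    bca = tuple(np.quantile(reps, [adjusted_level(alpha / 2), adjusted_level(1 - alpha / 2)]))
    return {"percentile": percentile, "basic": basic, "bca": bca}

rng = np.random.default_rng(0)
x = rng.lognormal(size=100)  # right-skewed sample, where the three methods differ
cis = bootstrap_cis(x, np.mean, seed=1)
```

On skewed data like this, the BCa interval typically sits further to the right than the percentile interval, while the basic interval is the percentile interval mirrored around θ̂.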

Block bootstrap for time series

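The i.i.d. bootstrap destroys the dependence structure of time-series data: resampling individual observations scrambles their order, so autocorrelation and volatility clustering vanish from every bootstrap sample. The block bootstrap instead resamples contiguous blocks of length L and concatenates them, preserving short-range dependence within each block. Variants: non-overlapping blocks (Carlstein), overlapping or moving blocks (Künsch), and the stationary bootstrap (Politis-Romano), which draws block lengths from a geometric distribution with mean L and wraps the series around circularly.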
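A sketch of index generation for the moving-block (overlapping) and stationary bootstraps, assuming NumPy. The function names are hypothetical, and the AR(1) example is only meant to show that block resampling widens the spread of the bootstrapped mean relative to i.i.d. resampling when the data are autocorrelated:

```python
import numpy as np

def moving_block_indices(n, block_len, rng):
    """Overlapping (moving) block bootstrap: indices for one resample of length n."""
    n_blocks = int(np.ceil(n / block_len))
    # Block start points drawn uniformly from the n - block_len + 1 overlapping blocks
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)).ravel()
    return idx[:n]  # trim the last block so the resample has length n

def stationary_indices(n, mean_block_len, rng):
    """Stationary bootstrap: geometric block lengths with mean mean_block_len, circular wrap."""
    new_block = rng.random(n) < 1.0 / mean_block_len
    new_block[0] = True                           # the first position always starts a block
    block_id = np.cumsum(new_block) - 1           # block membership of each position
    block_pos = np.flatnonzero(new_block)         # where each block begins in the resample
    offset = np.arange(n) - block_pos[block_id]   # position within the current block
    starts = rng.integers(0, n, size=block_pos.size)  # random start in the original series
    return (starts[block_id] + offset) % n

# AR(1) series with phi = 0.6: compare bootstrap spreads of the sample mean
rng = np.random.default_rng(0)
n, phi, B = 500, 0.6, 2000
e = rng.standard_normal(n)
y = np.empty(n)
y[0] = e[0] / np.sqrt(1 - phi**2)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

iid_means = np.array([y[rng.integers(0, n, n)].mean() for _ in range(B)])
mbb_means = np.array([y[moving_block_indices(n, 10, rng)].mean() for _ in range(B)])
sb_means = np.array([y[stationary_indices(n, 10, rng)].mean() for _ in range(B)])
```

With positive autocorrelation, `iid_means.std()` understates the true standard error of the mean, while both block schemes recover most of the missing variance.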

Choosing the block length

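Rule of thumb: L ≈ n^(1/3), the rate at which the optimal block length grows for variance estimation with both the block and the stationary bootstrap. Alternatively, set L from an AR(1) approximation: choose L so that the implied autocorrelation |ρ̂|^L has decayed to 0.05, i.e. L ≈ log(0.05) / log(|ρ̂|), where ρ̂ is the sample lag-1 autocorrelation of the series of interest (both logarithms are negative, so L > 0). Too large an L leaves few distinct blocks to resample, making the bootstrap estimates noisy; too small an L cuts through the dependence and misses persistence, typically producing intervals that are too narrow.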
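Both heuristics as a small sketch, with hypothetical helper names. The AR(1) rule is computed as L = log(0.05) / log(|ρ̂|), the lag at which |ρ̂|^L falls to 0.05; the 0.05 threshold is a tuning choice, not a universal constant, and ρ̂ = 0.3 below is purely illustrative:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)

def cube_root_block_length(n):
    """Rule of thumb L ~ n**(1/3)."""
    return n ** (1 / 3)

def ar1_block_length(rho, threshold=0.05):
    """Lag L at which |rho|**L decays to `threshold` (requires 0 < |rho| < 1)."""
    if not 0 < abs(rho) < 1:
        raise ValueError("AR(1) rule needs 0 < |rho| < 1")
    # log(threshold) < 0 and log(|rho|) < 0, so L > 0
    return np.log(threshold) / np.log(abs(rho))

# Exercise-sized example: 504 daily returns, illustrative rho_hat = 0.3
L1 = cube_root_block_length(504)   # about 7.96
L2 = ar1_block_length(0.3)         # about 2.49
L1_int, L2_int = int(np.ceil(L1)), int(np.ceil(L2))  # round up to whole observations
```

In practice `rho` would come from `lag1_autocorr(returns)`; the two rules can disagree substantially, so it is worth checking that conclusions are stable across a range of L.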

Parametric bootstrap

Fit a parametric model, draw simulated samples from the fitted model, compute the statistic on each. Useful when the model is plausible and the sample is small; combines distributional structure with simulation.
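A sketch for the Sharpe ratio, assuming (as a deliberately simple model) i.i.d. normal daily returns; `parametric_bootstrap_sharpe` is a hypothetical name and the input returns are synthetic. A fat-tailed or GARCH model could replace the `rng.normal` draw with the rest unchanged:

```python
import numpy as np

def parametric_bootstrap_sharpe(returns, n_bootstrap=10_000, periods_per_year=252, seed=None):
    """Parametric bootstrap of the annualised Sharpe ratio under an i.i.d. normal model."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    # Step 1: fit the model (here just the normal mean and standard deviation)
    mu_hat, sigma_hat = returns.mean(), returns.std(ddof=1)
    # Step 2: simulate n_bootstrap samples of size n from the fitted model, one per row
    sims = rng.normal(mu_hat, sigma_hat, size=(n_bootstrap, n))
    # Step 3: recompute the statistic on every simulated sample
    return sims.mean(axis=1) / sims.std(axis=1, ddof=1) * np.sqrt(periods_per_year)

rng = np.random.default_rng(0)
returns = rng.normal(0.0004, 0.01, size=504)  # synthetic data, for illustration only
sharpes = parametric_bootstrap_sharpe(returns, n_bootstrap=5000, seed=1)
ci_low, ci_high = np.percentile(sharpes, [2.5, 97.5])
```

With 504 observations the spread of the annualised Sharpe is roughly √(252/504) ≈ 0.71, which is why two years of data pin the Sharpe ratio down only loosely.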

When bootstrap fails

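Heavy tails with infinite variance: the naive bootstrap of the sample mean is inconsistent, so bootstrap standard errors and intervals can be badly misleading.
Extremes (sample maximum or minimum): a resample's maximum can never exceed the sample maximum, and it equals the sample maximum with probability 1 - (1 - 1/n)^n ≈ 1 - e^(-1) ≈ 0.632, so the bootstrap distribution of the maximum has a large point mass that the true sampling distribution lacks.
Parameters on or near the boundary of the parameter space: e.g. a variance component near zero, or coefficients in inequality-restricted regressions.
Dependence the block scheme cannot capture: long-memory series, whose autocorrelations decay too slowly for any practical block length, may need the m-out-of-n bootstrap or subsampling.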
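A quick simulation of the failure for the maximum, assuming NumPy and standard normal data (continuous, so the sample maximum is unique). It compares the share of bootstrap maxima that land exactly on the sample maximum with 1 - (1 - 1/n)^n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 100, 10_000
x = rng.standard_normal(n)              # continuous data: the sample max is unique
idx = rng.integers(0, n, size=(B, n))   # B i.i.d. bootstrap resamples, stored as indices
boot_max = x[idx].max(axis=1)           # maximum of each resample
share_at_sample_max = np.mean(boot_max == x.max())
theory = 1 - (1 - 1 / n) ** n           # P(sample max appears in a resample), about 0.634
```

Roughly 63% of the bootstrap distribution collapses onto a single value, whereas the true sampling distribution of the maximum of continuous data has no atoms at all.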

Implementation

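```python
import numpy as np

def bootstrap_sharpe(returns, n_bootstrap=10_000, periods_per_year=252, seed=None):
    """I.i.d. bootstrap distribution of the annualised Sharpe ratio."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    sharpes = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Resample n daily returns with replacement (ignores serial dependence)
        sample = rng.choice(returns, size=n, replace=True)
        sharpes[b] = sample.mean() / sample.std(ddof=1) * np.sqrt(periods_per_year)
    return sharpes

# `returns`: 1-D array of daily (excess) returns for the strategy
sharpes = bootstrap_sharpe(returns, seed=42)
ci_low, ci_high = np.percentile(sharpes, [2.5, 97.5])  # 95% percentile interval
```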

Subsampling: when the bootstrap fails

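Subsampling (Politis-Romano-Wolf): compute the statistic on subsamples of size m < n drawn without replacement (for time series, contiguous blocks of length m, which keep the original ordering), then rescale. For a √n-consistent estimator, the distribution of √m(θ̂_m - θ̂_n) across subsamples approximates that of √n(θ̂_n - θ); equivalently, the subsample deviations θ̂_m - θ̂_n are scaled by √(m/n). It requires m → ∞ with m/n → 0, but is valid under much weaker conditions than the bootstrap, particularly for non-smooth statistics and many dependent-data settings.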
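A sketch of a subsampling interval for a time series, using every contiguous block of length m and assuming a √n-rate statistic such as the mean, so that √m(θ̂_m - θ̂_n) stands in for √n(θ̂_n - θ). `subsampling_ci` is a hypothetical name, and m = 200 for n = 20,000 is only an illustrative tuning choice:

```python
import numpy as np

def subsampling_ci(x, statistic, m, alpha=0.05):
    """Equal-tailed subsampling CI from all n - m + 1 contiguous blocks of length m.

    Assumes a sqrt(n)-consistent statistic, so sqrt(m) * (theta_m - theta_n)
    approximates the law of sqrt(n) * (theta_n - theta).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_n = statistic(x)
    theta_m = np.array([statistic(x[i:i + m]) for i in range(n - m + 1)])
    roots = np.sqrt(m) * (theta_m - theta_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert the root: theta lies in [theta_n - q_hi/sqrt(n), theta_n - q_lo/sqrt(n)]
    return theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n)

# Illustration: mean of an AR(1) series with phi = 0.5 and true mean 1
rng = np.random.default_rng(0)
n, phi = 20_000, 0.5
e = rng.standard_normal(n)
y = np.empty(n)
y[0] = e[0] / np.sqrt(1 - phi**2)  # start from the stationary distribution
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]
y += 1.0
lo, hi = subsampling_ci(y, np.mean, m=200)
naive_half_width = 1.96 * y.std(ddof=1) / np.sqrt(n)  # i.i.d. normal-theory half-width
```

Because contiguous blocks retain the autocorrelation, the subsampling interval comes out noticeably wider than the naive i.i.d. interval here, reflecting the larger long-run variance of the series.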

Exercise

A strategy has 504 daily returns (2 years). You bootstrap the Sharpe ratio with 10,000 i.i.d. resamples. (1) Why is the i.i.d. bootstrap potentially wrong for this application? (2) Outline a correct procedure. (3) Suppose the resulting 95% CI for annualised Sharpe is [0.4, 1.8]. Interpret.