Module 04 of 1255 min readIntermediate

Standard errors, p-values, and confidence

What a p-value actually says (and doesn't), heteroskedasticity-robust SEs, clustered SEs, and the bootstrap.

Standard errors quantify how much an estimate would jitter under repeated sampling. They feed t-statistics, p-values, and confidence intervals — every claim of statistical significance rests on getting them right.

What a p-value actually says

A p-value is the probability of observing a test statistic at least as extreme as the one you got, IF the null hypothesis is true. That's it. It is NOT:

  • The probability the null is true (that's a Bayesian quantity)
  • The probability your finding is real
  • 1 minus statistical power
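The definition can be checked numerically. Below is a minimal sketch. It assumes a test statistic that is standard normal under the null, as in a large-sample z-test. It computes the analytic two-sided p-value and a brute-force simulation of "at least as extreme, if the null is true"; the two should agree. The function names and the observed value 2.1 are illustrative, not from the text.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Analytic two-sided p-value: P(|Z| >= |z|) when Z ~ N(0, 1) under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulated_p(z: float, draws: int = 200_000, seed: int = 0) -> float:
    """Same probability by brute force: draw the statistic from its null
    distribution and count how often it is at least as extreme as z."""
    rng = random.Random(seed)
    extreme = sum(1 for _ in range(draws) if abs(rng.gauss(0.0, 1.0)) >= abs(z))
    return extreme / draws


z_obs = 2.1  # hypothetical observed test statistic (an assumption)
print(two_sided_p(z_obs), simulated_p(z_obs))
```

The two numbers match up to simulation noise. That is the definition at work: the p-value describes how the statistic behaves if the null holds. It says nothing about how likely the null itself is.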

p < 0.05 ≠ true

If 100 researchers each test a null that is actually true, about 5 of them will reject it at the 5% level on average; that is how the test is designed. With selective publication, journals fill up with those lucky 5, and this is part of what drives the replication crisis.
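The "100 researchers" thought experiment is easy to simulate. Below is a minimal sketch. Each hypothetical researcher draws 30 observations from a population where the null (mean = 0) is true. Each then runs a two-sided z-test with known σ = 1 at the 5% level. The function names, sample sizes, and number of rounds are illustrative choices.

```python
import math
import random


def rejects_true_null(rng: random.Random, n: int = 30) -> bool:
    """One researcher: draw n observations from N(0, 1), so H0: mean = 0 is
    true, and run a two-sided z-test (sigma = 1 known) at the 5% level."""
    sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    z = sample_mean * math.sqrt(n)  # (mean - 0) / (sigma / sqrt(n)) with sigma = 1
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return p < 0.05


def rejections_per_100(rounds: int = 200, seed: int = 1) -> list:
    """Repeat the '100 researchers' experiment; record each round's rejection count."""
    rng = random.Random(seed)
    return [sum(rejects_true_null(rng) for _ in range(100)) for _ in range(rounds)]


counts = rejections_per_100()
print("average rejections per 100:", sum(counts) / len(counts))
print("range across rounds:", min(counts), "to", max(counts))
```

Individual rounds scatter around five. In any single batch of 100 studies, a handful of "significant" results on true nulls is exactly what the test promises.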

Heteroskedasticity-robust standard errors

The classic SE formula assumes homoskedasticity: every error has the same variance, Var(u_i | X) = σ². Real data almost never satisfy this. White (1980) gave us a heteroskedasticity-consistent estimator that requires only large samples and exogeneity:

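$$\widehat{\operatorname{Var}}(\hat{\beta}) = (X'X)^{-1} X' \operatorname{diag}(\hat{u}_1^2, \ldots, \hat{u}_n^2)\, X\, (X'X)^{-1}$$

where û_i is the OLS residual for observation i.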

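In Stata: , robust (which rescales by n/(n − k), the HC1 variant). In R: vcovHC() from the sandwich package (default type "HC3"). In statsmodels: fit(cov_type='HC3'). The formula above is White's original HC0; HC1 and HC3 are finite-sample refinements of it. Use robust SEs by default: the efficiency cost is small, and the cost of getting SEs wrong is large.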
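The sandwich formula can be computed by hand for a one-regressor model. Below is a minimal, stdlib-only sketch. The function name slope_ses and the simulated heteroskedastic data are illustrative assumptions, not any library's API. It returns four SEs for the slope:

- the classic SE;
- HC0, the formula above;
- HC1, which is HC0 scaled by n/(n − k);
- HC3, which inflates each squared residual by 1/(1 − h_i)², where h_i is that observation's leverage.

```python
import math
import random


def slope_ses(x, y):
    """OLS of y on (1, x). Returns the slope estimate and its SE under the
    classic homoskedastic formula and three sandwich variants."""
    n, k = len(x), 2  # k = number of coefficients (intercept + slope)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    # (X'X)^{-1} for X = [1, x]; it is symmetric, so three entries suffice
    i00, i01, i11 = sxx / det, -sx / det, n / det
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]  # squared residuals

    def slope_var(w):
        # Slope element of (X'X)^{-1} [sum_i w_i x_i x_i'] (X'X)^{-1}
        m00 = sum(w)
        m01 = sum(wi * xi for wi, xi in zip(w, x))
        m11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        return i01 * i01 * m00 + 2 * i01 * i11 * m01 + i11 * i11 * m11

    # Leverage h_i = x_i' (X'X)^{-1} x_i, used by HC3
    h = [i00 + 2 * i01 * xi + i11 * xi * xi for xi in x]
    ses = {
        "classic": math.sqrt(sum(u2) / (n - k) * i11),
        "HC0": math.sqrt(slope_var(u2)),
        "HC1": math.sqrt(slope_var(u2) * n / (n - k)),
        "HC3": math.sqrt(slope_var([e / (1 - hi) ** 2 for e, hi in zip(u2, h)])),
    }
    return b, ses


# Illustrative simulated data (an assumption): error SD grows with x
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 + xi) for xi in x]
b_hat, ses = slope_ses(x, y)
print(round(b_hat, 3), {name: round(se, 4) for name, se in ses.items()})
```

In practice you would call the library routines named above rather than hand-roll the algebra. The sketch shows that every HC variant is the same bread-meat-bread product with a different weight on each squared residual.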

Cluster-robust standard errors

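When observations are correlated within groups (students within schools, workers within firms, observations within country-years), SEs that assume independent observations are typically too small. Cluster-robust SEs allow arbitrary correlation within each cluster while assuming independence across clusters. Choose the clustering level with care: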

  • Cluster at the highest level of meaningful correlation
  • Asymptotics run in the number of clusters, not observations; a common rule of thumb is 30+ clusters
  • With few clusters, use the wild cluster bootstrap (Cameron, Gelbach, and Miller 2008)
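Below is a minimal, stdlib-only sketch of a cluster-robust sandwich for a one-regressor model. The meat sums each cluster's residual-weighted scores before squaring. That step allows any correlation within a cluster and assumes none across clusters. The sketch applies the common "CR1" finite-sample factor G/(G − 1) · (n − 1)/(n − k). The function name and the simulated data are illustrative assumptions: 40 clusters of 25, with a shared shock in both x and the error. The wild cluster bootstrap is not shown.

```python
import math
import random
from collections import defaultdict


def clustered_slope_se(x, y, g):
    """OLS of y on (1, x). Returns the slope, its classic SE, and a
    cluster-robust SE with clusters given by the labels in g."""
    n, k = len(x), 2
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    i01, i11 = -sx / det, n / det  # slope row of (X'X)^{-1}
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    u = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Per-cluster score s_g = sum over i in g of u_i * (1, x_i)
    s0, s1 = defaultdict(float), defaultdict(float)
    for xi, ui, gi in zip(x, u, g):
        s0[gi] += ui
        s1[gi] += ui * xi
    # Slope element of (X'X)^{-1} [sum_g s_g s_g'] (X'X)^{-1}
    var_cr0 = sum((i01 * s0[c] + i11 * s1[c]) ** 2 for c in s0)
    n_clusters = len(s0)
    # CR1 small-sample factor: G/(G-1) * (n-1)/(n-k)
    var_cr1 = var_cr0 * n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    se_classic = math.sqrt(sum(e * e for e in u) / (n - k) * i11)
    return b, se_classic, math.sqrt(var_cr1)


# Illustrative simulated data (an assumption): 40 clusters of 25 observations,
# with a shared cluster shock in both x and the error term.
rng = random.Random(7)
x, y, g = [], [], []
for c in range(40):
    x_shock, u_shock = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    for _ in range(25):
        xi = x_shock + rng.gauss(0.0, 1.0)
        x.append(xi)
        y.append(1.0 + 0.5 * xi + u_shock + rng.gauss(0.0, 1.0))
        g.append(c)

b_hat, se_classic, se_cluster = clustered_slope_se(x, y, g)
print("slope:", b_hat, "classic SE:", se_classic, "clustered SE:", se_cluster)
```

With correlation in both x and the error inside each cluster, the clustered SE comes out well above the classic one. The classic SE treats 1,000 correlated rows as 1,000 independent pieces of information.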

The bootstrap

Resample your data with replacement (whole clusters, if the data are clustered), re-estimate β̂ on each resample, and repeat many times (≥1,000). The standard deviation of β̂ across the bootstrap replications is your SE. It doesn't require closed-form analytics and works for most estimators used in applied work.
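The resampling loop is short enough to write out. Below is a minimal sketch. It assumes independent observations, so whole (x, y) rows can be resampled; this is the "pairs" bootstrap. With clustered data you would resample entire clusters instead. ols_slope and bootstrap_se are illustrative names. Any function that maps a list of rows to a number can be passed as the estimator.

```python
import random
import statistics


def ols_slope(rows):
    """Slope from regressing y on (1, x) for a list of (x, y) rows."""
    mx = statistics.fmean([x for x, _ in rows])
    my = statistics.fmean([y for _, y in rows])
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    return sxy / sxx


def bootstrap_se(rows, estimator, reps=1000, seed=0):
    """Pairs bootstrap: resample whole rows with replacement, re-estimate,
    and take the standard deviation of the replications."""
    rng = random.Random(seed)
    n = len(rows)
    replications = [estimator(rng.choices(rows, k=n)) for _ in range(reps)]
    return statistics.stdev(replications)


# Illustrative simulated data (an assumption): heteroskedastic errors
rng = random.Random(3)
rows = []
for _ in range(200):
    x = rng.uniform(0, 10)
    rows.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 0.5 + x)))
print("slope:", ols_slope(rows), "bootstrap SE:", bootstrap_se(rows, ols_slope))
```

Because each resampled row keeps its own x and error together, the pairs bootstrap does not assume homoskedasticity. Its SE should land close to a heteroskedasticity-robust analytical SE.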

When to bootstrap

When you can't write down an analytical SE — chained estimators, complex weighting, ratios of estimates. The bootstrap is also a useful sanity check on analytical SEs that look surprising.
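Ratios of estimates are a textbook bootstrap case, and they also illustrate the sanity-check use. Below is a minimal sketch, assuming independent (x, y) rows. It bootstraps the SE of mean(y)/mean(x). It then compares that SE with the delta-method (linearization) formula, a standard analytical approximation swapped in here for illustration. If the two disagree sharply, one of them deserves a closer look. The function names and data are illustrative.

```python
import math
import random
import statistics


def ratio_of_means(rows):
    """mean(y) / mean(x): a ratio of estimates."""
    return statistics.fmean([y for _, y in rows]) / statistics.fmean([x for x, _ in rows])


def bootstrap_se(rows, estimator, reps=2000, seed=0):
    """Resample rows with replacement and take the SD of the re-estimates."""
    rng = random.Random(seed)
    n = len(rows)
    return statistics.stdev([estimator(rng.choices(rows, k=n)) for _ in range(reps)])


def delta_method_se(rows):
    """Linearized SE of mean(y)/mean(x): sd(y - R*x) / (mean(x) * sqrt(n))."""
    r = ratio_of_means(rows)
    mx = statistics.fmean([x for x, _ in rows])
    resid = [y - r * x for x, y in rows]
    return statistics.stdev(resid) / (mx * math.sqrt(len(rows)))


# Illustrative simulated data (an assumption)
rng = random.Random(5)
rows = []
for _ in range(400):
    x = rng.uniform(5, 15)
    rows.append((x, 0.3 * x + rng.gauss(0.0, 1.0)))

boot, delta = bootstrap_se(rows, ratio_of_means), delta_method_se(rows)
print("bootstrap SE:", boot, "delta-method SE:", delta)
```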

Confidence intervals

A 95% CI for β is β̂ ± 1.96 × SE (a large-sample approximation; in small samples, replace 1.96 with the t critical value). Interpretation: 'in repeated sampling, 95% of intervals constructed this way would contain the true β.' Not 'there's a 95% chance the true β is in this interval'; that is Bayesian language applied to a frequentist quantity.
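The repeated-sampling interpretation can be simulated directly. Below is a minimal sketch. It assumes the simplest case: estimating a mean with known σ, so that estimate ± 1.96 × SE has exactly 95% coverage. It draws many samples, builds an interval from each, and counts how many contain the true value. The true mean, σ, and sample size are illustrative choices.

```python
import math
import random
import statistics


def ci_95(sample, sigma):
    """estimate +/- 1.96 * SE, here for a sample mean with known sigma."""
    est = statistics.fmean(sample)
    se = sigma / math.sqrt(len(sample))
    return est - 1.96 * se, est + 1.96 * se


def coverage(true_mean=3.0, sigma=2.0, n=40, reps=4000, seed=11):
    """Share of intervals, across repeated samples, that contain the true mean.
    Parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = ci_95([rng.gauss(true_mean, sigma) for _ in range(n)], sigma)
        hits += lo <= true_mean <= hi
    return hits / reps


print("coverage:", coverage())
```

Any single interval either contains the true mean or it does not. The 95% describes the procedure across repetitions.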

Exercise

You estimate β̂ = 0.50, SE = 0.20, robust. Compute the t-stat, give a rough p-value, and a 95% CI.
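Once you have worked the exercise by hand, you can check the arithmetic with a small helper. Below is a minimal sketch. It assumes the null is β = 0 and that the sample is large, so the t-statistic is compared with the standard normal. With few degrees of freedom, a t distribution would give a slightly larger p-value. The name summarize is illustrative.

```python
import math


def summarize(beta_hat, se, crit=1.96):
    """t-stat against H0: beta = 0, two-sided normal-approximation p-value,
    and the interval beta_hat +/- crit * SE."""
    t = beta_hat / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p, (beta_hat - crit * se, beta_hat + crit * se)


t, p, ci = summarize(0.50, 0.20)
print(f"t = {t:.2f}, p = {p:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```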

LeadAfrik · Public Economics Hub