Module 05 of 8 · 55 min read · Advanced

When you can't randomise — difference-in-differences

The parallel-trends assumption, two-way fixed effects, event-study plots, and the staggered-adoption pitfalls.

Learning objectives

By the end of this module, you should be able to:

  • 01 Explain difference-in-differences and the parallel-trends assumption
  • 02 Set up the two-way fixed effects regression
  • 03 Use event-study plots to assess pre-trends and dynamics
  • 04 Explain the staggered-adoption problem with two-way fixed effects

Most policy is not randomised — it is rolled out: a law passes in some states, a programme launches in some districts, a reform happens at a date. When you can't randomise, you turn to quasi-experimental methods that exploit this natural variation. The first and most widely used is difference-in-differences, which this module covers — including the recent realisation that the standard way of estimating it can be badly wrong.

The difference-in-differences idea

Differencing out trends and fixed differences

Difference-in-differences (DiD) compares the CHANGE in outcomes over time for a group that got the treatment to the CHANGE for a group that didn't. By taking the difference of differences, it removes two confounds at once:

  • Differencing over time (before vs after) within each group removes any FIXED differences between the groups. If the treated region was always richer, that constant difference cancels.
  • Differencing across groups (treated vs control) removes any common TIME TRENDS. If both regions grew because the economy grew, that common trend cancels.

What's left, the difference in the changes, is attributed to the treatment. Concretely: DiD = (treated_after − treated_before) − (control_after − control_before).

The classic example is Card and Krueger (1994), who studied a minimum-wage increase in New Jersey, using neighbouring Pennsylvania (no increase) as the control. Comparing the change in fast-food employment in each state, they found (controversially) no employment loss. DiD is the workhorse of policy evaluation precisely because policies so often roll out in some places and times and not others, which provides the treated-and-control, before-and-after structure it needs.
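The arithmetic can be sketched in a few lines of Python. The four group means below are hypothetical numbers chosen for illustration (they are not taken from Card and Krueger or any other study), and `did` is an illustrative helper name.

```python
# Hypothetical group-by-period mean outcomes (illustrative values only).
means = {
    ("treated", "before"): 20.0,
    ("treated", "after"): 21.0,
    ("control", "before"): 23.0,
    ("control", "after"): 21.0,
}


def did(m):
    """Difference-in-differences from four group-by-period means."""
    treated_change = m[("treated", "after")] - m[("treated", "before")]
    control_change = m[("control", "after")] - m[("control", "before")]
    return treated_change - control_change


estimate = did(means)
print(estimate)  # 3.0

# A fixed level difference (+5 for the treated group in both periods) and a
# common shock (+2 for both groups in the after period) leave the estimate
# unchanged, because both are differenced out.
shifted = {
    (g, p): v + (5.0 if g == "treated" else 0.0) + (2.0 if p == "after" else 0.0)
    for (g, p), v in means.items()
}
print(did(shifted))  # 3.0
```

In this example the treated group rose by 1 while the control group fell by 2, so the estimate is 1 − (−2) = 3. The control group's fall stands in for what would have happened to the treated group without the treatment.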

The parallel-trends assumption

The key identifying assumption

DiD's validity rests entirely on the PARALLEL-TRENDS assumption: ABSENT the treatment, the treated and control groups would have followed PARALLEL paths (the same trend). In other words, the treated group's counterfactual change is assumed to equal the control group's actual change. If this holds, the control group's change is a valid counterfactual for the treated group's change, and DiD gives the causal effect. If it FAILS, because the treated group was already on a different trajectory (e.g., the region that adopted the policy was already growing faster for other reasons), DiD attributes that pre-existing differential trend to the treatment and the estimate is biased.

Parallel trends is fundamentally UNTESTABLE, because it is a statement about the unobserved counterfactual. It can, however, be made more or less credible by checking PRE-TRENDS: if the groups moved in parallel BEFORE the treatment (in the periods leading up to it), that supports (but doesn't prove) the assumption that they would have continued in parallel. Divergent pre-trends are a red flag that parallel trends likely fails. Assessing parallel trends via pre-trends is the central task in judging any DiD study: a DiD with divergent pre-trends is not credible.
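A simple pre-trend check can be sketched as follows. It compares the treated-minus-control gap in each period with the gap in the last pre-treatment period. All numbers are hypothetical, and `gaps_vs_reference` is an illustrative helper rather than a library function. It is a descriptive check on group means, not a formal statistical test.

```python
# Hypothetical group means by event time; treatment starts at event time 0.
treated = {-3: 10.0, -2: 11.0, -1: 12.0, 0: 15.0, 1: 17.0}
control = {-3: 7.0, -2: 8.0, -1: 9.0, 0: 10.0, 1: 11.0}

REFERENCE = -1  # last pre-treatment period, used as the baseline


def gaps_vs_reference(treated, control, ref=REFERENCE):
    """Treated-minus-control gap in each period, relative to the reference gap."""
    base_gap = treated[ref] - control[ref]
    return {t: (treated[t] - control[t]) - base_gap for t in sorted(treated)}


gaps = gaps_vs_reference(treated, control)
pre_gaps = {t: g for t, g in gaps.items() if t < REFERENCE}
post_gaps = {t: g for t, g in gaps.items() if t >= 0}

print(pre_gaps)   # {-3: 0.0, -2: 0.0}  -> groups moved in parallel before treatment
print(post_gaps)  # {0: 2.0, 1: 3.0}    -> gap opens only after treatment

# A treated group that was already pulling away before treatment.
diverging = {-3: 10.0, -2: 12.0, -1: 14.0, 0: 17.0, 1: 19.0}
print(gaps_vs_reference(diverging, control))
# {-3: -2.0, -2: -1.0, -1: 0.0, 0: 2.0, 1: 3.0}
```

In the second case the post-treatment gaps are identical to the first, but the non-zero pre-treatment gaps show the groups were already diverging. The post-treatment numbers therefore cannot be read as a treatment effect.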

Two-way fixed effects and event studies

DiD is typically estimated with a two-way fixed effects (TWFE) regression: regress the outcome on unit fixed effects (absorbing fixed differences between units), time fixed effects (absorbing common time shocks), and a treatment indicator equal to 1 for treated units in the periods when they are treated and 0 otherwise. The coefficient on that indicator is the DiD estimate. With two groups and two periods this exactly reproduces the simple DiD.

The event-study specification extends this: instead of a single before/after contrast, it estimates the treatment effect in each period RELATIVE to the treatment date, with one period (usually the last one before treatment) omitted as the reference, and plots the estimates over event time. The event-study plot is the workhorse diagnostic: the PRE-treatment coefficients should be near zero (no pre-trend, which supports parallel trends) and the POST-treatment coefficients trace out the DYNAMIC effect (does it grow, fade, persist?). A good DiD study always shows the event-study plot, because it simultaneously checks the identifying assumption (flat pre-trends) and reveals the effect's dynamics.
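The equivalence between TWFE and the simple DiD in the two-group, two-period case can be checked with a minimal sketch. It assumes a balanced panel, where regressing on unit and time dummies is equivalent to double-demeaning the outcome and the treatment indicator. The data and the helper names (`double_demean`, `twfe`) are hypothetical, and the code gives a point estimate only, with no standard errors.

```python
def double_demean(panel):
    """Remove unit and time means from a balanced panel {(unit, period): value}."""
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    time_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    grand = sum(panel.values()) / len(panel)
    return {(u, t): x - unit_mean[u] - time_mean[t] + grand
            for (u, t), x in panel.items()}


def twfe(y, d):
    """TWFE coefficient on d: OLS slope after double-demeaning y and d."""
    y_dm, d_dm = double_demean(y), double_demean(d)
    return sum(d_dm[k] * y_dm[k] for k in y) / sum(v * v for v in d_dm.values())


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical outcomes: units A and B are treated in period 1, C and D never.
y = {("A", 0): 10.0, ("A", 1): 14.0, ("B", 0): 12.0, ("B", 1): 15.0,
     ("C", 0): 8.0, ("C", 1): 9.0, ("D", 0): 6.0, ("D", 1): 8.0}
treated_units = {"A", "B"}
d = {(u, t): 1.0 if (u in treated_units and t == 1) else 0.0 for (u, t) in y}


def group_mean(in_treated, period):
    return mean(v for (u, t), v in y.items()
                if (u in treated_units) == in_treated and t == period)


simple_did = ((group_mean(True, 1) - group_mean(True, 0))
              - (group_mean(False, 1) - group_mean(False, 0)))

print(simple_did)  # 2.0
print(twfe(y, d))  # 2.0
```

Double-demeaning is used here only to keep the sketch dependency-free. In practice a regression package that handles fixed effects, clustering and unbalanced panels would normally be used.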

The staggered-adoption problem

When TWFE goes wrong

A major recent development (the 'DiD revolution' of the late 2010s) revealed that the standard TWFE regression can be SEVERELY BIASED when treatment is adopted at DIFFERENT TIMES by different units. This is staggered adoption, the common real-world case (e.g., states adopting a policy in different years). The problem (Goodman-Bacon, 2021) is that with staggered timing and effects that change over time, TWFE implicitly uses ALREADY-TREATED units as controls for LATER-treated units. This 'forbidden comparison' can put NEGATIVE WEIGHTS on some treatment effects, so the TWFE estimate can be wrongly signed even when every unit's true effect is positive. This is not a minor technicality: many published DiD studies using TWFE with staggered adoption may be biased.

The new estimators (Callaway-Sant'Anna, de Chaisemartin-D'Haultfœuille, Sun-Abraham, and others) fix this by avoiding the forbidden comparisons: they use only clean never-treated or not-yet-treated units as controls and aggregate the resulting effects properly. The practical lesson: for staggered-adoption DiD, do NOT rely on naive TWFE. Use the modern robust estimators, and be sceptical of older studies that used TWFE on staggered designs. This is one of the most important methodological developments of recent years and a live area where credible practice has changed.
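The sign reversal can be reproduced in a small noise-free simulation. Everything in it is a hypothetical construction: two units, four periods, adoption in periods 1 and 3, and a true effect of 1 + k² in the k-th period after adoption. The helper names are also hypothetical. The "clean" comparison at the end only illustrates using a not-yet-treated control in the spirit of the newer estimators. It is not an implementation of any of them.

```python
def double_demean(panel):
    """Remove unit and time means from a balanced panel {(unit, period): value}."""
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    time_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    grand = sum(panel.values()) / len(panel)
    return {(u, t): x - unit_mean[u] - time_mean[t] + grand
            for (u, t), x in panel.items()}


def twfe(y, d):
    """TWFE coefficient on d: OLS slope after double-demeaning y and d."""
    y_dm, d_dm = double_demean(y), double_demean(d)
    return sum(d_dm[k] * y_dm[k] for k in y) / sum(v * v for v in d_dm.values())


PERIODS = range(4)
FIRST_TREATED = {"early": 1, "late": 3}       # staggered adoption dates
UNIT_EFFECT = {"early": 10.0, "late": 20.0}   # fixed level differences


def effect(k):
    """True effect k periods after adoption: always positive, and growing."""
    return 1.0 + k ** 2


y, d, true_effects = {}, {}, []
for unit, start in FIRST_TREATED.items():
    for t in PERIODS:
        is_treated = t >= start
        gain = effect(t - start) if is_treated else 0.0
        d[(unit, t)] = 1.0 if is_treated else 0.0
        # Outcome = unit effect + common linear time trend + treatment effect.
        y[(unit, t)] = UNIT_EFFECT[unit] + 1.0 * t + gain
        if is_treated:
            true_effects.append(gain)

true_att = sum(true_effects) / len(true_effects)
twfe_estimate = twfe(y, d)
print(true_att)       # 2.25
print(twfe_estimate)  # -0.5


def clean_did(cohort, control, t, start):
    """2x2 DiD against a not-yet-treated control, baseline = period before adoption."""
    base = start - 1
    return (y[(cohort, t)] - y[(cohort, base)]) - (y[(control, t)] - y[(control, base)])


# "late" is untreated until period 3, so it is a clean control in periods 1 and 2.
clean = {t: clean_did("early", "late", t, FIRST_TREATED["early"]) for t in (1, 2)}
print(clean)  # {1: 1.0, 2: 2.0}
```

Parallel trends holds exactly by construction and every true effect is positive, yet the TWFE coefficient is negative. In period 3 the early unit, whose effect is still growing, acts as the control for the late unit. The clean comparisons recover the true effects of 1 and 2 for the early unit. They cannot say anything about period 3, when no untreated unit remains.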

Exercise

A researcher evaluates a health insurance programme that was rolled out to different districts in different years, using a two-way fixed effects difference-in-differences with all other districts as controls. (1) Explain the DiD logic and what it's trying to difference out. (2) Explain the parallel-trends assumption and how the researcher should assess it. (3) Explain why the staggered rollout makes the naive TWFE estimate potentially unreliable. (4) Recommend how the researcher should estimate the effect credibly.

Key takeaways

  • Difference-in-differences compares the CHANGE in outcomes for a treated group to the change for a control group — differencing out both fixed group differences (over-time differencing) and common time trends (across-group differencing)
  • Validity rests on the PARALLEL-TRENDS assumption (absent treatment, the groups would have followed parallel paths) — untestable, but assessed by checking pre-trends; divergent pre-trends are a red flag
  • DiD is estimated with two-way fixed effects (unit + time fixed effects); the event-study specification plots effects by event time — testing pre-trends and revealing dynamics (always show it)
  • The staggered-adoption problem (Goodman-Bacon): with treatment adopted at different times and time-varying effects, naive TWFE makes 'forbidden comparisons' (already-treated as controls) with negative weights — potentially wrongly signed
  • For staggered designs, use modern robust estimators (Callaway-Sant'Anna, de Chaisemartin-D'Haultfœuille) — not naive TWFE; be sceptical of older TWFE studies on staggered policies

Further reading

  1. Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania
     David Card & Alan Krueger · American Economic Review 84(4) · 1994
     The famous DiD study that helped launch the credibility revolution. The canonical example.

  2. Difference-in-Differences with Variation in Treatment Timing
     Andrew Goodman-Bacon · Journal of Econometrics 225(2) · 2021
     The paper that exposed the staggered-adoption TWFE bias. Essential for modern DiD.

  3. Difference-in-Differences with Multiple Time Periods
     Brantly Callaway & Pedro Sant'Anna · Journal of Econometrics 225(2) · 2021
     One of the leading robust estimators for staggered DiD. The current good practice.

LeadAfrik · Public Economics Hub