Difference-in-differences (DiD) is the workhorse of policy evaluation in economics. The intuition is simple: when a policy hits some units and not others, compare the before-after change in the treated group to the before-after change in a control group. The first difference subtracts unit-specific levels; the second subtracts shared time trends. What's left is the treatment effect — under one strong assumption.
The two-by-two case
Two groups (treated, control) and two periods (pre, post). Mean outcome in each cell:
DiD = (Y̅_treated,post − Y̅_treated,pre) − (Y̅_control,post − Y̅_control,pre)
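The first parenthesis is what happened to the treated group. The second is what would have happened anyway, proxied by the control group. Their difference is the causal effect on the treated group, provided the control group's change is a valid stand-in for the change the treated group would have experienced without treatment.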
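To make the arithmetic concrete, here is a minimal sketch in Python. The four cell means are made-up illustrative numbers, and `did_2x2` is a hypothetical helper name, not something from a library.

```python
# Hypothetical cell means (illustrative numbers, not from any study).
means = {
    ("treated", "pre"): 20.0,
    ("treated", "post"): 26.0,
    ("control", "pre"): 15.0,
    ("control", "post"): 18.0,
}

def did_2x2(m):
    """Difference-in-differences from the four cell means."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change

estimate = did_2x2(means)
print(estimate)  # 3.0
```

The treated group rose by 6 and the control group by 3, so the estimate is 3: the part of the treated group's change that the control group's change does not account for.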
The regression form
yᵢₜ = α + β · Treatedᵢ + γ · Postₜ + δ · (Treatedᵢ × Postₜ) + uᵢₜ
δ is the DiD coefficient, the causal estimate. β is the pre-period level difference between the groups; γ is the pre-to-post change in the control group, which the design treats as the time trend shared by both. The interaction picks up what's specific to the treated group in the post period.
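One way to convince yourself that the interaction coefficient is the same number as the two-by-two formula is to simulate data, run the regression, and compare. The sketch below uses NumPy with invented parameters (intercept 1.0, β = 0.5, γ = 1.5, δ = 2.0); all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Simulated repeated cross-section with made-up parameters.
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
true_delta = 2.0
y = 1.0 + 0.5 * treated + 1.5 * post + true_delta * treated * post + rng.normal(0, 1, n)

# OLS on intercept, Treated, Post, and Treated x Post.
X = np.column_stack([np.ones(n), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = coef[3]

# The same quantity from the four cell means.
def cell_mean(t, p):
    return y[(treated == t) & (post == p)].mean()

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(delta_hat, did)  # identical up to floating-point error
```

Because the regression is saturated (one parameter per cell), the interaction coefficient reproduces the difference of cell-mean differences exactly. The regression form earns its keep once you add covariates or need standard errors.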
The parallel-trends assumption
The crucial assumption: absent treatment, the treated and control groups would have followed the same trajectory. We can never directly observe this — the treated group did get treated. But we can defend the assumption by examining pre-treatment trends.
Plot the pre-trends
Always show outcomes for both groups in the periods before treatment. If they were trending in parallel, the assumption is plausible. If they diverged in the years leading up to the treatment, the design is in trouble: what you attribute to treatment may be nothing more than the continuation of a pre-existing trend.
Event-study plots
The modern presentation. Estimate a coefficient for each time period relative to treatment (k = -3, -2, -1, 0, +1, +2, +3...), with k = -1 as the omitted baseline. Plot the coefficients with confidence intervals. A credible DiD shows:
- Coefficients near zero for k < 0 (no pre-trend)
- A clear jump at k = 0 (treatment takes effect)
- Persistence or fade in k > 0 (the dynamics of the effect)
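As a rough sketch of where those coefficients come from, assume a balanced panel with a single treatment date. The code below simulates such a panel (invented effect sizes, a shared trend of 0.3 per period) and regresses the outcome on a treated-group dummy, period dummies, and treated-by-period interactions, omitting k = -1. The group dummy stands in for unit fixed effects in this simplified setup, and no standard errors are computed, so this shows only the point estimates you would plot.

```python
import numpy as np

rng = np.random.default_rng(1)
event_times = np.arange(-3, 4)            # k = -3, ..., 3
n_units = 200
n_obs = n_units * len(event_times)

unit = np.repeat(np.arange(n_units), len(event_times))
k = np.tile(event_times, n_units)
d = (unit < n_units // 2).astype(float)   # first half of units treated (assumption)

# Assumed dynamic effects: jump at k = 0, then partial fade.
true_effect = {0: 1.0, 1: 1.5, 2: 1.2, 3: 1.0}
effect = np.array([true_effect.get(int(v), 0.0) for v in k])

unit_level = rng.normal(size=n_units)
y = unit_level[unit] + 0.3 * k + d * effect + rng.normal(0, 0.5, n_obs)

# Period dummies and treated-by-period interactions, omitting k = -1.
ks = [int(v) for v in event_times if v != -1]
period_dummies = np.column_stack([(k == v).astype(float) for v in ks])
interactions = period_dummies * d[:, None]
X = np.column_stack([np.ones(n_obs), d, period_dummies, interactions])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
event_coefs = dict(zip(ks, coef[-len(ks):]))  # one coefficient per event time

for v in ks:
    print(f"k = {v:+d}: {event_coefs[v]: .3f}")
```

With these simulated data the leads (k = -3, -2) should sit close to zero and the lags should track the assumed effects. In real work you would add cluster-robust confidence intervals before reading anything into the plot.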
Pre-trends are non-negotiable
If the leading coefficients are distinguishable from zero and trending, your design fails. The fix is finding a different control group, restricting the sample, or switching to a different identification strategy (synthetic control, regression discontinuity).
Two-way fixed effects
Generalising beyond two periods, the estimating equation becomes:
yᵢₜ = αᵢ + λₜ + δ · Treatᵢₜ + uᵢₜ
Treatᵢₜ equals 1 if unit i is treated in period t and 0 otherwise. αᵢ is a unit fixed effect (absorbs everything time-invariant about i). λₜ is a time fixed effect (absorbs everything that affects all units in period t). δ is identified from within-unit changes net of common time shocks.
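In a balanced panel, absorbing both sets of fixed effects amounts to subtracting unit means and period means (and adding back the grand mean) from the outcome and the treatment indicator, then running a one-variable regression. The sketch below assumes a simulated panel with invented parameters; `demean_two_way` is a hypothetical helper, and the shortcut is only valid when every unit is observed in every period.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 50, 6

alpha = rng.normal(0, 2, n_units)      # unit fixed effects
lam = rng.normal(0, 1, n_periods)      # time fixed effects

# Hypothetical design: first 25 units treated from period 3 onward.
D = np.zeros((n_units, n_periods))
D[:25, 3:] = 1.0

true_delta = 1.5
noise = rng.normal(0, 0.1, (n_units, n_periods))
Y = alpha[:, None] + lam[None, :] + true_delta * D + noise

def demean_two_way(M):
    """Remove unit means and period means (balanced panel only)."""
    return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()

Y_dm, D_dm = demean_two_way(Y), demean_two_way(D)
delta_hat = (D_dm * Y_dm).sum() / (D_dm ** 2).sum()

print(delta_hat)  # close to the assumed 1.5
```

The unit and time effects are large relative to the treatment effect here, yet they drop out entirely after demeaning. With unbalanced panels or many covariates you would let a regression package absorb the fixed effects instead.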
The staggered-treatment problem
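When treatment timing varies across units (different states adopting a policy in different years), the simple two-way fixed-effects estimate is a weighted average of all pairwise two-by-two DiD comparisons, including comparisons that use already-treated units as controls. If treatment effects grow over time or differ across adoption cohorts, those comparisons put negative weights on some treatment effects, so the estimate can be badly biased, or even wrong-signed, even when the treatment effect is positive for every unit in every period.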
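A noise-free toy example shows how bad this can get. Assume two cohorts (adoption in period 3 and period 7, ten units each), no never-treated units, and an effect that grows by 1 for each period of exposure. There are no unit effects, time effects, or noise, so any gap between the two-way fixed-effects coefficient and the true average effect comes from the weighting alone. All numbers and names are invented for illustration.

```python
import numpy as np

n_periods = 10
adoption = np.repeat([3, 7], 10)           # hypothetical: 10 early, 10 late adopters
t = np.arange(n_periods)

exposure = t[None, :] - adoption[:, None] + 1     # equals 1 in the adoption period
D = (exposure >= 1).astype(float)
effect = np.where(D == 1, exposure, 0.0)          # effect = 1, 2, 3, ... with exposure
Y = effect                                        # no noise, no unit or time effects

def demean_two_way(M):
    """Remove unit means and period means (balanced panel only)."""
    return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()

D_dm = demean_two_way(D)
twfe = (D_dm * demean_two_way(Y)).sum() / (D_dm ** 2).sum()
true_att = effect[D == 1].mean()

# Clean comparison: early cohort vs not-yet-treated late cohort,
# periods 3..6, with period 2 as the baseline.
early = Y[adoption == 3].mean(axis=0)
late = Y[adoption == 7].mean(axis=0)
clean = ((early[3:7] - early[2]) - (late[3:7] - late[2])).mean()

print(round(twfe, 3), round(true_att, 3), round(clean, 3))  # 0.5 3.4 2.5
```

Every treated observation has an effect of at least 1, and the average is 3.4, yet the two-way fixed-effects coefficient is 0.5. The culprit is the late cohort's post-period, where the already-treated early cohort serves as the control and its still-growing effect is subtracted off. The clean comparison, which only uses not-yet-treated units as controls, recovers the early cohort's average effect over periods 3 to 6 exactly.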
Goodman-Bacon (2021), de Chaisemartin & D'Haultfœuille (2020), Callaway & Sant'Anna (2021), and Sun & Abraham (2021) documented the problem and proposed estimators that avoid it. The modern toolkit:
- did_imputation (Borusyak, Jaravel, Spiess 2024) — imputes counterfactuals for treated cells
- csdid (Callaway-Sant'Anna) — group-time average treatment effects
- stackedev — stacked event-study regressions, one cohort at a time
A practical consequence: staggered DiD papers from before 2018 that rely on plain two-way fixed effects may need re-examination.
Standard errors in DiD
Bertrand, Duflo & Mullainathan (2004) showed that naive standard errors in DiD regressions are badly understated when outcomes are serially correlated within units, which is almost always the case. Cluster at the level at which treatment is assigned (state, county, firm), which is usually the unit level. Cluster-robust inference relies on having many clusters; a common rule of thumb is roughly 30 or more. With fewer, use the wild cluster bootstrap.
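To see the size of the problem, the sketch below simulates a state panel with AR(1) errors (an assumed persistence of 0.9, 50 states, 20 periods, true effect 1.0) and computes both the naive standard error and a hand-rolled cluster-robust sandwich estimate for the interaction coefficient. The finite-sample correction used is one common choice, not the only one, and every parameter is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_periods, rho = 50, 20, 0.9

# Serially correlated AR(1) errors within each state (stationary, variance 1).
e = np.empty((n_states, n_periods))
e[:, 0] = rng.normal(size=n_states)
for t in range(1, n_periods):
    e[:, t] = rho * e[:, t - 1] + np.sqrt(1 - rho ** 2) * rng.normal(size=n_states)

state = np.repeat(np.arange(n_states), n_periods)
period = np.tile(np.arange(n_periods), n_states)
treated = (state < n_states // 2).astype(float)
post = (period >= n_periods // 2).astype(float)
y = 1.0 * treated * post + e.ravel()           # assumed true effect of 1.0

X = np.column_stack([np.ones_like(y), treated, post, treated * post])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
n, k = X.shape

# Naive variance: assumes independent, homoskedastic errors.
naive_V = XtX_inv * (resid @ resid) / (n - k)

# Cluster-robust sandwich: sum the outer products of per-state score vectors.
meat = np.zeros((k, k))
for g in range(n_states):
    score = X[state == g].T @ resid[state == g]
    meat += np.outer(score, score)
adj = n_states / (n_states - 1) * (n - 1) / (n - k)   # common finite-sample correction
cluster_V = adj * XtX_inv @ meat @ XtX_inv

naive_se = np.sqrt(naive_V[3, 3])
cluster_se = np.sqrt(cluster_V[3, 3])
print(naive_se, cluster_se)
```

With errors this persistent, the clustered standard error on the interaction should come out well above the naive one, which is the Bertrand, Duflo & Mullainathan point in miniature. The wild cluster bootstrap is not shown here.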
Concrete example: Card-Krueger 1994
New Jersey raised its minimum wage from $4.25 to $5.05 in April 1992. Eastern Pennsylvania didn't. Card and Krueger surveyed fast-food restaurants in both areas before and after. They found employment in NJ rose modestly relative to PA — directly contradicting the standard demand-curve prediction.
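The paper kicked off a 30-year debate, but the methodology, a clean two-by-two DiD on a sharp policy change, became the template for empirical labour economics. The result itself remains contested: a reanalysis with payroll records by Neumark and Wascher (2000) found that employment fell, while Card and Krueger's (2000) reply using administrative data again found no employment loss.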
Exercise
You're evaluating a SACCO loan-rate cap that took effect in 2022 in 5 of 12 counties. You want to estimate its effect on borrowing. Sketch (a) the regression specification, (b) the parallel-trends test, (c) the SE clustering you'd use.