Module 04 of 8 · 50 min read · Advanced

Threats to validity

Attrition, spillovers, Hawthorne and John Henry effects, and non-compliance, along with the local average treatment effect that non-compliance leaves you with.

Learning objectives

By the end of this module, you should be able to:

  • 01 Explain how attrition can break randomisation
  • 02 Explain spillovers (SUTVA violations) and their bias
  • 03 Distinguish Hawthorne and John Henry effects
  • 04 Distinguish intention-to-treat from the local average treatment effect under non-compliance

Randomisation gives an RCT its power, but several threats can undermine even a well-randomised trial. This module covers the four big ones — attrition, spillovers, behavioural effects, and non-compliance — and the crucial distinction between what you'd LIKE to estimate and what a real trial with non-compliance actually delivers. Knowing these threats is what separates a sophisticated reading of an RCT from a naive one.

Attrition

When people drop out of the sample

Attrition is the loss of subjects from the sample before outcomes are measured (people move, refuse follow-up, or can't be found). It threatens an RCT when it is DIFFERENTIAL: when the treatment itself affects WHO drops out, so that the remaining treatment and control groups are no longer comparable and the very balance randomisation created is broken. For example, if a job-training programme's failures drop out (discouraged) while its successes stay, the remaining trained sample looks artificially good. Or if a health treatment keeps alive frail people who would have died in the control group, so that they survive to be surveyed, the surviving groups differ. Differential attrition reintroduces selection bias through the back door. Defences: minimise attrition (intensive tracking); check whether attrition rates and the characteristics of attritors differ between treatment and control (if they are balanced, attrition is less worrying); and bound the estimate with Lee bounds, which trim the arm that retained more subjects to give the range of effects consistent with the best-case and worst-case selection of those extra respondents. High or differential attrition is one of the first things to scrutinise in any RCT, because it can quietly destroy the experiment's validity.
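The bounding step can be made concrete with a trimming procedure. What follows is a minimal sketch rather than a full implementation of Lee bounds: the function name `lee_bounds` and the toy numbers are hypothetical, covariates and standard errors are ignored, and it assumes treatment pushes attrition in only one direction (monotonicity).

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl, n_treat, n_ctrl):
    """Trimming bounds on the treatment effect under differential attrition.

    y_treat, y_ctrl: outcomes observed at follow-up in each arm.
    n_treat, n_ctrl: numbers originally randomised to each arm.
    Returns (lower, upper).
    """
    y_treat = np.sort(np.asarray(y_treat, dtype=float))
    y_ctrl = np.sort(np.asarray(y_ctrl, dtype=float))
    resp_t = len(y_treat) / n_treat   # response rate, treatment arm
    resp_c = len(y_ctrl) / n_ctrl     # response rate, control arm

    if resp_t >= resp_c:
        # Treatment arm retained more people: trim its excess respondents.
        k = int(round(len(y_treat) * (resp_t - resp_c) / resp_t))
        keep = len(y_treat) - k
        lower = y_treat[:keep].mean() - y_ctrl.mean()   # drop the k highest
        upper = y_treat[k:].mean() - y_ctrl.mean()      # drop the k lowest
    else:
        # Control arm retained more people: trim the control arm instead.
        k = int(round(len(y_ctrl) * (resp_c - resp_t) / resp_c))
        keep = len(y_ctrl) - k
        lower = y_treat.mean() - y_ctrl[k:].mean()      # drop the k lowest controls
        upper = y_treat.mean() - y_ctrl[:keep].mean()   # drop the k highest controls
    return lower, upper

# Toy data (made up): 10 randomised per arm, 2 controls lost to follow-up.
treat_obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ctrl_obs = [2, 3, 4, 5, 6, 7, 8, 9]
naive = np.mean(treat_obs) - np.mean(ctrl_obs)
low, high = lee_bounds(treat_obs, ctrl_obs, n_treat=10, n_ctrl=10)
print(naive, low, high)
```

If the bounds are wide or straddle zero, as they do in this toy case, the headline estimate depends heavily on who went missing.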

Spillovers (SUTVA violations)

When the control group is affected

RCTs rely on the Stable Unit Treatment Value Assumption (SUTVA): a unit's outcome depends only on its OWN treatment, not on whether others are treated. Spillovers (interference) violate SUTVA: they arise when the treatment affects the CONTROL group (or other treated units). Examples: deworming reduces disease transmission, so untreated children near treated ones also get healthier (a positive spillover, as in Miguel and Kremer's deworming study); a job-placement programme helps treated workers get jobs but at the expense of control workers competing for the same jobs (a negative, displacement spillover, which is a general-equilibrium effect); an information campaign spreads to the control group by word of mouth. Spillovers bias the simple treatment-control comparison: if the control group is positively affected, the comparison UNDERSTATES the total effect (the control group improved too); if the control group is harmed (displacement), it OVERSTATES the social benefit (the treated gained partly by taking from the control group). Cluster randomisation (module 3) addresses within-group spillovers, and special designs that vary the SHARE treated across clusters can MEASURE spillovers. But unrecognised spillovers are a serious threat, and the general-equilibrium version (the effect changes at scale) connects directly to the scale-up problem of module 8.
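The direction of the bias from a positive spillover can be checked with a few lines of arithmetic. This is an illustrative sketch only: the outcome model, the constants `BASELINE`, `DIRECT` and `SPILL`, and the two designs are all hypothetical, and it assumes anyone living in a cluster with treated neighbours receives the same spillover.

```python
BASELINE = 10.0   # hypothetical outcome with no programme nearby
DIRECT = 2.0      # hypothetical gain from being treated oneself
SPILL = 0.5       # hypothetical gain from having treated neighbours

def outcome(treated, share_treated_in_cluster):
    """Outcome under interference: own treatment plus exposure to treated neighbours."""
    exposed = share_treated_in_cluster > 0
    return BASELINE + DIRECT * treated + SPILL * exposed

# Design A: individuals randomised within one village, half of them treated.
y_treated = outcome(1, 0.5)
y_control_exposed = outcome(0, 0.5)        # controls live next to treated people
naive = y_treated - y_control_exposed      # recovers only the direct effect

# Design B: some villages get no treatment at all (pure controls).
y_pure_control = outcome(0, 0.0)
spillover = y_control_exposed - y_pure_control   # untreated in treated villages vs pure controls
total_on_treated = y_treated - y_pure_control    # direct effect plus spillover

print(naive, spillover, total_on_treated)
```

Under these assumptions the within-village comparison misses the spillover entirely, while the pure-control villages let it be measured separately.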

Behavioural effects

Subjects may change behaviour simply because they are in an experiment. The Hawthorne effect: people change their behaviour (often improve) because they know they are being OBSERVED/studied, not because of the treatment itself — so the treatment group's improvement partly reflects being watched, not the intervention (and may fade when the study ends). The John Henry effect: the CONTROL group, knowing they were denied the treatment, works HARDER to compensate (or out of rivalry), narrowing the gap and biasing the estimate toward zero. Both are forms of the experiment changing behaviour. Defences include blinding where possible (subjects don't know their status — hard for many social interventions), measuring objective outcomes (less prone to demand effects than self-reports), and being alert to them in interpretation. These effects are usually smaller than attrition or spillovers but should not be ignored, especially for self-reported or short-run outcomes.

Non-compliance: ITT and LATE

When the assigned don't comply

In real trials, not everyone does what they are assigned to do: some people assigned to treatment don't take it (no-shows), and sometimes some assigned to control get it anyway (crossovers). This non-compliance forces a choice of what to estimate:

  • Intention-to-treat (ITT): compare everyone ASSIGNED to treatment with everyone assigned to control, regardless of whether they complied. This preserves the randomisation (assignment WAS random), so ITT is an unbiased estimate of the effect of assignment. It answers the policy-relevant question 'what is the effect of OFFERING the programme?', since real programmes also face take-up below 100%. But it is diluted relative to the effect of the treatment itself, because it averages in no-shows who received nothing.
  • Treatment-on-the-treated / LATE: to recover the effect of actually RECEIVING the treatment, you can't just compare those who took it with those who didn't (that's selection bias again, because people who take up treatment differ from those who don't). Instead, use the random ASSIGNMENT as an INSTRUMENT for actual treatment (module 7). This yields the Local Average Treatment Effect (LATE): the effect for the 'compliers', those who take the treatment if and only if assigned to it. In the simplest case it is the ITT divided by the difference in take-up rates between the two arms. When no one in the control group can obtain the treatment, the compliers are exactly the people who were treated, so the LATE coincides with the treatment-on-the-treated effect. LATE is a real causal effect, but only for the complier subpopulation, not necessarily for always-takers or never-takers.

The key insight: with non-compliance, ITT (the effect of offering) and LATE (the effect on compliers of receiving) are different, both legitimate, and answer different questions. You must NEVER recover the treatment effect by naively comparing those who actually took it with those who didn't, because that reintroduces the selection bias the RCT was designed to eliminate.
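The three comparisons discussed above can be set side by side on a small made-up dataset. This is a sketch under stated assumptions: the helper names `itt` and `late_wald` are hypothetical, the population (60% compliers with a higher baseline, 40% never-takers, no crossovers) is invented for illustration, and a balanced deterministic assignment stands in for a random one. The LATE is computed as the simple ratio of the ITT on the outcome to the ITT on take-up.

```python
import numpy as np

def itt(y, z):
    """Difference in means by ASSIGNMENT z (1 = offered, 0 = not offered)."""
    y, z = np.asarray(y, dtype=float), np.asarray(z, dtype=int)
    return y[z == 1].mean() - y[z == 0].mean()

def late_wald(y, d, z):
    """Ratio estimate: ITT on the outcome divided by ITT on take-up d."""
    first_stage = itt(d, z)
    if first_stage == 0:
        raise ValueError("assignment did not change take-up; LATE is not identified")
    return itt(y, z) / first_stage

# Hypothetical arm of 10 people: 6 compliers (baseline 20), 4 never-takers (baseline 10).
base = np.array([20.0] * 6 + [10.0] * 4)
complier = np.array([1] * 6 + [0] * 4)

z = np.repeat([1, 0], 10)                              # first 10 offered, last 10 not
d = np.concatenate([complier, np.zeros(10, dtype=int)])  # only offered compliers take it up
y = np.concatenate([base, base]) + 5.0 * d             # true effect on compliers is 5

effect_of_offer = itt(y, z)
effect_on_compliers = late_wald(y, d, z)
naive_as_treated = y[d == 1].mean() - y[d == 0].mean()
print(effect_of_offer, effect_on_compliers, naive_as_treated)
```

In this toy population the naive as-treated comparison overshoots the true complier effect, because the people who take up the programme started from a higher baseline than those who do not.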

Exercise

An RCT offers a free vocational training programme: 1,000 people are randomly offered it (treatment) and 1,000 are not (control). Only 600 of those offered actually attend, and the trial finds those who ATTENDED earn much more than the control group. Separately, 15% of all subjects couldn't be located at follow-up. (1) Explain why comparing attenders to the control group is biased, and what it reintroduces. (2) Explain intention-to-treat and why it's unbiased and policy-relevant here. (3) Explain how to recover the effect of actually attending, and what population it applies to. (4) Explain why the 15% who couldn't be located is a concern and what to check.

Key takeaways

  • Attrition (dropout before outcomes are measured) threatens an RCT when DIFFERENTIAL (treatment affects who drops out), breaking the balance randomisation created — check rates and attritor characteristics across arms, and bound the estimate
  • Spillovers violate SUTVA (a unit's outcome depends only on its own treatment): if the treatment affects the control (deworming externalities; job-displacement), the simple comparison is biased — cluster designs address within-group spillovers
  • Hawthorne effect (subjects improve because observed) and John Henry effect (control works harder to compensate) are behavioural threats from being in an experiment
  • Under non-compliance, intention-to-treat (compare by random ASSIGNMENT) is unbiased and answers 'what is the effect of OFFERING the programme?' — the policy-relevant question
  • To get the effect of RECEIVING treatment, use assignment as an instrument → the LATE (effect on compliers) — NEVER compare actual takers to non-takers (that reintroduces selection bias)

Further reading

  1. Identification of Causal Effects Using Instrumental Variables
     Joshua Angrist, Guido Imbens & Donald Rubin · Journal of the American Statistical Association 91(434) · 1996
     The LATE framework: what IV (and a non-compliant RCT) actually estimates. Foundational.

  2. Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities
     Edward Miguel & Michael Kremer · Econometrica 72(1) · 2004
     The classic deworming study and its treatment of spillovers. The canonical SUTVA-violation example.

  3. Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects
     David Lee · Review of Economic Studies 76(3) · 2009
     How to bound estimates under attrition (Lee bounds). The tool for the attrition threat.

LeadAfrik Public Economics Hub