Threats to validity — Impact Evaluation Module 4

Randomisation gives an RCT its power, but several threats can undermine even a well-randomised trial. This module covers the four big ones — attrition, spillovers, behavioural effects, and non-compliance — and the crucial distinction between what you'd LIKE to estimate and what a real trial with non-compliance actually delivers. Knowing these threats is what separates a sophisticated reading of an RCT from a naive one.

Attrition

When people drop out of the sample

Attrition is the loss of subjects from the sample before outcomes are measured (people move, refuse follow-up, or can't be found). It threatens an RCT when it is DIFFERENTIAL, meaning the treatment itself affects WHO drops out. The treatment and control units that remain are then no longer comparable, and the balance that randomisation created is broken. For example, if a job-training programme's failures drop out (discouraged) while its successes stay, the remaining trained sample looks artificially good. Or if a health treatment keeps frail people alive long enough to be surveyed while their counterparts in the control group die before follow-up, the surviving treatment group contains frailer people than the surviving control group. Differential attrition reintroduces selection bias through the back door, and equal attrition RATES are not enough on their own: if different kinds of people drop out of each arm, the groups still diverge. Defences: minimise attrition (intensive tracking); check whether attrition rates and the baseline characteristics of attritors differ between treatment and control (if both are balanced, it is less worrying); and bound the estimate. Lee bounds assume the treatment shifts the chance of staying in the sample in one direction only. They trim the excess share of observed outcomes in the arm with lower attrition, once from the top and once from the bottom of its distribution, and the two trimmed comparisons give the range of effects for those who would be observed under either assignment. High or differential attrition is one of the first things to scrutinise in any RCT, since it can quietly destroy the experiment's validity.
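The trimming logic behind Lee bounds can be sketched in Python. This is a minimal illustration, assuming a single binary treatment, a numeric outcome and monotonic attrition (treatment moves everyone's chance of being found in the same direction). The function names `lee_bounds` and `_trim_means` are hypothetical, and trimming a whole number of observations only approximates the quantile-based trimming used in practice.

```python
import numpy as np


def _trim_means(y, p):
    """Mean of y after dropping the top share p, and after dropping the bottom share p."""
    y = np.sort(np.asarray(y, dtype=float))
    k = int(round(p * len(y)))          # whole observations to trim (an approximation)
    return y[: len(y) - k].mean(), y[k:].mean()


def lee_bounds(y_t, obs_t, y_c, obs_c):
    """Sketch of Lee-style trimming bounds on the treatment effect for units
    that would be observed under either assignment.

    Assumes monotonic attrition: treatment shifts everyone's chance of being
    found at follow-up in the same direction.
    y_t, y_c     : outcomes (entries for unobserved units are ignored, may be NaN)
    obs_t, obs_c : booleans, True if the unit was found at follow-up
    """
    y_t, y_c = np.asarray(y_t, dtype=float), np.asarray(y_c, dtype=float)
    obs_t, obs_c = np.asarray(obs_t, dtype=bool), np.asarray(obs_c, dtype=bool)
    q_t, q_c = obs_t.mean(), obs_c.mean()      # retention rates: compare these first
    if q_t >= q_c:
        # Treatment arm retained extra units: trim its observed outcomes.
        low_t, high_t = _trim_means(y_t[obs_t], (q_t - q_c) / q_t)
        m_c = y_c[obs_c].mean()
        return float(low_t - m_c), float(high_t - m_c)
    # Control arm retained extra units: trim the control outcomes instead.
    low_c, high_c = _trim_means(y_c[obs_c], (q_c - q_t) / q_c)
    m_t = y_t[obs_t].mean()
    return float(m_t - high_c), float(m_t - low_c)


if __name__ == "__main__":
    # Toy data: all 10 treated units found, only 8 of 10 controls found.
    found_c = np.array([True] * 8 + [False] * 2)
    print(lee_bounds(np.arange(1, 11), np.ones(10, dtype=bool), np.full(10, 5.0), found_c))
```

In the toy data the arms retain 100% and 80% of units, so 20% of the observed treated outcomes are trimmed from each end in turn and the bounds are -0.5 and 1.5. With equal retention nothing is trimmed and the bounds collapse to the ordinary difference in means.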

Spillovers (SUTVA violations)

When the control group is affected

RCTs rely on the Stable Unit Treatment Value Assumption (SUTVA): a unit's outcome depends only on its OWN treatment, not on whether others are treated. Spillovers (interference) violate SUTVA because the treatment also affects the CONTROL group (or other treated units). Examples: deworming reduces disease transmission, so untreated children near treated ones also get healthier (a positive spillover, as in Miguel and Kremer's deworming study); a job-placement programme helps treated workers get jobs but partly at the expense of control workers competing for the same vacancies (a negative, displacement spillover, i.e. a general-equilibrium effect); an information campaign spreads to the control group by word of mouth. Spillovers bias the simple treatment-control comparison. If the control is positively affected, the comparison UNDERSTATES the total effect, because the control improved too. If the control is harmed (displacement), it OVERSTATES the social benefit, because the treated gained partly by taking from the control. Cluster randomisation (module 3) handles spillovers that stay within groups by randomising at a level that contains them, so control clusters are not contaminated; designs that vary the SHARE treated across clusters can go further and MEASURE spillovers. Unrecognised spillovers remain a serious threat, and the general-equilibrium version (the effect changes at scale) connects directly to the scale-up problem of module 8.
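The direction of these biases can be checked with a small simulation. The sketch below is a stylised model rather than an estimator: it assumes half of a hypothetical population is treated, that every treated unit gains a fixed `direct` amount, and that every control unit's outcome shifts by a fixed `spill` amount (positive for a beneficial spillover, negative for displacement). The function `naive_vs_net` and all the numbers are illustrative assumptions.

```python
import numpy as np


def naive_vs_net(direct, spill, n=10_000, baseline=50.0, seed=0):
    """Stylised trial in which exactly half the units are randomly treated.

    Treated units gain `direct`; every control unit's outcome shifts by `spill`
    (positive = beneficial spillover, negative = displacement), so SUTVA fails.
    Returns the naive treatment-control difference and the true average gain
    per unit compared with a world where nobody is treated.
    """
    rng = np.random.default_rng(seed)
    treated = rng.permutation(n) < n // 2          # random assignment, half treated
    y0 = baseline + rng.normal(0.0, 5.0, n)       # outcome if nobody were treated
    y = y0 + np.where(treated, direct, spill)      # observed outcome with interference
    naive = y[treated].mean() - y[~treated].mean()
    net_gain = (y - y0).mean()                     # average gain across everyone
    return float(naive), float(net_gain)


if __name__ == "__main__":
    for label, spill in [("positive spillover", 4.0), ("displacement", -4.0)]:
        naive, net = naive_vs_net(direct=10.0, spill=spill)
        print(f"{label}: naive difference {naive:.2f}, net gain per unit {net:.2f}")
```

With a direct gain of 10 and a spillover of +4, the naive difference comes out near 6, below both the direct effect (10) and the average gain per unit (7). With displacement of -4 it comes out near 14, well above the net gain of 3 per unit.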

Behavioural effects

Subjects may change their behaviour simply because they are in an experiment. The Hawthorne effect: people change their behaviour (often improving) because they know they are being OBSERVED or studied, not because of the treatment itself. If the treatment group gets more attention than the control, part of its improvement reflects being watched rather than the intervention, and may fade when the study ends. The John Henry effect: the CONTROL group, knowing it was denied the treatment, works HARDER to compensate (or out of rivalry), narrowing the gap and biasing the estimate toward zero. Both are cases of the experiment itself changing behaviour. Defences include blinding where possible (subjects don't know their status, which is hard for many social interventions), measuring objective outcomes (less prone than self-reports to demand effects, where subjects respond in the way they think researchers want), and being alert to these effects in interpretation. They are often smaller than attrition or spillovers but should not be ignored, especially for self-reported or short-run outcomes.

Non-compliance: ITT and LATE

When the assigned don't comply

In real trials, not everyone does what they're assigned: some assigned to treatment don't take it (no-shows), and sometimes some assigned to control get it anyway (crossovers). This non-compliance creates a choice of what to estimate:

• Intention-to-treat (ITT): compare everyone ASSIGNED to treatment with everyone assigned to control, regardless of whether they complied. This preserves the randomisation (assignment WAS random), so ITT is unbiased, and it answers the policy-relevant question 'what is the effect of OFFERING the programme?' (real programmes also face take-up below 100%). But it dilutes the effect of actually receiving the treatment, because it averages in no-shows who got none.
• Treatment-on-the-treated / LATE: to recover the effect of actually RECEIVING the treatment, you can't just compare those who took it with those who didn't (that's selection bias again: people who choose to take it differ from those who don't). Instead, use the random ASSIGNMENT as an INSTRUMENT for actual treatment (module 7). In its simplest (Wald) form, the estimate is the ITT divided by the difference in take-up rates between those assigned to treatment and those assigned to control. This yields the Local Average Treatment Effect (LATE): the effect for the 'compliers' (those who take the treatment if and only if assigned to it). It relies on assignment affecting outcomes only through actual treatment (the exclusion restriction) and on nobody doing the opposite of their assignment (no 'defiers'). LATE is a real causal effect, but only for the complier subpopulation, not necessarily for always-takers or never-takers. When there are no crossovers, everyone treated is a complier, so LATE equals the treatment-on-the-treated effect.

The key insight: with non-compliance, ITT (the effect of offering) and LATE (the effect on compliers of receiving) are different, both legitimate, and answer different questions. You must NEVER recover the treatment effect by naively comparing those who actually took it with those who didn't, because that reintroduces the selection bias the RCT was designed to eliminate.
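The naive, ITT and instrumental-variables estimates can be compared on simulated data. The sketch below assumes a hypothetical trial with one-sided non-compliance (no crossovers), 60% compliers, compliers who are more able than never-takers, and a treatment effect of 5 for everyone. The functions `estimate` and `simulate_trial` and all parameters are illustrative. LATE is computed with the Wald ratio (ITT divided by the difference in take-up between arms), the simplest instrumental-variables estimator.

```python
import numpy as np


def estimate(y, z, d):
    """Return (naive, itt, first_stage, late) from outcome y, random
    assignment z and treatment actually received d (z and d boolean)."""
    y = np.asarray(y, dtype=float)
    z, d = np.asarray(z, dtype=bool), np.asarray(d, dtype=bool)
    naive = y[d].mean() - y[~d].mean()           # takers vs non-takers: selection bias
    itt = y[z].mean() - y[~z].mean()             # effect of the OFFER
    first_stage = d[z].mean() - d[~z].mean()     # effect of the offer on take-up
    late = itt / first_stage                     # Wald (IV) estimate of the LATE
    return naive, itt, first_stage, late


def simulate_trial(n=20_000, seed=1):
    """Hypothetical trial: 60% compliers, the rest never-takers; compliers
    have higher 'ability'; receiving treatment adds 5 to everyone's outcome."""
    rng = np.random.default_rng(seed)
    z = rng.permutation(n) < n // 2                  # random assignment, half offered
    complier = rng.random(n) < 0.6
    ability = rng.normal(0.0, 1.0, n) + complier    # compliers differ: selection
    d = z & complier                                 # one-sided: no crossovers
    y = 20 + 3 * ability + 5 * d + rng.normal(0.0, 2.0, n)
    return y, z, d


if __name__ == "__main__":
    naive, itt, first_stage, late = estimate(*simulate_trial())
    print(f"naive={naive:.2f} itt={itt:.2f} take-up gap={first_stage:.2f} late={late:.2f}")
```

In this setup the naive comparison of takers with non-takers lands well above 5, because compliers are more able, while the ITT is close to 3 (0.6 times 5) and the Wald ratio recovers roughly 5. Because the effect is the same for everyone here, the LATE happens to equal the average effect in the whole population; with heterogeneous effects it would describe compliers only.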

Exercise

An RCT offers a free vocational training programme: 1,000 people are randomly offered it (treatment) and 1,000 are not (control). Only 600 of those offered actually attend, and the trial finds those who ATTENDED earn much more than the control group. Separately, 15% of all subjects couldn't be located at follow-up. (1) Explain why comparing attenders to the control group is biased, and what it reintroduces. (2) Explain intention-to-treat and why it's unbiased and policy-relevant here. (3) Explain how to recover the effect of actually attending, and what population it applies to. (4) Explain why the 15% who couldn't be located is a concern and what to check.