The 2019 Nobel Prize awarded to Banerjee, Duflo, and Kremer recognised what has been called the 'empirical revolution' in development economics: the systematic use of randomised controlled trials (RCTs) to identify which development interventions actually work. This module covers what we've learned, where the limits are, and how to use the evidence base in policy.
Why RCTs?
Causal inference is hard. If we observe that villages with microfinance access have higher household incomes than villages without, we can't conclude microfinance caused the income difference — perhaps villages with microfinance were already richer (selection); perhaps they had other unmeasured advantages (omitted variables); perhaps causation runs the other way (richer villages attract microfinance providers).
RCTs solve the selection problem by random assignment. If we randomly choose 100 villages to receive microfinance and compare them to 100 randomly-chosen control villages, the only systematic difference between the groups is the intervention. Differences in outcomes can be attributed to the intervention with much higher confidence.
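The contrast between self-selected and randomly assigned treatment can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the village incomes, the true effect of 5.0, and the rule that lenders enter only richer villages are all hypothetical and come from no study cited here.

```python
import random
import statistics

def simulate(n_villages=2000, true_effect=5.0, seed=0):
    """Compare a naive observational estimate with a randomised one.

    All numbers are invented for illustration; nothing here is real data.
    """
    rng = random.Random(seed)
    # Baseline income differs across villages before any intervention.
    baseline = [rng.gauss(100, 20) for _ in range(n_villages)]

    # Observational world: lenders choose to enter richer villages (selection).
    selected = [b > 100 for b in baseline]
    # Experimental world: a coin flip decides, unrelated to baseline income.
    randomised = [rng.random() < 0.5 for _ in range(n_villages)]

    def diff_in_means(treated):
        outcome = [b + (true_effect if t else 0.0) for b, t in zip(baseline, treated)]
        treat = [y for y, t in zip(outcome, treated) if t]
        ctrl = [y for y, t in zip(outcome, treated) if not t]
        return statistics.mean(treat) - statistics.mean(ctrl)

    return diff_in_means(selected), diff_in_means(randomised)

naive, experimental = simulate()
print(f"true effect: 5.0  naive: {naive:.1f}  randomised: {experimental:.1f}")
```

Under these assumptions the naive comparison folds the pre-existing income gap into the estimate, while the randomised comparison lands close to the true effect because assignment is unrelated to baseline income.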
The RCT design
Standard randomised-controlled-trial structure:
1. Define the population (villages, households, schools, individuals) eligible for the intervention
2. Randomise — typically via a public lottery — into treatment (receive intervention) and control (don't) groups
3. Implement the intervention in the treatment group; do nothing (or implement a control) in the control group
4. Measure outcomes (consumption, income, health, education) at baseline (before) and endline (after), in BOTH groups
5. Compute the average treatment effect (ATE) = mean outcome in treatment − mean outcome in control. Statistical inference on whether the difference is real vs noise (typically via t-test, regression, or randomisation inference)
Well-executed RCTs have high INTERNAL VALIDITY — strong confidence that the measured effect is caused by the intervention in the specific context studied. External validity (whether the result generalises to other contexts) is a separate question and often the harder one.
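Step 5 of the design can be sketched in a few lines. The example below is illustrative only: the household consumption figures are hypothetical, it uses endline outcomes alone (a real analysis would usually also control for baseline values), and the function names are invented for this sketch. It computes the ATE, an unequal-variance standard error, and a randomisation-inference p-value obtained by re-shuffling the treatment labels.

```python
import random
import statistics

def average_treatment_effect(treated, control):
    """Difference in mean endline outcomes: treatment minus control."""
    return statistics.mean(treated) - statistics.mean(control)

def standard_error(treated, control):
    """Unequal-variance (Welch-style) standard error of the difference in means."""
    return (statistics.variance(treated) / len(treated)
            + statistics.variance(control) / len(control)) ** 0.5

def randomisation_p_value(treated, control, draws=5000, seed=0):
    """Share of re-shuffled assignments with an effect at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(average_treatment_effect(treated, control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(pooled)
        if abs(average_treatment_effect(pooled[:n_t], pooled[n_t:])) >= observed:
            extreme += 1
    return extreme / draws

# Hypothetical endline consumption for 8 treatment and 8 control households.
treated = [112, 105, 118, 109, 121, 107, 115, 110]
control = [101, 98, 107, 103, 99, 104, 96, 102]

ate = average_treatment_effect(treated, control)
se = standard_error(treated, control)
p = randomisation_p_value(treated, control)
print(f"ATE = {ate:.2f}, SE = {se:.2f}, t = {ate / se:.2f}, randomisation p = {p:.4f}")
```

Randomisation inference asks how often a difference this large would appear if the labels were assigned by chance with no true effect. That makes it a natural fit for a design where assignment really was a lottery.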
Canonical RCT findings
Microfinance — Banerjee et al. (2015)
Six randomised evaluations of microcredit in Bosnia, Ethiopia, India, Mexico, Mongolia, and Morocco. The consistent finding: microfinance is NOT the transformational tool its evangelists (Yunus, ACCION) claimed. Effects on consumption are modest (typically 0-10% gains); on business creation, modestly positive; on poverty exit, near zero. Microfinance is useful (especially for cash-flow smoothing) but not the silver bullet for poverty reduction.
Deworming — Miguel and Kremer (2004)
Western Kenya deworming programme. Schools were phased into deworming treatment in three rounds, so schools scheduled for later rounds served as controls for those treated earlier. Treatment-school students had 25% lower absenteeism; the original analysis found no significant test-score gains, and subsequent re-analyses revised some estimates downward; long-run wage effects substantial (Baird et al. 2016: 10-15% higher earnings 10-15 years later).
Deworming has been politically influential — adopted in WHO/UNICEF programmes globally, scaled to hundreds of millions of children. Also methodologically influential — the cluster-randomisation design and the long-run follow-up established standards for development RCTs.
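The cluster-randomisation idea can be sketched as follows: whole schools, not individual pupils, are assigned to phase-in groups, and outcomes are then collapsed to one value per school before comparison. The school identifiers, the count of 75, and the helper names are hypothetical and chosen only for illustration; this is not the study's actual assignment procedure.

```python
import random
import statistics
from collections import defaultdict

def assign_phases(school_ids, n_phases=3, seed=0):
    """Randomly split schools (clusters) into equally sized phase-in groups."""
    rng = random.Random(seed)
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    # Deal schools out like cards so group sizes differ by at most one.
    return {school: i % n_phases + 1 for i, school in enumerate(shuffled)}

def cluster_means(pupil_records):
    """Collapse pupil-level outcomes to one mean per school."""
    by_school = defaultdict(list)
    for school, outcome in pupil_records:
        by_school[school].append(outcome)
    return {school: statistics.mean(values) for school, values in by_school.items()}

schools = [f"school_{i:02d}" for i in range(75)]  # hypothetical identifiers
phase = assign_phases(schools)

# Every pupil inherits the school's phase; pupils are never randomised individually.
group_sizes = {g: sum(1 for p in phase.values() if p == g) for g in (1, 2, 3)}
print(group_sizes)

# Hypothetical attendance days for three pupils in two schools.
print(cluster_means([("school_00", 20), ("school_00", 24), ("school_01", 30)]))
```

Assigning at the school level matters when treatment spills over between pupils in the same school, because pupil-level randomisation would then contaminate the control group. The cost is that the effective sample size is closer to the number of schools than the number of pupils.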
Conditional Cash Transfers (Progresa, Bolsa Família, etc.)
Mexico's Progresa (later renamed Oportunidades, then Prospera), Brazil's Bolsa Família, and dozens of follow-on programmes, many of them evaluated through RCTs or quasi-experimental designs. Cash transfers conditional on school attendance and health-clinic visits. Strong evidence of impacts on:
- School enrolment and attendance — clear positive effects, especially at secondary level for girls
- Child health and welfare — higher vaccination rates, improved growth, reduced child labour
- Adult consumption and welfare — durable
- Long-run outcomes — fewer follow-up studies but the first-generation Progresa cohort showed substantial long-run welfare gains
CCT programmes have been scaled massively — Brazil's Bolsa Família covers 50+ million people. African analogues (Kenya HSNP, Ethiopia PSNP, Malawi SCT) are smaller but growing.
Graduation programmes — Banerjee et al. (2015)
Six-country RCT of BRAC-style graduation programmes (productive asset + training + savings + health support + cash + coaching) for ultra-poor households. Among the largest impact results in the modern development RCT literature: substantial and persistent improvements in consumption, food security, assets, health, time use, well-being. Persistence to 3+ years post-programme; some persistence to 7+ years (Bandiera et al. 2017). Covered in detail in module 3.
Education interventions — multiple RCTs
- Textbooks alone — minimal impact (Glewwe-Kremer-Moulin 2009 Kenya). The 'just provide inputs' theory of education improvement is not supported by this evidence
- Teacher incentives (pay-for-performance) — substantial effects in India (Muralidharan-Sundararaman 2011); mixed elsewhere. Contested whether the gains persist after the incentive ends
- Targeted instruction — Pratham 'Teaching at the Right Level' programme. Strong, robust impact across multiple settings (J-PAL Africa evaluations). Now scaled across India and several African countries
- Computer-assisted learning — context-dependent. Banerjee-Cole-Duflo-Linden India studies showed substantial gains in some configurations
- Teacher contracting and accountability — the Duflo-Hanna-Ryan (2012) study in India showed that pay-for-attendance reduced teacher absenteeism and improved test scores
External validity — when do RCT results generalise?
The biggest critique of the RCT tradition: a result demonstrated in one context (Kenyan villages in the early 2000s with specific NGO implementation) may not apply elsewhere. The literature has developed several approaches for assessing this:
- Theory-based generalisation — if the intervention works through a specific mechanism (deworming reduces parasitic burden which improves school attendance which improves long-run outcomes), the result should generalise to contexts where the mechanism is similar (parasitic infection rates in the new context comparable to Kenya 2004)
- Cross-context replication — multiple RCTs in different settings testing the same intervention. The Banerjee microcredit and BRAC graduation studies are exemplars — six-country replications with consistent or context-varying patterns
- Implementation context — RCT-tested interventions implemented by experienced NGOs at small scale may not deliver the same effects when scaled to government-run programmes. The microfinance scaling and deworming-scaling experiences both show some implementation degradation
- Beneficiary heterogeneity — interventions can have very different effects on different sub-populations within the same context. The average treatment effect can mask heterogeneity that matters for policy
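The point about beneficiary heterogeneity can be made concrete with a toy calculation. In this sketch the data are entirely hypothetical: a programme that raises outcomes by 20 for households that already run a business and by nothing for everyone else. The function name and the subgroup labels are invented for illustration.

```python
import statistics

def subgroup_effects(records):
    """Return the overall ATE and the ATE within each subgroup.

    Each record is (subgroup, treated_flag, outcome).
    """
    def ate(rows):
        treat = [y for _, t, y in rows if t]
        ctrl = [y for _, t, y in rows if not t]
        return statistics.mean(treat) - statistics.mean(ctrl)

    groups = sorted({g for g, _, _ in records})
    return ate(records), {g: ate([r for r in records if r[0] == g]) for g in groups}

# Hypothetical data: the programme helps households that already run a business
# and does nothing for those that do not.
records = (
    [("has_business", True, 130)] * 5 + [("has_business", False, 110)] * 5
    + [("no_business", True, 90)] * 15 + [("no_business", False, 90)] * 15
)

overall, by_group = subgroup_effects(records)
print(overall, by_group)
```

Here the pooled ATE is a modest positive number that describes neither subgroup well. In practice, subgroups are best specified before the data are analysed, since searching across many splits after the fact tends to turn up spurious differences.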
The RCT critique
Major criticisms of the RCT methodology in development:
- Misses big questions — RCTs are well-suited to specific interventions at limited scale (educational technology in 50 schools; deworming in 75 villages). Macroeconomic policy (currency regimes, trade liberalisation, financial-sector reform) can't be randomised — yet these may matter more for development than the questions RCTs address (Pritchett 2002; Deaton 2009)
- Ethics — randomly denying potentially-beneficial interventions to control groups raises ethical questions. The standard response: the intervention's effectiveness is uncertain ex ante; the trial resolves the uncertainty; and the intervention is usually extended to control communities after the trial if it proves effective
- Narrow utilitarianism — RCT methodology privileges quantifiable, individual-level outcomes (consumption, test scores, health) over the harder-to-measure outcomes that may matter more (institutional change, voice, dignity, capability expansion in Sen's sense)
- Implementation gap — RCT-proven interventions don't always scale through bureaucratic implementation. The microfinance and deworming scaling stories both show implementation-quality degradation
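- Researcher discretion — freedom to choose outcomes, subgroups, and specifications after seeing the data can produce spurious findings, and null results often go unpublished. Pre-registration of analysis plans, publication of null results, and full code release have become standard safeguards, but compliance is imperfect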
The middle position
Most working development economists hold a hybrid view: RCTs are the gold standard for evaluating SPECIFIC INTERVENTIONS where randomisation is feasible. They're a powerful complement to other methods — natural experiments, difference-in-differences, regression-discontinuity, structural modelling — for the wider questions where randomisation isn't possible. The right tool depends on the question.
The J-PAL and IPA networks have built systematic evidence-translation services (J-PAL Policy Insights, IPA Practice Briefs) that synthesise RCT findings across contexts and identify what is robust vs context-dependent. These are essential reading for any working development policy-maker.
The J-PAL / IPA evidence base
J-PAL (Abdul Latif Jameel Poverty Action Lab at MIT) and IPA (Innovations for Poverty Action) have built the world's most systematic evidence base on what works in development. J-PAL Africa specifically focuses on the African context with offices in Nairobi, Kigali, Cape Town.
Operational use:
- J-PAL Policy Insights — short syntheses of what the evidence base says about a specific intervention type. Updated regularly. The starting point for any policy designer
- IPA Practice Briefs — implementer-oriented summaries with checklists for implementing the intervention well
- The Generalisability project — meta-analysis of how RCT findings transport across contexts. Specific to each intervention type
- Africa-specific evidence — J-PAL Africa Evidence Library catalogues RCTs conducted in African contexts, sortable by sector and country
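A common starting point for synthesising findings across sites is an inverse-variance weighted average with a heterogeneity check. The sketch below is a simplified fixed-effect calculation on hypothetical site estimates. It is not a description of how J-PAL or IPA actually produce their syntheses, and applied work often prefers random-effects or hierarchical models when effects are expected to vary by context.

```python
def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean of site-level effects.

    `estimates` is a list of (effect, standard_error) pairs.
    Returns (pooled_effect, pooled_standard_error, Q), where Q is Cochran's
    heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * e for w, (e, _) in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    q = sum(w * (e - pooled) ** 2 for w, (e, _) in zip(weights, estimates))
    return pooled, pooled_se, q

# Hypothetical site estimates: (effect in standard deviations, standard error).
sites = [(0.10, 0.05), (0.02, 0.04), (0.15, 0.08), (0.05, 0.05)]
pooled, pooled_se, q = pool_fixed_effect(sites)
print(f"pooled = {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.2f} on {len(sites) - 1} df")
```

A Q value that is large relative to its degrees of freedom suggests the site effects differ by more than sampling noise would explain. That is the signal that a finding is context-dependent, not robust.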
Exercise
Kenya's Ministry of Education is considering a national scale-up of the 'Tusome' literacy intervention, a structured-pedagogy programme that pairs primary-school teachers with new lesson materials, intensive training, and instructional-coaching support. A 2016-2019 RCT in 600 Kenyan schools (RTI International + Ministry of Education + EGRA Plus) showed substantial gains in early-grade literacy. The proposal: scale to all ~22,000 public primary schools.
(1) Apply the external-validity framework: what features of the original RCT support generalisation, and what features might limit it?
(2) What implementation risks could degrade the scale-up impact?
(3) What complementary evidence-gathering should accompany the scale-up?
(4) Recommend a phased scale-up plan that balances speed with risk management.