Module 07 of 8 · 50 min read · Advanced

Matching and instrumental variables

Propensity-score matching and selection on observables, and the instrument that buys you causation when selection is on unobservables.

Learning objectives

By the end of this module, you should be able to:

  • 01 Explain matching/propensity scores and selection on observables
  • 02 Explain the fatal weakness: selection on unobservables
  • 03 Explain instrumental variables and the exclusion restriction
  • 04 Place the methods in the credibility hierarchy

Sometimes you have only observational data and no threshold or natural experiment — just a treated group and an untreated group that differ. This module covers the two main strategies for these hardest cases: matching (which assumes you can adjust away the differences you can see) and instrumental variables (which can handle the differences you can't see, IF you can find a valid instrument). It ends by placing all the course's methods in a credibility hierarchy.

Matching and selection on observables

Adjusting for what you can see

Matching tackles selection bias by comparing treated and untreated units that are SIMILAR on OBSERVABLE characteristics. The idea: if a treated and an untreated unit have the same age, education, prior earnings, location, etc., then (ASSUMING those observables capture all the relevant differences) the untreated unit is a valid counterfactual for the treated one. Propensity-score matching simplifies this: instead of matching on many characteristics at once, estimate each unit's PROBABILITY of being treated (the propensity score) given its observables, and match treated to untreated units with similar propensity scores (Rosenbaum and Rubin, 1983). The identifying assumption is CONDITIONAL INDEPENDENCE (selection on observables / unconfoundedness): conditional on the observed characteristics, treatment is as-good-as-random, i.e., there are NO UNOBSERVED differences between treated and untreated units (given the observables) that affect the outcome. If that holds, and if there is OVERLAP (common support: every kind of treated unit has comparable untreated units to match to), matching removes selection bias. Matching is intuitive and widely used, but it lives or dies by the conditional-independence assumption.
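
To make the mechanics concrete, here is a minimal simulation sketch, not a production estimator. Everything in it is an illustrative assumption: the data are simulated, the single discrete observable `x`, the treatment probabilities, the true effect of 2.0, and the function names are all hypothetical. Because `x` is discrete, the propensity score is estimated simply as the treated share in each cell; with many or continuous covariates one would typically fit a logit or probit model instead. Selection here is on observables only, so conditional independence holds by construction.

```python
import random
from collections import defaultdict
from statistics import mean

TRUE_EFFECT = 2.0  # assumed for the simulation


def simulate(n=20000, seed=0):
    """Simulated data where treatment depends ONLY on the observable x."""
    rng = random.Random(seed)
    p_treat = {0: 0.2, 1: 0.5, 2: 0.8}  # hypothetical selection rule
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        d = 1 if rng.random() < p_treat[x] else 0
        y = TRUE_EFFECT * d + 3.0 * x + rng.gauss(0, 1)
        rows.append((x, d, y))
    return rows


def naive_difference(rows):
    """Raw treated-minus-untreated gap in mean outcomes."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return mean(treated) - mean(control)


def propensity_scores(rows):
    """Estimated P(treated | x): the treated share within each x cell."""
    n_by_x, treated_by_x = defaultdict(int), defaultdict(int)
    for x, d, _ in rows:
        n_by_x[x] += 1
        treated_by_x[x] += d
    return {x: treated_by_x[x] / n_by_x[x] for x in n_by_x}


def matched_att(rows):
    """Compare each treated unit with controls that share its score."""
    score = propensity_scores(rows)
    control_y = defaultdict(list)
    for x, d, y in rows:
        if d == 0:
            control_y[score[x]].append(y)
    control_mean = {s: mean(ys) for s, ys in control_y.items()}
    return mean(y - control_mean[score[x]] for x, d, y in rows if d == 1)


if __name__ == "__main__":
    data = simulate()
    print("naive difference:", round(naive_difference(data), 2))
    print("matched estimate:", round(matched_att(data), 2))
```

In this setup the raw comparison overstates the effect because treated units have higher `x`, while the matched estimate should land close to the assumed true effect. That only works because nothing unobserved drives treatment in the simulated data.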

The fatal weakness

Selection on unobservables

Matching's fatal weakness is that it can only adjust for what you OBSERVE — and the conditional-independence assumption (no unobserved confounders) is UNTESTABLE and usually IMPLAUSIBLE. The whole problem of selection bias (module 1) is typically driven by UNOBSERVABLES — the motivation, ability, drive, or private information that leads units to select into treatment AND affects their outcomes, and that you cannot measure. Matching on observed characteristics does nothing about these: two people with identical observed age, education, and earnings can still differ in unmeasured entrepreneurial drive, and if that drove their treatment choice, matching leaves the bias intact. So matching is only as good as the claim that you've observed and adjusted for EVERY relevant confounder — a claim that is rarely credible, because the confounders that cause selection are usually exactly the hard-to-measure ones. This is why matching sits LOW in the credibility hierarchy: it's better than a raw comparison (it removes observable differences), but it cannot solve selection on unobservables, which is the heart of the problem. Treat matching estimates with caution and never mistake 'we controlled for observables' for 'we eliminated selection bias'.
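
The point can be illustrated with a small simulation sketch. All names and numbers are hypothetical: `drive` stands in for an unmeasured trait that raises both the chance of treatment and the outcome, and the true effect is assumed to be 2.0. The code matches once on the observable `x` alone (what a researcher could actually do) and once on `x` plus `drive` (possible only in a simulation, where the unobservable is known).

```python
import random
from collections import defaultdict
from statistics import mean

TRUE_EFFECT = 2.0  # assumed for the simulation


def simulate(n=20000, seed=1):
    """Treatment depends on observable x AND on unobserved drive."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])       # observed characteristic
        drive = rng.choice([0, 1])      # unobserved by the researcher
        p = 0.15 + 0.2 * x + 0.4 * drive
        d = 1 if rng.random() < p else 0
        y = TRUE_EFFECT * d + 3.0 * x + 4.0 * drive + rng.gauss(0, 1)
        rows.append({"x": x, "drive": drive, "d": d, "y": y})
    return rows


def matched_att(rows, cell):
    """Exact matching: compare treated units with controls in the same cell.

    `cell` maps a row to the characteristics used for matching.
    """
    control_y = defaultdict(list)
    for r in rows:
        if r["d"] == 0:
            control_y[cell(r)].append(r["y"])
    control_mean = {c: mean(ys) for c, ys in control_y.items()}
    return mean(r["y"] - control_mean[cell(r)] for r in rows if r["d"] == 1)


if __name__ == "__main__":
    data = simulate()
    on_observables = matched_att(data, lambda r: r["x"])
    infeasible = matched_att(data, lambda r: (r["x"], r["drive"]))
    print("matched on x only:        ", round(on_observables, 2))
    print("matched on x and drive:   ", round(infeasible, 2))
    print("assumed true effect:      ", TRUE_EFFECT)
```

Matching on `x` alone should stay well above the assumed true effect, because within each `x` cell the treated units still have more `drive`. Only the infeasible match that also uses `drive` should come close to the truth.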

Instrumental variables

Finding as-good-as-random variation (IV)

Instrumental variables (IV) is the strategy for selection on UNOBSERVABLES. The idea: find an INSTRUMENT, a variable that affects WHETHER a unit gets treated but affects the OUTCOME ONLY THROUGH treatment (not directly or through anything else). A valid instrument isolates a slice of variation in treatment that is AS-GOOD-AS-RANDOM (not driven by the units' own confounded choices), and uses only that variation to estimate the effect. The two requirements:

  • Relevance: the instrument must actually affect treatment (a strong first stage; weak instruments give unreliable estimates). Unlike the exclusion restriction, relevance can be checked in the data.
  • Exclusion restriction (exogeneity): the instrument affects the outcome ONLY through treatment, and is unrelated to the unobserved confounders. This is the crucial, UNTESTABLE assumption, defended by argument, not data.

Examples: using DISTANCE to the nearest college as an instrument for college attendance (distance affects attendance but, arguably, not earnings except through education; Card); QUARTER OF BIRTH as an instrument for years of schooling (compulsory-schooling laws make birth-quarter affect schooling; Angrist-Krueger); RAINFALL as an instrument for income (rain shifts farm income and, arguably, affects the outcome of interest only through income).

A valid IV recovers the LATE (local average treatment effect): the effect for compliers (those whose treatment is shifted by the instrument), provided the instrument moves treatment in one direction only (no defiers). This is not necessarily the effect for everyone. The catch: GOOD instruments are rare and the exclusion restriction is hard to defend (any direct effect of the instrument on the outcome invalidates it), so IV studies live or die on the credibility of the exclusion restriction, which must be argued carefully and is often contestable.
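
With a single binary instrument, the IV estimate reduces to the Wald ratio: the instrument's effect on the outcome (the reduced form) divided by its effect on treatment (the first stage). The sketch below shows this on simulated data. It is an illustration under stated assumptions, not a full 2SLS implementation: the instrument `z`, the unobserved confounder `u`, the selection rule, and the true effect of 2.0 are all hypothetical, and the instrument is valid by construction (randomly assigned, with no direct path to the outcome). The effect is assumed constant across units, so the LATE coincides with the common effect here.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed for the simulation


def simulate(n=50000, seed=2):
    """Unobserved u drives both treatment and outcome; z shifts treatment only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.choice([0, 1])                  # instrument, as-good-as-random
        u = rng.gauss(0, 1)                     # unobserved confounder
        d = 1 if z + u + rng.gauss(0, 1) > 0.5 else 0
        y = TRUE_EFFECT * d + 2.0 * u + rng.gauss(0, 1)  # z does not appear
        rows.append((z, d, y))
    return rows


def gap_by_instrument(rows, column):
    """Mean of a column (1 = treatment, 2 = outcome) at z=1 minus at z=0."""
    at_one = mean(r[column] for r in rows if r[0] == 1)
    at_zero = mean(r[column] for r in rows if r[0] == 0)
    return at_one - at_zero


def wald_estimate(rows):
    """Return (IV estimate, first stage) for a binary instrument."""
    first_stage = gap_by_instrument(rows, 1)
    reduced_form = gap_by_instrument(rows, 2)
    return reduced_form / first_stage, first_stage


def naive_difference(rows):
    """Raw treated-minus-untreated gap, contaminated by u."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return mean(treated) - mean(control)


if __name__ == "__main__":
    data = simulate()
    iv, first_stage = wald_estimate(data)
    print("first stage:     ", round(first_stage, 3))
    print("naive difference:", round(naive_difference(data), 2))
    print("IV (Wald):       ", round(iv, 2))
```

The first stage is the checkable part (relevance). Nothing in the code can verify the exclusion restriction; it holds here only because the simulation was written so that `z` never enters the outcome equation. Note also that dividing by a small first stage inflates noise, which is the weak-instrument problem in miniature.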

The credibility hierarchy

Ranking the methods

The methods of this course form a rough CREDIBILITY HIERARCHY, by how plausibly they eliminate selection bias:

  1. RCT: randomisation balances observables AND unobservables; the gold standard (when feasible and ethical).
  2. Regression discontinuity & well-designed difference-in-differences: credible natural experiments (local randomisation at a cutoff; parallel trends with good pre-trends), strong when their assumptions hold.
  3. Instrumental variables: can handle unobservables, but only as credible as the (untestable, often contestable) exclusion restriction.
  4. Matching / propensity scores / regression controls: only handle selection on OBSERVABLES; cannot address unobserved confounders.

This ranking reflects how much each method asks you to assume: the RCT asks almost nothing (chance did the work); matching asks you to believe you've measured every confounder (usually implausible). The practical lesson: prefer designs higher in the hierarchy where possible; when forced lower, be explicit about the assumptions, defend them carefully, probe robustness, and calibrate your confidence accordingly. A result from a credible RCT or RD deserves more weight than one from matching, and honest empirical work makes its identifying assumption, and its credibility, explicit. This hierarchy is the practical summary of the whole methods sequence.

Exercise

A researcher wants to estimate the effect of joining a farmer cooperative on farm income, using observational survey data (cooperative members vs non-members). (1) Explain how propensity-score matching would approach this and its key assumption. (2) Explain why matching is likely to fail here, citing the specific unobservable. (3) Propose an instrumental-variables strategy and state what the instrument must satisfy. (4) Place the available approaches in the credibility hierarchy and advise on the best feasible design.

Key takeaways

  • Matching/propensity scores compare treated and untreated units similar on OBSERVABLES — valid only under conditional independence (selection on observables: no unobserved confounders given the observables)
  • Matching's fatal weakness: it can't address selection on UNOBSERVABLES (the motivation/ability that drives both treatment choice and outcomes) — and that's usually what causes selection bias, so 'we controlled for observables' ≠ 'we eliminated bias'
  • Instrumental variables handles unobservables by finding an instrument that affects treatment but affects the outcome ONLY through treatment — requiring relevance (strong first stage) and the exclusion restriction (untestable, often contestable)
  • A valid IV recovers the LATE (effect on compliers); good instruments are rare and IV lives or dies on the credibility of the exclusion restriction
  • The credibility hierarchy: RCT > RD / well-designed DiD > IV > matching/controls — prefer higher designs; when forced lower, make the identifying assumption and its credibility explicit

Further reading

  1. Mostly Harmless Econometrics
     Joshua Angrist & Jörn-Steffen Pischke · Princeton University Press · 2009
     The definitive intuitive treatment of IV, matching, and the credibility hierarchy. The book for this module.

  2. The Central Role of the Propensity Score in Observational Studies for Causal Effects
     Paul Rosenbaum & Donald Rubin · Biometrika 70(1) · 1983
     The propensity-score foundation of matching. The original, and its assumption.

  3. Does Compulsory School Attendance Affect Schooling and Earnings?
     Joshua Angrist & Alan Krueger · Quarterly Journal of Economics 106(4) · 1991
     The famous quarter-of-birth IV. A model of the instrumental-variables strategy and its debates.

LeadAfrik · Public Economics Hub