Instrumental variables (IV) is the cleanest correction for endogeneity, provided you can find a credible instrument. A valid instrument z must satisfy two conditions:
- Relevance: z is correlated with the endogenous regressor x (Cov(z, x) ≠ 0)
- Exclusion: z affects y only through x — it has no direct effect on y, and is uncorrelated with the error (Cov(z, u) = 0)
The two conditions are NOT symmetric
Relevance is testable — run a first-stage regression of x on z and look at the F-statistic. Exclusion is fundamentally untestable from data alone — it rests on theory or institutional knowledge. Defending the exclusion restriction is the most important paragraph in any IV paper.
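The relevance check can be sketched in a few lines. This is a minimal illustration, assuming a single instrument and no controls, in which case the first-stage F-statistic is simply the squared t-statistic on z. The function name `first_stage_f` and the simulated first-stage coefficients (0.5 and 0.02) are hypothetical choices, not part of any library.

```python
import random


def first_stage_f(z, x):
    """F-statistic for H0: b = 0 in the first stage x = a + b*z + e.

    With one instrument and no controls, F is the squared t-statistic on z.
    """
    n = len(z)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    b = s_zx / s_zz                      # first-stage slope
    a = x_bar - b * z_bar                # first-stage intercept
    rss = sum((xi - a - b * zi) ** 2 for zi, xi in zip(z, x))
    sigma2 = rss / (n - 2)               # residual variance
    se_b = (sigma2 / s_zz) ** 0.5        # standard error of the slope
    return (b / se_b) ** 2


# Simulated data (hypothetical): one strong and one weak first stage.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + rng.gauss(0, 1) for zi in z]
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong first stage: F = {f_strong:.1f}")
print(f"weak first stage:   F = {f_weak:.1f}")
```

Nothing comparable can be written for exclusion: Cov(z, u) involves the unobserved error u, so no statistic computed from (z, x, y) alone can check it.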
Two-stage least squares
The mechanical recipe:
- Stage 1: regress endogenous x on instrument z (and exogenous controls). Get fitted x̂
- Stage 2: regress y on x̂ (and the same exogenous controls). The coefficient on x̂ is the IV estimate
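Don't compute SEs from a hand-run second stage: they'll be wrong, because the second-stage residuals are built from x̂ rather than the actual x. Use your software's built-in IV routine (ivreg in R, from the AER or ivreg package; ivregress 2sls in Stata), which computes SEs that account for the first-stage estimation.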
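To make the recipe and the standard-error warning concrete, here is a minimal sketch of 2SLS, assuming one instrument, one endogenous regressor, an intercept, and homoskedastic errors. The names `two_sls` and `ols_slope` and the simulated data-generating process (true slope 2, unobserved confounder c) are hypothetical, chosen only for illustration. In practice a built-in routine should still be preferred.

```python
import random


def two_sls(z, x, y):
    """2SLS with one instrument and an intercept, no other controls.

    Returns (beta_iv, se_iv, se_naive). se_naive is what a hand-run second
    stage would report; se_iv uses residuals built from the actual x.
    """
    n = len(y)
    z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))

    # Stage 1: regress x on z and keep the fitted values.
    pi_hat = s_zx / s_zz
    x_hat = [x_bar + pi_hat * (zi - z_bar) for zi in z]

    # Stage 2: regress y on the fitted values (their mean equals x_bar).
    s_hh = sum((h - x_bar) ** 2 for h in x_hat)
    s_hy = sum((h - x_bar) * (yi - y_bar) for h, yi in zip(x_hat, y))
    beta = s_hy / s_hh
    alpha = y_bar - beta * x_bar

    # Naive residuals plug in x_hat; correct IV residuals plug in x itself.
    rss_naive = sum((yi - alpha - beta * h) ** 2 for h, yi in zip(x_hat, y))
    rss_iv = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_naive = (rss_naive / (n - 2) / s_hh) ** 0.5
    se_iv = (rss_iv / (n - 2) / s_hh) ** 0.5
    return beta, se_iv, se_naive


def ols_slope(x, y):
    """Plain OLS slope of y on x, for comparison."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


# Simulated data (hypothetical): c is an unobserved confounder, true slope 2.
rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
c = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [1.0 + 2.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

beta_iv, se_iv, se_naive = two_sls(z, x, y)
beta_ols = ols_slope(x, y)
print(f"OLS slope: {beta_ols:.3f}")
print(f"IV slope:  {beta_iv:.3f}  (SE {se_iv:.3f}, naive SE {se_naive:.3f})")
```

The point estimate is identical either way; only the residuals behind the standard error change. Whether the naive SE comes out too large or too small depends on the data, which is one more reason not to rely on it.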
Famous instruments
- Angrist (1990): draft lottery numbers as instrument for veteran status, estimating the effect of Vietnam-era military service on earnings
- Angrist & Krueger (1991): quarter of birth as instrument for years of schooling — compulsory-schooling laws kick in by age, so birth quarter creates exogenous variation in years completed
- Card (1995): proximity to college as instrument for years of schooling (controversial — proximity correlates with parental SES)
- Acemoglu, Johnson, Robinson (2001): settler mortality rates as instrument for institutional quality
Weak instruments
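If Cov(z, x) is close to zero, the first stage is weak, and even small violations of the exclusion restriction are amplified into large biases in the IV estimate. The rule of thumb that a first-stage F-statistic above 10 signals a tolerable weak-instrument problem comes from Staiger & Stock (1997). Stock & Yogo (2005) later formalized it with critical values based on the maximal 2SLS bias relative to OLS (roughly, F above 10 keeps that bias under 10% of the OLS bias).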
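The amplification can be made explicit in the simplest case. The following is a sketch assuming one instrument, one endogenous regressor, and a linear model y = βx + u, where the symbol β is introduced here for illustration:

```latex
\hat{\beta}_{IV}
  = \frac{\widehat{\mathrm{Cov}}(z, y)}{\widehat{\mathrm{Cov}}(z, x)}
  \;\xrightarrow{\;p\;}\;
  \frac{\mathrm{Cov}(z, \beta x + u)}{\mathrm{Cov}(z, x)}
  = \beta + \frac{\mathrm{Cov}(z, u)}{\mathrm{Cov}(z, x)}
```

The bias term divides the exclusion violation by the strength of the first stage. For example, with Cov(z, u) = 0.01 and Cov(z, x) = 0.02, the asymptotic bias is 0.5, even though the violation itself looks negligible.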
F < 10? Don't use IV
Weak instruments can produce IV estimates that are MORE biased than OLS, with wildly understated SEs. If F < 10, find a different instrument or use limited-information maximum likelihood (LIML), which is more robust to weak instruments.
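A small Monte Carlo experiment can illustrate the warning. This is a sketch under stated assumptions: a just-identified model with one instrument, true slope 2, and an unobserved confounder c that biases OLS upward. The function names and the first-stage coefficients (1.0 for "strong", 0.02 for "weak") are hypothetical. The instrument here is perfectly valid, so any difference between the two runs comes from instrument strength alone.

```python
import random
import statistics


def iv_slope(z, x, y):
    """Just-identified IV estimate: sample Cov(z, y) / sample Cov(z, x)."""
    n = len(z)
    z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    return s_zy / s_zx


def simulate_iv(pi, reps=400, n=100, beta=2.0, seed=0):
    """Draw `reps` samples of size n and return the IV estimate from each.

    pi is the first-stage coefficient on z; c is an unobserved confounder
    entering both x and y.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        c = [rng.gauss(0, 1) for _ in range(n)]
        x = [pi * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
        y = [beta * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
        estimates.append(iv_slope(z, x, y))
    return estimates


def summarize(estimates):
    """Median and interquartile range (robust to the heavy tails of weak IV)."""
    q1, med, q3 = statistics.quantiles(estimates, n=4)
    return med, q3 - q1


med_strong, iqr_strong = summarize(simulate_iv(pi=1.0))
med_weak, iqr_weak = summarize(simulate_iv(pi=0.02))
print(f"strong instrument: median {med_strong:.2f}, IQR {iqr_strong:.2f}")
print(f"weak instrument:   median {med_weak:.2f}, IQR {iqr_weak:.2f}")
```

Medians and interquartile ranges are used rather than means and variances because the weak-instrument estimates have very heavy tails: a single draw with a near-zero sample Cov(z, x) can dominate an average.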
LATE: what IV actually identifies
When effects are heterogeneous, and assuming monotonicity (the instrument never pushes anyone away from treatment), IV identifies the Local Average Treatment Effect: the effect on those whose treatment status changes because of the instrument (the 'compliers'). That is neither the average effect nor the effect on the treated. This matters for external validity.
The LATE result is due to Imbens & Angrist (1994). The LATE may differ enormously from the average treatment effect: quarter-of-birth IVs estimate returns to schooling for students at the margin of dropping out of high school, not for the average worker. Always tell readers which subpopulation your LATE pertains to.
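The gap between the LATE and the average treatment effect can be illustrated with a simulation. This is a sketch assuming a binary instrument, a binary treatment, and no "defiers" (nobody does the opposite of what the instrument encourages). The population shares and effect sizes are hypothetical: compliers (40%, effect 1), always-takers (30%, effect 3), and never-takers (30%, effect 5). With a binary instrument, the IV estimate reduces to the Wald ratio computed by the hypothetical helper `wald`.

```python
import random


def simulate(n, rng):
    """Return a list of (z, d, y) rows with heterogeneous treatment effects."""
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5            # randomly assigned binary instrument
        r = rng.random()
        if r < 0.4:
            kind, effect = "complier", 1.0
        elif r < 0.7:
            kind, effect = "always_taker", 3.0
        else:
            kind, effect = "never_taker", 5.0
        # Compliers follow the instrument; the other types ignore it.
        d = z if kind == "complier" else (kind == "always_taker")
        y = rng.gauss(0, 1) + effect * d
        rows.append((z, d, y))
    return rows


def wald(rows):
    """Wald estimator: reduced-form difference over first-stage difference."""
    def mean(values):
        return sum(values) / len(values)

    y1 = mean([y for z, d, y in rows if z])
    y0 = mean([y for z, d, y in rows if not z])
    d1 = mean([d for z, d, y in rows if z])
    d0 = mean([d for z, d, y in rows if not z])
    return (y1 - y0) / (d1 - d0)


rng = random.Random(42)
rows = simulate(20000, rng)
late_hat = wald(rows)
ate = 0.4 * 1.0 + 0.3 * 3.0 + 0.3 * 5.0   # population average effect
print(f"Wald / IV estimate: {late_hat:.2f}")
print(f"complier effect (LATE): 1.00, ATE: {ate:.2f}")
```

The estimate tracks the complier effect because always-takers and never-takers have the same treatment status under both values of z, so they cancel out of both the numerator and the denominator.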
Exercise
Card (1995) used proximity to a four-year college as an instrument for years of schooling. State the relevance and exclusion conditions, and give one plausible threat to each.