Optimal taxation theory — Public Finance Module 2

How should a government raise revenue, given that every tax distorts behaviour and that some distortions cost more than others? This is the optimal-taxation question, and public finance has close to a century of formal work on it, counting from Ramsey (1927). The conclusions are surprisingly stable: broad bases, low rates, attention to elasticities, and a separate redistribution instrument. The details are where it gets interesting.

Deadweight loss — the cost of a tax beyond the revenue raised

When government imposes a tax on a market, the quantity transacted falls. Some buyers who valued the good above the pre-tax price but below the tax-inclusive price no longer buy. Some sellers who could profitably produce at the pre-tax price but not at the tax-net price exit the market. The mutually-beneficial trades that don't happen are pure loss — the deadweight loss (DWL) or excess burden.

Deadweight loss formula (linear approximation)

DWL ≈ ½ × τ² × |dQ/dτ| ≈ ½ × τ² × Q × ε / P

• τ = the tax per unit (Kenya shillings per litre of fuel, say)
• Q = pre-tax quantity transacted
• P = pre-tax price
• ε = the absolute value of the relevant elasticity (more precisely, the compensated demand elasticity adjusted for supply response)

The punchline is the square: doubling the tax rate quadruples the deadweight loss. Two small taxes hurt less than one big one. Three small ones hurt even less. This is the formal justification for broad-base, low-rate tax design: spreading the rate across many bases lowers the total excess burden more than concentrating it on one.
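The quadratic effect can be checked numerically. The following is a minimal sketch of the linear-approximation formula above; the function name and the market figures (quantity, price, elasticity) are hypothetical illustrations, not Kenyan data.

```python
def deadweight_loss(tau, quantity, price, elasticity):
    """Linear approximation from the text: DWL ≈ ½ × τ² × Q × ε / P."""
    return 0.5 * tau**2 * quantity * elasticity / price


# HYPOTHETICAL fuel market: litres sold, KSh per litre, elasticity (absolute value).
Q, P, EPS = 1_000_000.0, 150.0, 0.4

dwl_low = deadweight_loss(10.0, Q, P, EPS)    # tax of KSh 10 per litre
dwl_high = deadweight_loss(20.0, Q, P, EPS)   # tax of KSh 20 per litre
ratio = dwl_high / dwl_low

# One base taxed at KSh 20 versus two identical bases taxed at KSh 10 each.
# Both raise the same mechanical revenue (τ × Q summed over bases),
# ignoring the fall in quantity.
dwl_one_base = deadweight_loss(20.0, Q, P, EPS)
dwl_two_bases = 2 * deadweight_loss(10.0, Q, P, EPS)

print(f"DWL at tau=10: {dwl_low:,.0f}")
print(f"DWL at tau=20: {dwl_high:,.0f}")
print(f"ratio: {ratio:.2f}")
print(f"one base: {dwl_one_base:,.0f}  two bases: {dwl_two_bases:,.0f}")
```

Under these assumptions, doubling the rate multiplies the loss by four, and splitting the same rate across two identical bases halves the total loss.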

The Ramsey inverse-elasticity rule

Frank Ramsey (1927) asked: if you must raise a fixed amount of revenue R, and the only instrument is commodity taxes (no direct taxes), how should you set the rates τ₁, τ₂, … across commodities to minimise total deadweight loss?

The Ramsey rule (inverse-elasticity form)

τᵢ / Pᵢ = k / εᵢ

• τᵢ = the optimal tax on commodity i
• Pᵢ = the producer price of commodity i
• εᵢ = the elasticity of demand (in absolute value) for commodity i
• k = a constant chosen so total revenue hits the target R

In plain terms: tax inelastic goods at higher rates, tax elastic goods at lower rates. The intuition is that inelastic goods distort behaviour less per shilling of revenue raised, so loading more of the tax burden onto them minimises total distortion.
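To see how k is pinned down by the revenue target, here is a minimal sketch. It assumes independent demands and, as a first-order simplification, evaluates revenue at pre-tax quantities (R ≈ Σ τᵢ × Qᵢ). The function name, the goods, and all figures are hypothetical.

```python
def ramsey_rates(goods, revenue_target):
    """Inverse-elasticity rule: tau_i / P_i = k / eps_i.

    goods maps name -> (producer price, pre-tax quantity, elasticity).
    Returns ({name: ad valorem rate tau_i / P_i}, k).
    ASSUMPTION: revenue is approximated at pre-tax quantities, so
    R ≈ sum(tau_i * Q_i) = k * sum(P_i * Q_i / eps_i).
    """
    scale = sum(p * q / eps for p, q, eps in goods.values())
    k = revenue_target / scale
    rates = {name: k / eps for name, (p, q, eps) in goods.items()}
    return rates, k


# HYPOTHETICAL two-good economy: an inelastic staple and an elastic luxury.
goods = {
    "maize flour": (100.0, 1000.0, 0.2),
    "soft drinks": (50.0, 1000.0, 1.0),
}
rates, k = ramsey_rates(goods, revenue_target=11_000.0)

# Check that the implied rates raise the target at pre-tax quantities.
raised = sum(rates[name] * p * q for name, (p, q, eps) in goods.items())

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
print(f"k = {k:.3f}, revenue raised = {raised:,.0f}")
```

With these numbers the staple ends up taxed at five times the rate of the luxury, which makes the efficiency logic, and the equity problem it creates, easy to see.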

Why the Ramsey rule isn't the end of the story

Equity — inelastic goods are often necessities (food, basic medicine, fuel for cooking). Loading tax onto them is regressive. The Ramsey rule maximises efficiency but ignores distribution
Cross-elasticities — Ramsey's derivation in its simplest form assumes independent demands. With substitutes and complements, the formula generalises but loses its clean inverse-elasticity form
Production-side distortions — Diamond & Mirrlees (1971) showed that under reasonable conditions, taxes on intermediate inputs are never optimal. Tax final consumption, not the production chain. This is the economic argument for VAT-style consumption taxes over cascading sales taxes
Administration — high rates on narrow bases invite evasion. The optimal rate from a pure-Ramsey calculation may not be administrable

The Mirrlees framework — adding redistribution

James Mirrlees (1971) integrated the redistribution objective directly into optimal taxation, treating earning ability as private information (the government observes income, not ability). The Mirrlees model produces a non-linear income-tax schedule that trades off the efficiency cost of higher rates (which discourage work and effort) against the equity gain of redistributing from high-ability to low-ability earners.

Two robust qualitative results from the Mirrlees-and-after literature:

Marginal tax rates should be zero at the top of the income distribution (in the limit, for the very highest earner) — a remarkable result, though its practical relevance is limited because 'the top' is a measure-zero point
Marginal tax rates should be lower in the middle than at the bottom, because the welfare cost of distorting middle-earner labour supply is higher, given the larger number of people affected. This is the famous 'U-shaped' marginal-rate schedule: high rates at the bottom, lower rates across the dense middle of the distribution

The Laffer curve and what it actually shows

Arthur Laffer drew a famous napkin sketch in 1974 showing tax revenue as a function of the tax rate: zero revenue at a 0% rate, zero revenue at a 100% rate (because nobody works), and a single maximum somewhere in between. The conservative political implication — that we are on the falling side of the curve, so cutting tax rates raises revenue — caught on. The economic claim is technically correct but empirically narrow.

The Laffer-curve trap

Empirical estimates put the revenue-maximising top marginal income-tax rate somewhere between 50% and 75% (Saez, Slemrod, Giertz 2012 review). Most countries' actual top rates sit well below this — meaning we are usually on the rising side of the Laffer curve, where tax cuts cost revenue and tax increases raise it. The 'tax cuts pay for themselves' argument has been falsified repeatedly in the empirical record (US 2001, 2017; UK 2022 mini-budget). Use the curve as a teaching device, not a policy guide.
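A stylised Laffer curve makes the rising-side versus falling-side distinction concrete. The sketch below assumes a single flat rate and a tax base that shrinks as (1 − τ)^e, where e is the elasticity of the base with respect to the net-of-tax share; under that assumption revenue peaks at τ* = 1 / (1 + e). The functional form and the elasticity values are illustrative assumptions, not the estimates from the review cited above.

```python
def revenue(tau, elasticity):
    """Revenue per unit of untaxed base, ASSUMING the base is (1 - tau) ** e."""
    return tau * (1.0 - tau) ** elasticity


def revenue_maximising_rate(elasticity):
    """Closed-form peak of revenue() under the same assumption: 1 / (1 + e)."""
    return 1.0 / (1.0 + elasticity)


def grid_peak(elasticity, steps=1000):
    """Brute-force check of the closed form on an evenly spaced grid."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda tau: revenue(tau, elasticity))


for e in (0.5, 1.0):  # HYPOTHETICAL elasticities
    print(f"e = {e}: closed-form peak {revenue_maximising_rate(e):.3f}, "
          f"grid peak {grid_peak(e):.3f}")

# Below the peak, a rate cut loses revenue: compare 40% with 35% at e = 0.5.
before = revenue(0.40, 0.5)
after = revenue(0.35, 0.5)
print(f"revenue at 40%: {before:.4f}, at 35%: {after:.4f}")
```

In this sketch a cut from 40% to 35% lowers revenue because both rates sit on the rising side of the curve; a cut raises revenue only when the starting rate is above the peak.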

Application to African economies

Two operational lessons:

Broad base, low rate dominates narrow base, high rate. The Kenya VAT regime (16% standard rate, ~5% of GDP yield) outperforms the Ethiopian VAT (15% standard, ~3% of GDP yield) largely because the Kenyan base is broader: fewer exemptions, fewer zero-rated items. The τ² term in the deadweight-loss formula explains why this matters: a narrow base needs a higher rate to raise the same revenue, and excess burden rises with the square of that rate
Don't use commodity taxes to redistribute. The Tinbergen rule (one instrument per policy target) plus the Ramsey-Diamond-Mirrlees stack says: tax consumption broadly and efficiently; redistribute via targeted cash transfers (Hunger Safety Net Programme, Inua Jamii in Kenya). Politicians constantly violate this by exempting 'basic food' from VAT, which is a less efficient redistribution channel than direct transfers funded by the broader-based VAT, because better-off households also buy the exempt goods. The formal version of this result is due to Atkinson & Stiglitz (1976); Diamond & Mirrlees (1971) supply the production-efficiency half of the argument
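The targeting argument in the second lesson can be illustrated with a small incidence calculation. This is a sketch under loud assumptions: the bread-spending figures by quintile are hypothetical, behavioural responses and administration costs are ignored, and the only figure taken from the text is the 16% standard rate.

```python
VAT_RATE = 0.16  # standard rate cited in the text

# HYPOTHETICAL monthly bread spending (KSh) for one household per quintile,
# poorest to richest. Richer households spend more in absolute terms.
bread_spending = [400.0, 600.0, 800.0, 1000.0, 1200.0]

# Option A: zero-rate bread. Each household gains the VAT it no longer pays.
zero_rating_gain = [VAT_RATE * s for s in bread_spending]
revenue_forgone = sum(zero_rating_gain)

# Option B: tax bread and return the same revenue as an equal cash transfer.
equal_transfer = revenue_forgone / len(bread_spending)

# Option C: tax bread and split the same revenue between the two poorest quintiles.
targeted_transfer = revenue_forgone / 2

# Share of each option's fiscal cost that reaches the poorest household.
share_zero_rating = zero_rating_gain[0] / revenue_forgone
share_equal = equal_transfer / revenue_forgone
share_targeted = targeted_transfer / revenue_forgone

print(f"fiscal cost of each option: KSh {revenue_forgone:,.0f}")
print(f"poorest household's share, zero-rating:       {share_zero_rating:.0%}")
print(f"poorest household's share, equal transfer:    {share_equal:.0%}")
print(f"poorest household's share, targeted transfer: {share_targeted:.0%}")
```

For the same fiscal cost, the zero-rating delivers the smallest share to the poorest household in this example, because most of the forgone VAT was being paid by households further up the distribution.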

Exercise

Kenya's Finance Bill 2024 proposed a 16% VAT on previously zero-rated bread. Public outrage forced the proposal to be dropped. (1) Using the Ramsey rule, what does demand elasticity for bread predict about whether bread should be taxed at the standard rate or a reduced rate? (2) Using the Diamond-Mirrlees argument, what should the public-finance-economist response be regardless of elasticity? (3) Critique the political claim 'we are protecting the poor by zero-rating bread' using the Tinbergen rule. (4) What is the policy reform that resolves the tension?