Measuring inequality and the distribution — Measurement Module 5

Inequality is one of the most politically charged statistics, and one of the most treacherous to measure. This module covers the standard tools — the Lorenz curve and Gini coefficient — their properties and limits, and the crucial problem that surveys systematically MISS the rich, so the official inequality numbers usually understate the truth. Getting inequality measurement right matters for debates that shape policy across the program.

The Lorenz curve and Gini

The standard tools

The LORENZ CURVE plots the cumulative share of total income (or consumption) against the cumulative share of the population, ranked from poorest to richest. Perfect equality is the 45-degree line (the bottom 20% have 20% of income, etc.); the more the curve sags below the line, the more unequal the distribution (the bottom 20% have far less than 20%). The GINI COEFFICIENT summarises the Lorenz curve in one number: it is twice the area BETWEEN the Lorenz curve and the equality line (equivalently, that area as a fraction of the whole triangle under the equality line, which has area one half). It ranges from 0 (perfect equality: everyone has the same) to 1 (perfect inequality: one person has everything, a limit approached in a large population). It is the standard, most-reported inequality measure (a Gini of 0.3 is fairly equal, 0.6 is highly unequal; southern African Ginis are among the world's highest). Its appeal: a single, intuitive, comparable number. Its limits matter too. The Gini is most sensitive to changes in the MIDDLE of the distribution and relatively INSENSITIVE to the tails, so it can miss what's happening at the top and bottom. And two very different distributions can have the same Gini, because Lorenz curves that cross can enclose the same area.
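The definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a survey-grade estimator: it assumes unweighted, non-negative incomes with a positive total, and the function names lorenz_points and gini are invented for this example.

```python
def lorenz_points(incomes):
    """Return (population share, income share) pairs, poorest to richest."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(incomes):
    """Gini as 1 minus twice the area under the Lorenz curve (trapezoid rule)."""
    pts = lorenz_points(incomes)
    area = 0.0
    for (p0, l0), (p1, l1) in zip(pts, pts[1:]):
        area += (p1 - p0) * (l0 + l1) / 2
    return 1 - 2 * area


# Toy distributions (hypothetical numbers, chosen only for illustration).
equal = [5, 5, 5, 5]      # everyone has the same
extreme = [0, 0, 0, 10]   # one person has everything
a = [1, 5, 6]
b = [2, 3, 7]             # better-off bottom, thinner middle, richer top than a

print(gini(equal), gini(extreme))
print(gini(a), gini(b))
```

With four people, one of whom holds everything, the Gini is 0.75 rather than 1, because the maximum for n people is (n - 1)/n. The distributions a and b differ at the bottom, middle and top, yet both have a Gini of 5/18 (about 0.28): their Lorenz curves cross and enclose the same area.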

Other measures

The Gini is not the only measure, and the choice matters because measures differ in what they emphasise. The Theil index (an entropy measure) is DECOMPOSABLE: it can be split into inequality BETWEEN groups (regions, ethnicities) and WITHIN them, which is useful for analysing the structure of inequality. The Atkinson index builds in an explicit INEQUALITY-AVERSION parameter: you choose how much weight to give to the bottom of the distribution, making the value judgment explicit (rather than implicit, as in the Gini). Percentile RATIOS (the 90/10 ratio, the income at the 90th percentile divided by the income at the 10th; the 99/50) and top SHARES (the share of total income going to the top 1% or 10%) are simple and transparent and focus attention on specific parts of the distribution (the top shares have been central to the Piketty-driven inequality debate). Different measures can give different rankings of two distributions, and each embeds value judgments about which inequalities matter most, so reporting more than one, and being explicit about what each captures, is good practice. There is no single 'true' inequality number; the measure is a lens.
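A rough sketch of three of these measures follows, using the textbook formulas for the Theil T index, the Atkinson index and a top share. It assumes unweighted, strictly positive incomes (the logarithms fail at zero); the function names, the two groups and all the numbers are hypothetical.

```python
import math


def theil(xs):
    """Theil T index: mean of (x/mu) * ln(x/mu)."""
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)


def theil_decomposition(groups):
    """Split Theil T into (within, between) for a dict of group -> incomes."""
    everyone = [x for g in groups.values() for x in g]
    total = sum(everyone)
    mu = total / len(everyone)
    within = between = 0.0
    for g in groups.values():
        share = sum(g) / total            # group's share of total income
        mu_g = sum(g) / len(g)            # group's mean income
        within += share * theil(g)
        between += share * math.log(mu_g / mu)
    return within, between


def atkinson(xs, eps):
    """Atkinson index with inequality-aversion parameter eps >= 0."""
    mu = sum(xs) / len(xs)
    if eps == 1:
        ede = math.exp(sum(math.log(x) for x in xs) / len(xs))  # geometric mean
    else:
        ede = (sum(x ** (1 - eps) for x in xs) / len(xs)) ** (1 / (1 - eps))
    return 1 - ede / mu                   # ede: equally distributed equivalent income


def top_share(xs, fraction):
    """Share of total income held by the richest `fraction` of people."""
    ranked = sorted(xs, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical two-region example.
regions = {"north": [1, 2, 3], "south": [10, 20, 30]}
incomes = regions["north"] + regions["south"]

within, between = theil_decomposition(regions)
print(theil(incomes), within, between)
print(atkinson(incomes, 0.5), atkinson(incomes, 2))
print(top_share(incomes, 0.5))
```

The within and between components add up to the overall Theil index, which is what makes it useful for structural analysis. The Atkinson index rises as eps rises on the same data, which shows the value judgment at work: the more weight placed on the bottom, the more unequal the same distribution is judged to be.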

Why surveys understate inequality

The missing rich

The most important practical problem in inequality measurement: household surveys systematically MISS THE RICH, so survey-based inequality UNDERSTATES the truth — often substantially. Why: (1) the rich are UNDER-SAMPLED (they're a small group, hard to reach, live in gated/secure settings, and a standard survey frame catches few of them); (2) the rich UNDER-RESPOND (they refuse surveys more often — non-response concentrated at the top); and (3) the rich UNDER-REPORT their income/wealth when they do respond (capital income, business income, and assets are especially under-stated). Surveys are also often TOP-CODED (very high values capped). Because inequality is driven heavily by the top of the distribution, missing or understating the top means survey Ginis and top shares are biased DOWNWARD — the official inequality numbers are too low. This is not a minor technicality: the gap between survey-measured and true inequality can be large, and it means cross-country and over-time inequality comparisons based on surveys can be misleading (a country whose rich are better-captured looks more unequal than one whose rich escape the survey, even if they're equally unequal in truth).
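The downward bias can be illustrated with a small simulation. This is a sketch under loud assumptions: the population is synthetic (evenly spaced quantiles of a Pareto distribution with a hypothetical shape parameter of 2), the three distortions are stylised (the top 1% never sampled, the top 10% stating 70% of their income, values capped at roughly the 99th percentile), and none of the magnitudes come from real survey data.

```python
def gini(incomes):
    """Gini from the rank-weighted sum of sorted incomes."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def synthetic_population(n=1000, alpha=2.0):
    """Ascending incomes at evenly spaced Pareto quantiles (minimum income 1)."""
    return [(1 - (i + 0.5) / n) ** (-1 / alpha) for i in range(n)]


def drop_top_1pct(xs):
    """Under-sampling / non-response: the richest 1% never enter the sample."""
    xs = sorted(xs)
    return xs[: len(xs) - len(xs) // 100]


def under_report_top_10pct(xs, stated=0.7):
    """Under-reporting: the richest 10% state only a fraction of their income."""
    xs = sorted(xs)
    first_top = len(xs) - len(xs) // 10
    return [x if i < first_top else stated * x for i, x in enumerate(xs)]


def top_code(xs, cap):
    """Top-coding: recorded values are capped at `cap`."""
    return [min(x, cap) for x in xs]


true_incomes = synthetic_population()
cap = true_incomes[len(true_incomes) - len(true_incomes) // 100]  # ~99th percentile

results = {
    "true": gini(true_incomes),
    "top 1% missed": gini(drop_top_1pct(true_incomes)),
    "top 10% under-report": gini(under_report_top_10pct(true_incomes)),
    "top-coded": gini(top_code(true_incomes, cap)),
}
for label, value in results.items():
    print(f"{label:>22}: {value:.3f}")
```

In this construction each distortion on its own pulls the measured Gini below the value for the full population. The size of the gap depends entirely on the assumed tail and the assumed distortions, so the output shows the direction of the bias and says nothing reliable about its magnitude in any real country.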

Closing the gap

How do you measure inequality including the rich the surveys miss? The leading approach (Piketty, Saez, Atkinson, and the World Inequality Database) uses TAX DATA: administrative records of incomes reported to tax authorities, which capture top incomes far better than surveys (the rich appear in tax records even if they dodge surveys) and allow construction of top income SHARES over long periods. Combining survey data (good for the bottom and middle) with tax data (good for the top) and national accounts gives a more complete and consistent picture, known as 'distributional national accounts'. The tax-data approach revealed the dramatic rise in top income shares in many countries that surveys had missed, reshaping the inequality debate. Its limits in developing countries: tax data is itself incomplete where the informal sector is large and the rich evade tax (see the Tax Policy course), so even tax records miss income. WEALTH inequality is harder still: wealth is even more concentrated than income and even worse measured, not least because the rich hide wealth offshore (as Zucman's work documents). The honest position: inequality is genuinely hard to measure, official survey numbers understate it, tax data helps but isn't complete (especially in developing countries), and any inequality figure should be read with awareness of what it likely misses at the top. The numbers are lower bounds more than point estimates.
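To get a feel for how much a tax-based top share can move the headline number, one simple device is the exact two-group decomposition of the Gini for non-overlapping groups (everyone in the top group richer than everyone else). The sketch below is not the distributional national accounts method, only a back-of-envelope illustration. It assumes the survey Gini describes the bottom 99% and that tax data supply the top 1% share; the function name combined_gini and every input value are hypothetical.

```python
def gini(incomes):
    """Gini from the rank-weighted sum of sorted incomes."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def combined_gini(p_top, s_top, gini_rest, gini_top):
    """Overall Gini from a top group and the rest, assuming no overlap.

    p_top: population share of the top group (e.g. 0.01)
    s_top: its share of total income (e.g. from tax data)
    gini_rest, gini_top: Gini within each group
    """
    within_top = p_top * s_top * gini_top
    within_rest = (1 - p_top) * (1 - s_top) * gini_rest
    between = s_top - p_top
    return within_top + within_rest + between


# Sanity check on a toy population: the decomposition reproduces the direct Gini.
toy = [1, 2, 3, 10]
rest, top = toy[:3], toy[3:]
check = combined_gini(len(top) / len(toy), sum(top) / sum(toy), gini(rest), gini(top))
print(check, gini(toy))

# Hypothetical inputs: survey Gini 0.45 for the bottom 99%, a tax-based
# top 1% share of 15%, and an assumed Gini of 0.5 within the top 1%.
adjusted = combined_gini(p_top=0.01, s_top=0.15, gini_rest=0.45, gini_top=0.5)
print(round(adjusted, 3))
```

With these made-up inputs the adjusted Gini comes out at about 0.52 against a survey figure of 0.45. Almost all of the difference comes from the between-group term (the top share minus the top group's population share), which is why the result is so sensitive to how well the top is measured.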

Exercise

A country reports a survey-based Gini of 0.45 and a minister claims inequality is 'moderate and stable'. (1) Explain what the Gini and Lorenz curve represent and give one limitation of the Gini. (2) Explain why the survey-based 0.45 is likely biased, and in which direction. (3) Explain how tax data could give a fuller picture, and what its limits are here. (4) Advise how the inequality figure should be interpreted and reported.