Surveys are expensive and slow, and the African data gap is large, so there is enormous interest in NEW data sources that could measure poverty, economic activity, and population more cheaply and more frequently: administrative records, satellite imagery, and mobile-phone data. This module covers these sources, their genuine promise to help close the data gap (and perhaps even to leapfrog it), and the serious limits and privacy concerns they raise.
Administrative data
Administrative data is information collected by governments and organisations in the course of their operations: tax records, school enrolment and exam records, health-facility records, social-programme registries, civil registration, mobile-money transactions. Its strengths: it can be CHEAP (it is already collected), FREQUENT or even real-time, and FULL-POPULATION (it covers everyone in the system rather than a sample, so there is no sampling error and it remains reliable for small areas). The credibility revolution in economics (the Impact Evaluation course) has increasingly run on administrative data, because it provides clean records for whole populations. Its weaknesses are threefold. First, it is collected for ADMINISTRATIVE rather than research purposes, so the variables you want may be missing or poorly measured. Second, it has COVERAGE GAPS: it captures only those IN the system, and in developing countries the large informal sector, the unbanked, and those outside formal services are INVISIBLE in administrative records. Tax data misses informal income, school records miss out-of-school children, and health records miss those who never reach a facility. Third, it can suffer from quality and access problems. So administrative data is powerful where coverage is good, but its blind spot, the informal and the excluded (often the poorest), is exactly where measurement matters most (the Tax Policy course's third-party-information theme, applied to measurement). It complements surveys; it does not replace them.
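The coverage-gap point can be made concrete with a toy calculation. The sketch below assumes a hypothetical economy in which a tax registry records every formal worker (so it has no sampling error) but none of the informal ones; all incomes are invented for illustration. Because the excluded group is poorer, the registry's mean overstates the population mean, and collecting more records from the same system would not shrink that bias.

```python
from statistics import mean

# Hypothetical monthly incomes (arbitrary units, invented for illustration).
formal_incomes = [420, 380, 510, 460, 395]      # recorded in the tax registry
informal_incomes = [90, 120, 75, 140, 60, 110]  # invisible to the registry

population = formal_incomes + informal_incomes

registry_mean = mean(formal_incomes)  # what the administrative data reports
true_mean = mean(population)          # what we actually want to measure
coverage = len(formal_incomes) / len(population)

# The registry is a census of its own population, so it has no sampling error;
# the gap below is coverage bias, and more records from the same system won't fix it.
print(f"registry coverage:     {coverage:.0%}")
print(f"registry mean income:  {registry_mean:.1f}")
print(f"true population mean:  {true_mean:.1f}")
print(f"coverage bias:         {registry_mean - true_mean:+.1f}")
```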
Satellite and phone data
Measuring from space and from phones
Two novel data sources have opened striking possibilities for closing the data gap:
• Satellite imagery: images of the earth can proxy for economic and social variables. NIGHT-TIME LIGHTS (how brightly an area is lit at night) correlate with economic activity and have been used to estimate and cross-check GDP and growth, especially where official data is weak (Henderson, Storeygard, and Weil, 2012); this is useful for sub-national and conflict areas that surveys cannot reach. Daytime satellite imagery, combined with MACHINE LEARNING trained on survey data, can predict local poverty and wealth from visible features such as roofs, roads, crops, and settlement density (Jean et al., 2016), generating fine-resolution poverty maps at low cost. Satellites also monitor crops, drought, and urbanisation.
• Mobile-phone data: call detail records (CDRs) capture who calls or texts whom, when, and (via cell towers) WHERE, enabling measurement of mobility, social networks, and, strikingly, poverty and wealth: Blumenstock et al. (2015) predicted individuals' wealth from their phone-usage patterns. Mobile-MONEY transaction data (huge in Africa, e.g. M-PESA) traces financial flows and can proxy economic activity and financial inclusion.
The excitement: these data are cheap, frequent (even real-time), and cover areas and people that surveys miss, offering a way to partly LEAPFROG the data gap and measure where traditional statistics cannot. They have moved from novelty to a serious complement to surveys.
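To illustrate what calibrating a proxy against survey data involves, here is a minimal sketch: fit a relationship between night-time radiance and survey-measured consumption in districts that have ground truth, then apply it to a district without a survey. It is a single-predictor log-log ordinary least squares fit, a deliberate simplification; it is not the method of Henderson, Storeygard, and Weil (2012) or the machine-learning pipeline of Jean et al. (2016). The data, the function names `fit_log_log` and `predict_consumption`, and the use of district means are all hypothetical.

```python
import math

# Hypothetical calibration data (invented for illustration): districts that DO
# have survey ground truth, pairing mean night-time radiance (the proxy) with
# survey-measured mean consumption per capita (the target).
SURVEYED = [
    (0.5, 310.0),
    (1.2, 400.0),
    (2.0, 470.0),
    (4.5, 620.0),
    (9.0, 800.0),
]


def fit_log_log(pairs):
    """Ordinary least squares fit of log(y) = a + b * log(x).

    Returns (a, b); b is the elasticity of y with respect to x.
    math.log requires positive radiance, so unlit (zero-radiance) areas
    would need separate handling, which this sketch ignores.
    """
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_log_log(SURVEYED)


def predict_consumption(radiance):
    """Apply the survey-calibrated relationship to a district with no survey."""
    return math.exp(a + b * math.log(radiance))


print(f"estimated elasticity b = {b:.2f}")
print(f"predicted consumption at radiance 3.0: {predict_consumption(3.0):.0f}")
```

The slope `b` is an elasticity: a 1% difference in radiance is associated with roughly a b% difference in consumption across the calibration districts. Applying it to unsurveyed districts assumes the relationship holds there too, which is exactly where a proxy can mislead.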
Promise and limits
Proxies, bias, and privacy
The new data sources are powerful, but they have real limits that temper the excitement:
• They are PROXIES, not ground truth. Night-lights are not GDP, and phone usage is not wealth; they correlate with the target, but imperfectly, and must be CALIBRATED against survey or other ground-truth data before they can be interpreted. So they complement rather than replace surveys: you still need the surveys to train and validate them. A proxy can mislead wherever the relationship breaks down.
• They have their own BIASES and coverage gaps. Phone data covers only phone OWNERS and users, who tend to be less poor, more urban, and (in some contexts) more male, so it MISSES the very poorest and can be unrepresentative (the non-probability-sample problem of module 3: big but biased). Satellite measures miss whatever is not visible from space.
• The PRIVACY trade-off is serious. CDRs, mobile-money records, and fine-grained satellite or administrative data are PERSONAL and SENSITIVE (your location history, your financial transactions, your social network), so using them for measurement raises real surveillance, consent, and data-protection concerns, and the regulatory frameworks in many countries are weak (the Governance course). The same data that could map poverty could enable surveillance; the ethics (module 8) are unavoidable.
So the new data sources are a genuine, exciting complement that can help close the African data gap, but they are proxies needing survey calibration, they carry their own biases (missing the poorest), and they force a hard privacy trade-off. The future lies in COMBINING traditional surveys (representative and ground-truth, but expensive and infrequent) with new data (cheap, frequent, and wide-coverage, but proxy-based and biased), not in replacing one with the other.
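The phone-ownership bias can be illustrated with a toy population. In the sketch below, with invented counts, phone owners are concentrated in urban areas and, within each area, are less likely to be poor than non-owners. Reweighting owners to the true urban/rural mix (post-stratification, a standard survey-weighting technique, here assuming the area shares are known, e.g. from a census) narrows the gap but cannot close it, because the remaining difference lies in people the data never observes.

```python
# Hypothetical population (counts invented for illustration):
# (area, owns_phone) -> (households, poor households)
CELLS = {
    ("urban", True): (300, 30),
    ("urban", False): (50, 20),
    ("rural", True): (250, 75),
    ("rural", False): (400, 240),
}


def poverty_rate(keys):
    """Share of poor households across the selected cells."""
    households = sum(CELLS[k][0] for k in keys)
    poor = sum(CELLS[k][1] for k in keys)
    return poor / households


true_rate = poverty_rate(list(CELLS))        # what we want to know
owners = [k for k in CELLS if k[1]]
phone_only_rate = poverty_rate(owners)       # what phone data "sees"

# Post-stratification: weight each area's owner-only rate by that area's true
# population share (assumed known, e.g. from a census). This fixes the urban
# skew in ownership, but not the fact that non-owners are poorer than owners
# WITHIN each area.
total = sum(h for h, _ in CELLS.values())
reweighted_rate = 0.0
for area in ("urban", "rural"):
    area_share = (CELLS[(area, True)][0] + CELLS[(area, False)][0]) / total
    reweighted_rate += area_share * poverty_rate([(area, True)])

print(f"true poverty rate:          {true_rate:.1%}")
print(f"phone owners only:          {phone_only_rate:.1%}")
print(f"owners, reweighted by area: {reweighted_rate:.1%}")
```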
The combined future
The realistic and promising path is INTEGRATION: use surveys as the representative, ground-truth backbone (and to calibrate and validate the new sources), and use administrative, satellite, and phone data to extend measurement to places and times that surveys cannot reach (finer geography, higher frequency, hard-to-survey areas) and to cross-check official figures. This combination could substantially narrow the African data gap, delivering more frequent, finer-grained, and cheaper measurement than surveys alone, IF the privacy and ethics are handled responsibly (consent, anonymisation, data protection, governance) and the proxies are properly calibrated with their biases acknowledged. The new data is not a magic replacement for the unglamorous work of good surveys and statistical capacity (module 1). It is a powerful complement that, used carefully, offers real hope of measuring African economies and populations far better than in the recent past, provided the field resists both the hype (treating proxies as truth) and privacy recklessness (using sensitive data without safeguards). The measurement revolution is real, but it is conditional on doing it responsibly.
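One concrete anonymisation safeguard is a k-anonymity check: before releasing records, confirm that every combination of quasi-identifiers (attributes that could be linked to outside information) is shared by at least k records. The sketch below uses hypothetical home-tower and age-band fields and a hypothetical tower-to-district mapping, and shows that coarsening location from tower to district lifts the smallest group from 1 record to 3. It is a minimal illustration, not a complete privacy protocol: k-anonymity on a single snapshot is generally considered insufficient for full location histories, which are easy to re-identify, so in practice it would sit alongside aggregation, access controls, and legal safeguards.

```python
from collections import Counter

# Hypothetical records prepared for release: direct identifiers (names, phone
# numbers) already removed, leaving quasi-identifiers (home tower, age band).
RECORDS = [
    ("tower_A1", "25-34"), ("tower_A1", "25-34"), ("tower_A2", "25-34"),
    ("tower_A1", "35-44"), ("tower_A2", "35-44"), ("tower_A2", "35-44"),
    ("tower_B1", "25-34"), ("tower_B2", "25-34"), ("tower_B2", "25-34"),
]

# Hypothetical mapping from each cell tower to the coarser district containing it.
TOWER_TO_DISTRICT = {
    "tower_A1": "district_A", "tower_A2": "district_A",
    "tower_B1": "district_B", "tower_B2": "district_B",
}


def min_group_size(rows):
    """Size of the smallest group of rows with identical quasi-identifiers."""
    return min(Counter(rows).values())


def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination is shared by at least k rows."""
    return min_group_size(rows) >= k


# Generalise location: replace each tower with its district.
coarsened = [(TOWER_TO_DISTRICT[tower], age) for tower, age in RECORDS]

print("tower-level smallest group:   ", min_group_size(RECORDS))
print("district-level smallest group:", min_group_size(coarsened))
print("district-level 3-anonymous?   ", is_k_anonymous(coarsened, 3))
```

The trade-off is visible directly: the coarser geography that protects individuals is also less useful for the fine-grained poverty mapping that motivated using the data in the first place.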
Exercise
A government wants to map poverty at fine geographic detail to target a social programme, but its last household survey is old and covers only large regions. A team proposes using satellite imagery and mobile-phone data, processed with machine learning, to predict local poverty. (1) Explain the promise of these sources for this purpose. (2) Explain why surveys are still needed even if the new data works. (3) Explain the bias and proxy limits. (4) Explain the privacy trade-off and what safeguards are needed.