Module 02 of 8 · 50 min read · Intermediate

Designing a survey

Sampling frames, questionnaire design, enumerator effects, and the census-vs-survey choice.

Learning objectives

By the end of this module, you should be able to:

  • 01 Explain the sampling frame and the frame problem
  • 02 Identify questionnaire-design pitfalls and measurement error
  • 03 Explain enumerator effects and how to manage them
  • 04 Compare census and survey and explain non-response bias

The household survey is the workhorse of policy data: it is the source of most poverty, employment, health, and welfare numbers. But a survey can go wrong in many ways before any analysis begins: a flawed sampling frame, badly worded questions, interviewer effects, or non-response can bias the result from the start. This module covers how to design a survey well, because the design decisions determine whether the data mean anything.

The sampling frame

The list you sample from

A survey samples from a SAMPLING FRAME: the list or map of units (households, people) from which the sample is drawn. The frame is the foundation, and if it is wrong, the sample is biased BEFORE you collect a single response. This is the frame problem: an incomplete or outdated frame systematically MISSES certain units, who then have zero chance of being sampled, so the survey cannot represent them however carefully you sample from the frame.

In developing countries the problem is acute. Frames often rest on outdated censuses (so new settlements, migrants, and population growth are missed), miss the homeless and highly mobile, under-cover informal settlements and remote areas, and exclude those without addresses. If the missing units differ systematically from the covered ones (the poorest, the most marginal, and the most mobile are often the hardest to list), the survey is biased toward the easier-to-reach, and no amount of clever analysis fixes a frame that excluded the people you most need to measure. Scrutinising the frame, both what it covers and what it misses, is the first question to ask of any survey, because frame errors are invisible in the data itself: you cannot see who was never eligible to be sampled.
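The size of the frame problem can be sketched with simple arithmetic. If a share of the population is missing from the frame, the best any sample from that frame can estimate is the rate among covered units, so the bias equals the missed share times the gap between covered and missed units. The function name and all the numbers below are hypothetical, chosen only to illustrate the mechanism.

```python
def coverage_bias(covered_share, rate_covered, rate_missed):
    """Return (true_rate, frame_rate, bias) for a population split into
    units on the frame and units missing from it."""
    if not 0.0 < covered_share <= 1.0:
        raise ValueError("covered_share must be in (0, 1]")
    missed_share = 1.0 - covered_share
    # The true population rate mixes both groups by their population shares.
    true_rate = covered_share * rate_covered + missed_share * rate_missed
    # Even a perfect sample drawn from the frame only sees covered units.
    frame_rate = rate_covered
    bias = frame_rate - true_rate
    return true_rate, frame_rate, bias


# Hypothetical example: the frame covers 85% of households; poverty is 30%
# among covered households and 60% among those the frame misses.
true_rate, frame_rate, bias = coverage_bias(0.85, 0.30, 0.60)
print(f"true poverty rate:      {true_rate:.3f}")
print(f"best frame-based rate:  {frame_rate:.3f}")
print(f"coverage bias:          {bias:+.3f}")
```

In this illustration the survey understates poverty, and nothing in the collected data reveals it, because the missed households never appear in the sample.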

Questionnaire design

How questions are asked shapes the answers, and bad questions produce MEASUREMENT ERROR (the gap between the reported and the true value) that no analysis can remove. The main pitfalls:

  • Question WORDING: leading, ambiguous, or culturally inappropriate questions get biased answers.
  • Question ORDER: earlier questions prime the answers to later ones.
  • RECALL periods: asking 'how much did you spend last month?' invites recall error because people forget, and the choice of recall period systematically affects measured consumption (the consumption-recall debate of module 4).
  • REFERENCE periods and units: inconsistent units (local measures, seasons) distort comparisons.
  • SENSITIVE questions: income, assets, and illegal or stigmatised behaviour are under-reported.

Good questionnaire design is essential: clear, tested, culturally adapted questions with appropriate recall periods, piloted before fielding. Small wording changes can shift measured poverty or employment by large margins. The questionnaire is not a neutral instrument; it actively shapes the data.
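A small sketch shows why measurement error in consumption matters so much for poverty numbers. It assumes, purely for illustration, that a long recall period makes every household under-report spending by 10%. The household values, the poverty line, and the 10% figure are all hypothetical, and real recall error is unlikely to be this uniform.

```python
def headcount(consumption, poverty_line):
    """Share of households whose consumption is below the poverty line."""
    return sum(c < poverty_line for c in consumption) / len(consumption)


# Hypothetical true monthly consumption for ten households (arbitrary units).
true_consumption = [60, 80, 95, 102, 108, 115, 130, 160, 210, 300]
POVERTY_LINE = 100

# Assumption: forgetting leads every household to under-report by 10%.
reported_consumption = [c * 0.90 for c in true_consumption]

true_rate = headcount(true_consumption, POVERTY_LINE)
measured_rate = headcount(reported_consumption, POVERTY_LINE)
print(f"true poverty headcount:     {true_rate:.0%}")
print(f"measured poverty headcount: {measured_rate:.0%}")
```

Because many households sit just above the line, a modest reporting error moves two of them below it, and the measured headcount changes far more than the 10% error itself might suggest.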

Enumerator effects

Surveys are conducted by ENUMERATORS (interviewers), and they affect the data in three ways. Different interviewers elicit different answers (through rapport, probing, interpretation, or error). Respondents answer differently depending on the interviewer's gender, ethnicity, or manner, especially on sensitive topics. And enumerators can introduce error or even fabricate data (curbstoning: filling in surveys without interviewing). These ENUMERATOR EFFECTS are a real source of measurement error and bias. They are managed through:

  • TRAINING and standardisation, so that all enumerators ask questions the same way;
  • SUPERVISION and back-checks, re-interviewing a sample of respondents to detect fabrication;
  • randomised enumerator assignment, so that enumerator effects do not correlate with treatment in an evaluation;
  • enumerator fixed effects in the analysis.

The human element of data collection is a systematic, often underappreciated source of error. Good fieldwork management controls it; poor management lets it corrupt the data.
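Two of the management tools above, randomised assignment and back-checks, are easy to sketch in code. The example below is one possible approach, not a prescribed procedure: the function names, household IDs, enumerator labels, seeds, and the 10% back-check share are all hypothetical.

```python
import math
import random


def assign_enumerators(household_ids, enumerators, seed):
    """Randomly assign households to enumerators with near-equal workloads."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled households round-robin: workloads differ by at most one.
    return {hh: enumerators[i % len(enumerators)] for i, hh in enumerate(shuffled)}


def draw_back_checks(assignment, share, seed):
    """Pick a random share of each enumerator's households for re-interview."""
    rng = random.Random(seed)
    by_enumerator = {}
    for hh, enumerator in assignment.items():
        by_enumerator.setdefault(enumerator, []).append(hh)
    checks = {}
    for enumerator in sorted(by_enumerator):
        households = sorted(by_enumerator[enumerator])
        # Round up so every enumerator gets at least one back-check.
        k = max(1, math.ceil(share * len(households)))
        checks[enumerator] = sorted(rng.sample(households, k))
    return checks


# Hypothetical fieldwork: 60 households, 4 enumerators, 10% back-checked.
households = [f"HH{i:03d}" for i in range(1, 61)]
enumerators = ["E1", "E2", "E3", "E4"]
assignment = assign_enumerators(households, enumerators, seed=2024)
back_checks = draw_back_checks(assignment, share=0.10, seed=7)
for enumerator, sample in back_checks.items():
    print(enumerator, sample)
```

Drawing the back-check sample separately for each enumerator means no one's work escapes verification, and keeping the seeds on record lets a supervisor reproduce both the assignment and the back-check list.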

Census, survey, and non-response

Count everyone or sample?

A CENSUS attempts to count EVERY unit in the population; a SURVEY samples a subset and infers to the population. The trade-offs: a census gives complete coverage and small-area detail (data for every district) but is enormously expensive and so is run infrequently (typically every 10 years), and even a census has coverage error (it misses people too). A survey is far cheaper and can be run frequently (for example annually), giving up-to-date data, but it has SAMPLING ERROR (the sample only approximates the population) and cannot give reliable estimates for very small areas. Most policy data comes from surveys because of their lower cost and higher frequency, with the census providing the frame and the benchmark.

A pervasive threat to both is NON-RESPONSE: units that are selected but not measured (refusals, not-at-home, unreachable). Like attrition in an RCT (the Impact Evaluation course), non-response biases the result when it is SYSTEMATIC. If the non-responders differ from the responders (the rich refuse income questions, the busy are never home, the marginalised are unreachable), the achieved sample is unrepresentative even if the original sample was perfect. High or systematic non-response is a serious, common, and often under-reported problem. Its extent is measured by the response rate, and it is partly correctable by weighting (module 3) but never fully fixable. A survey's response rate is a key quality indicator to demand and scrutinise.
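How weighting partly corrects for non-response can be sketched with a simple weighting-class adjustment. This is one common approach and not necessarily the method module 3 presents. The sketch assumes an equal-probability sample, and the two classes, their counts, and their poverty rates are hypothetical.

```python
# Hypothetical fieldwork outcome by weighting class:
# class -> (households selected, households responding, poverty rate among responders)
classes = {
    "urban": (500, 300, 0.20),
    "rural": (500, 400, 0.50),
}

selected = sum(s for s, _, _ in classes.values())
responded = sum(r for _, r, _ in classes.values())
response_rate = responded / selected

# Unweighted estimate: every responder counts equally, so the class that
# responds more (rural here) is over-represented.
unweighted = sum(r * p for _, r, p in classes.values()) / responded

# Weighting-class adjustment: each responder stands in for s / r selected
# households in its class, restoring the class shares of the original sample.
weights = {name: s / r for name, (s, r, _) in classes.items()}
weighted = (
    sum(weights[name] * r * p for name, (_, r, p) in classes.items())
    / sum(weights[name] * r for name, (_, r, _) in classes.items())
)

print(f"response rate:           {response_rate:.0%}")
print(f"unweighted poverty rate: {unweighted:.3f}")
print(f"weighted poverty rate:   {weighted:.3f}")
```

The adjustment removes only the bias that comes from classes responding at different rates. It still assumes that, within each class, non-responders resemble responders, which is why weighting can never fully fix non-response.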

Exercise

A government runs a household survey to measure unemployment and poverty, using a sampling frame from the last census (12 years ago), face-to-face interviews, and a single 'how much did your household spend last month' question for consumption. The response rate is 70%. (1) Identify the frame problem and who is likely missed. (2) Identify questionnaire-design problems with the consumption question. (3) Explain how enumerator effects and the 30% non-response could bias the results. (4) Recommend improvements, and explain the census-vs-survey trade-off for this purpose.

Key takeaways

  • A survey samples from a sampling frame; if the frame is incomplete/outdated, it misses units (new settlements, migrants, the mobile and marginal) who can't be sampled — biasing the result before any data is collected, invisibly
  • Questionnaire design shapes the data: wording, order, recall periods (the consumption-recall problem), reference periods, and sensitive-question under-reporting all produce measurement error no analysis can remove
  • Enumerator effects (interviewers affect answers, can fabricate) are a real source of error — managed by training, standardisation, supervision/back-checks, and randomised assignment
  • Census (count everyone — complete but expensive/infrequent) vs survey (sample — cheap/frequent but sampling error); most policy data is from surveys, with the census providing the frame and benchmark
  • Non-response biases results when systematic (the rich refuse, the busy are never home, the marginal are unreachable) — like attrition; scrutinise the response rate and adjust by weighting, but it's never fully fixable

Further reading

  1. Designing Household Survey Questionnaires for Developing Countries (the LSMS experience)

    Margaret Grosh & Paul Glewwe (eds.) · World Bank · 2000. The definitive practical guide to household-survey design: frames, questionnaires, and modules. The reference for survey design.

  2. The Analysis of Household Surveys

    Angus Deaton · World Bank / Johns Hopkins · 1997. How surveys are designed and used, including measurement and the consumption-recall issues. The masterwork.

  3. Survey Methodology

    Robert Groves, Floyd Fowler, Mick Couper et al. · Wiley · 2009. The comprehensive textbook on survey design, non-response, and the total-survey-error framework. The reference.

LeadAfrik · Public Economics Hub