Module 02 of 12 · 50 min read · Intermediate

Loading, importing, and inspecting data

use, import excel, import delimited, describe, codebook, summarize. The first ten minutes of any session.

Learning objectives

By the end of this module, you should be able to:

  • 01 Load datasets in Stata's native .dta format with `use` and import CSV/Excel/SAS data with `import`
  • 02 Inspect a dataset with describe, codebook, summarize, and list
  • 03 Use browse for interactive visual inspection
  • 04 Save processed datasets back to .dta and export them to CSV

The first ten minutes of any Stata session: load the data, look at it, understand its shape. Skip this and every subsequent error becomes mysterious.

use — Stata's native format

stata
use bankrates.dta, clear
* clear discards any data currently in memory
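
use can also read just part of a file, which helps when the full dataset is large. The lines below are a minimal sketch, assuming bankrates.dta contains the variables month and lending_rate that appear later in this module. The last line loads an example dataset that ships with Stata, for readers who do not have the course file at hand.

```stata
* Load only two variables (assumes both exist in bankrates.dta)
use month lending_rate using bankrates.dta, clear

* Load only the first 100 observations
use in 1/100 using bankrates.dta, clear

* No course file available? Load a dataset that ships with Stata
sysuse auto, clear
```

Lines starting with * work both in do-files and in the Command window. The // comments used elsewhere in this module are understood only in do-files.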

import — bringing in CSV/Excel/etc

stata
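import delimited "rates.csv", clear // delimited text file
import excel "data.xlsx", sheet("Sheet1") firstrow clear // firstrow: read row 1 as variable names
import sas "data.sas7bdat", clear // import sas requires Stata 16 or later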
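
Real files rarely match the defaults. The sketch below shows a few import options that often matter. The semicolon delimiter, the UTF-8 encoding, and the decimal-comma column are hypothetical properties of rates.csv, chosen only for illustration.

```stata
* List the sheets and cell ranges in a workbook before importing it
import excel "data.xlsx", describe

* Semicolon-delimited file, variable names in row 1, UTF-8 text
import delimited "rates.csv", delimiters(";") varnames(1) encoding("UTF-8") clear

* If a numeric column arrived as a string because it uses decimal commas
* (for example "12,5"), convert it in place; lending_rate is an assumed name
destring lending_rate, replace dpcomma
```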

describe and summarize

stata
describe // variable list with types
summarize // numeric summary of all variables
summarize, detail // with percentiles, skewness, kurtosis
summarize lending_rate, detail
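
Two variations are worth knowing: describe can inspect a file without loading it, and summarize leaves its results behind for reuse. A short sketch, assuming the same bankrates.dta file and lending_rate variable as above:

```stata
* Inspect a file's variables without loading it into memory
describe using bankrates.dta

* Overview only: number of observations, number of variables, sort order
describe, short

* summarize stores its results in r(); reuse them instead of retyping numbers
summarize lending_rate
return list
display "mean = " r(mean) ", N = " r(N)
```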

codebook — the deepest inspection

codebook gives the most thorough per-variable report: storage type, range, missing-value count, number of unique values, a frequency table for variables with few distinct values, and mean, standard deviation, and percentiles for continuous ones.

stata
codebook
codebook lending_rate
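
On a wide dataset the full codebook output is long. These variants may be easier to scan first (a sketch; none of them changes the data):

```stata
codebook, compact      // one line per variable: obs, unique values, mean, min, max, label
codebook, problems     // report potential problems, such as constant variables
misstable summarize    // missing-value counts for variables that have any
```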

list — view raw rows

stata
list in 1/10 // first 10 observations
list month lending_rate in 1/10
list if lending_rate > 0.13
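
One trap with conditions like the last one: Stata treats numeric missing values as larger than any number, so a bare > filter also returns rows where the variable is missing. A sketch of a safer version, using the same assumed variables:

```stata
* Exclude missing values explicitly when filtering with >
list month lending_rate if lending_rate > 0.13 & !missing(lending_rate)

* Last five observations: -5 counts from the end, l means the last one
list month lending_rate in -5/l

* Count matching rows first if the listing might be long
count if lending_rate > 0.13 & !missing(lending_rate)
```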

browse — interactive data view

browse opens a spreadsheet-style view of the data. Useful for visual inspection, never for analysis.
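
Like list, browse accepts a variable list and an if condition, so the view can be narrowed to the rows of interest. A brief sketch, assuming the month and lending_rate variables from the list examples:

```stata
browse                             // whole dataset, read-only
browse month lending_rate          // selected variables only
browse if missing(lending_rate)    // only rows with a missing rate
```

browse cannot change the data; the separate edit command opens the same view with editing enabled.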

save — persisting your work

stata
save processed.dta, replace
export delimited "output.csv", replace
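
A slightly fuller version of the same step, as a sketch. The variable names are assumed from the earlier examples. Because replace overwrites an existing file without asking, writing to a new name such as processed.dta keeps the raw file untouched.

```stata
* Store each variable in the smallest type that holds it, with no loss of information
compress
save processed.dta, replace

* Export only selected variables to CSV
export delimited month lending_rate using "output.csv", replace
```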

describe + codebook + summarize, every time

After every load, run those three. They take seconds and catch most common data-import problems: numbers read as strings, hidden missing values, encoding errors, and decimal-point versus decimal-comma confusion.
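
The ritual can be saved as a short do-file and extended with checks that stop execution when the data violate an expectation. This is a sketch under assumptions: month and lending_rate come from the examples above, and the 0 to 1 range is a hypothetical plausibility bound that should be replaced with whatever is right for your data.

```stata
* inspect.do: load the data, then run the same checks every time
use bankrates.dta, clear

describe
codebook, compact
summarize

* Each check below stops the do-file with an error if it fails
confirm numeric variable lending_rate
assert !missing(month)
assert inrange(lending_rate, 0, 1) if !missing(lending_rate)

* Report (without changing anything) whether any month appears more than once
duplicates report month
```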

Exercise

Load bankrates.dta, run describe and summarize.

Key takeaways

  • describe + codebook + summarize is the three-step inspection ritual — run it every time
  • codebook is the most thorough — type, range, missing count, unique values, frequencies
  • list with a condition (list if lending_rate > 0.13) shows you the raw rows matching a filter
  • browse is for visual inspection only; never for analysis

Further reading

  1. Data Management Using Stata: A Practical Handbook

    Michael N. Mitchell · Stata Press · 2010
LeadAfrik Public Economics Hub