Loading, importing, and inspecting data — Stata Module 2

The first ten minutes of any Stata session: load the data, look at it, understand its shape. Skip this and every subsequent error becomes mysterious.

use — Stata's native format

stata

use bankrates.dta, clear
* clear discards any data currently in memory, including unsaved changes, without prompting
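
use can also read just part of a file, which helps when the dataset is large. The sketch below is illustrative only: it uses Stata's built-in auto dataset, saved to a temporary file, as a stand-in for bankrates.dta, and the macro name demo is an assumption. Run it from a do-file, since tempfile names are local macros.

```stata
* Stand-in for bankrates.dta: save the built-in auto data to a temporary file
sysuse auto, clear
tempfile demo
save "`demo'"

* Load only two variables from the file
use make price using "`demo'", clear
describe

* Load only the observations that satisfy a condition
use if foreign == 1 using "`demo'", clear
count
```

The same pattern should apply to the course data, for example use month lending_rate using bankrates.dta, clear, assuming the file contains those two variables.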

import — bringing in CSV/Excel/etc

stata

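import delimited "rates.csv", clear                        // delimited text such as CSV
import excel "data.xlsx", sheet("Sheet1") firstrow clear   // firstrow: read row 1 as variable names
import sas "data.sas7bdat", clear                          // requires Stata 16 or later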
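
A text round trip shows why imports need checking. The sketch below does not touch rates.csv: it exports the built-in auto dataset to a file named auto_roundtrip.csv (a hypothetical name chosen for this example) and reads it back, spelling out the header row and the encoding instead of relying on defaults.

```stata
* Write the built-in auto data to CSV, then read it back
sysuse auto, clear
export delimited using "auto_roundtrip.csv", replace   // hypothetical file name

* varnames(1): take variable names from row 1; encoding(): state the file encoding
import delimited using "auto_roundtrip.csv", varnames(1) encoding("UTF-8") clear

* foreign was numeric with value labels; export delimited writes the labels
* by default, so it comes back as a string variable
describe foreign

erase "auto_roundtrip.csv"                             // remove the scratch file
```

A numeric variable silently becoming a string is exactly the kind of change that describe exposes immediately after an import.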

describe and summarize

stata

describe                  // variable list with types
summarize                 // numeric summary of all variables
summarize, detail         // with percentiles, skewness, kurtosis
summarize lending_rate, detail
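
summarize also leaves its results behind in r(), so they can be reused instead of read off the screen. A minimal sketch, using the built-in auto dataset in place of bankrates.dta (rep78 is a variable in auto that has missing values):

```stata
sysuse auto, clear

summarize rep78
* r(N) counts non-missing values only, so compare it with _N
display "non-missing: " r(N) "   total rows: " _N
display "mean: " %6.3f r(mean)

* Count the missing values directly
count if missing(rep78)
```

If r(N) is smaller than _N, the variable has missing values that the summary table does not show explicitly.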

codebook — the deepest inspection

codebook is the most thorough built-in way to understand a variable. For each one it reports the storage type, range, missing-value count, and number of unique values, plus frequencies for categorical variables and the mean and standard deviation for continuous ones.

stata

codebook
codebook lending_rate
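
On wide datasets the full codebook output gets long. Two options shorten it, and misstable gives a missing-values overview. The sketch below again uses the built-in auto dataset as a stand-in for the course data.

```stata
sysuse auto, clear

codebook, compact        // one line per variable: obs, unique values, mean, min, max
codebook, problems       // report only potential problems in the data
misstable summarize      // table of the variables that have missing values
```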

list — view raw rows

stata

list in 1/10              // first 10 observations
list month lending_rate in 1/10
list if lending_rate > 0.13 & !missing(lending_rate)   // missing counts as larger than any number
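
Stata sorts numeric missing values above every real number, so a greater-than condition matches them unless they are excluded. A small sketch of the difference, using rep78 from the built-in auto dataset as a stand-in for lending_rate:

```stata
sysuse auto, clear

* Includes observations where rep78 is missing
count if rep78 > 4

* Excludes them
count if rep78 > 4 & !missing(rep78)

* The same guard in a list command
list make rep78 if rep78 > 4 & !missing(rep78), clean noobs
```

The first count is larger than the second by the number of missing values in rep78.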

browse — interactive data view

browse opens a read-only, spreadsheet-style view of the data (edit is the writable counterpart). Useful for visual inspection, never for analysis.

save — persisting your work

stata

save processed.dta, replace
export delimited "output.csv", replace
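
Before saving, compress shrinks each variable to the smallest storage type that holds its values, and describe using can confirm what was written without loading the file. A minimal sketch; the file name auto_processed.dta is a hypothetical stand-in for processed.dta.

```stata
sysuse auto, clear

compress                                        // smallest storage types, no loss of precision
save "auto_processed.dta", replace              // hypothetical file name

* Inspect the saved file on disk without loading it
describe using "auto_processed.dta", short

erase "auto_processed.dta"                      // remove the scratch file
```

Note that replace overwrites an existing file without asking, so it is safest to save processed data under a different name from the raw input.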

describe + codebook + summarize, every time

After every load, run those three. They take seconds and catch most common data-import problems: wrong types, hidden missing values, encoding errors, and decimal-point versus decimal-comma confusion.
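
One way to make the habit stick is to wrap the three commands in a small program and follow them with hard checks. This is only a sketch: the program name firstlook is made up for this example, and the checks use variables from the built-in auto dataset, so they would need to be replaced with expectations about the actual data.

```stata
* Hypothetical helper: run the three inspection commands in one go
capture program drop firstlook
program define firstlook
    describe
    codebook, compact
    summarize
end

sysuse auto, clear
firstlook

* Turn expectations into checks that stop the do-file when they fail
confirm numeric variable price mpg      // fails if either was imported as a string
assert !missing(price)                  // fails if price has any missing value
assert inrange(mpg, 1, 100)             // fails on implausible or missing values
```

Looking at the output catches problems once; the confirm and assert lines keep catching them every time the do-file is rerun on a fresh extract.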

Exercise

Load bankrates.dta, then run describe, codebook, and summarize.