Module 01 of 12 · 35 min read · Intermediate

Stata orientation and the do-file workflow

The Stata interface, why everything happens in do-files, working directories, log files, and the help system that actually answers your question.

Learning objectives

By the end of this module, you should be able to:

  • Orient yourself in the Stata interface: the Command, Results, Variables, and Properties windows
  • Use do-files as the standard for reproducible analysis, not just the Command window
  • Configure the working directory, logging, and the help system
  • Recognise why Stata persists in applied economics despite Python's rise

Stata is a standard working tool for academic economists, policy researchers, and the research departments of the World Bank, the IMF, and central banks. It is purpose-built for the analyses applied economists actually run, such as panel regressions, instrumental variables (IV), fixed effects, and survey data, and its command syntax is stable, well documented, and has changed little over decades.

Why Stata, in a Python world

Python and R have taken over most of the data-science world. Stata survives in applied economics because: (1) its econometric routines are the field's common currency, from built-in commands such as xtreg to community-contributed packages such as ivreg2, reghdfe, and esttab, many of them written by applied researchers for their own work; (2) the help documentation is extraordinarily good; (3) it is strongly backward compatible: newer releases read older .dta files, and the `version` command runs an old do-file under the syntax rules it was written for, so a do-file from 2005 will usually still run today; (4) regulators, ministries, and many academic journals often expect reproduction packages that run in Stata.
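The split between built-in and community-contributed commands is visible in practice. The sketch below assumes an internet connection and installs the packages named above from the SSC archive; the dependency list (ranktest for ivreg2, ftools for reghdfe) reflects the packages' stated requirements at the time of writing and may change, so check each package's help file.

```stata
* Built-in command: ships with Stata, nothing to install.
which xtreg                   // shows the file that implements xtreg

* Community-contributed packages: install once per machine from SSC.
ssc install ranktest, replace // required by ivreg2
ssc install ivreg2, replace   // IV and GMM estimation
ssc install ftools, replace   // required by reghdfe
ssc install reghdfe, replace  // regressions with many fixed effects
ssc install estout, replace   // the package that provides esttab

which esttab                  // confirm esttab is now on the ado-path
```

Installing is a one-off step per machine; a replication package usually lists these `ssc install` lines in a setup do-file so others can reproduce the environment.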

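The main windows

  • Command: type one command at a time and press Enter to run it
  • Results: the output Stata prints back
  • History: the commands you have run in this session
  • Variables: the variables (columns) of the dataset in memory
  • Properties: metadata such as name, label, type, and format for the selected variable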
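What the Variables and Properties windows display can also be produced by commands, which is what you use inside a do-file. A minimal sketch, using the auto dataset that ships with Stata as a stand-in for your own data: `describe` lists each variable's storage type, display format, value label, and variable label.

```stata
sysuse auto, clear   // load the example dataset bundled with Stata
describe             // all variables: storage type, format, labels
describe price mpg   // only the variables you name
```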

Always work in do-files

Typing commands into the Command window is fine for exploration but not for serious work. A do-file is a plain-text file with a .do extension that contains your commands in order. You run it with `do filename.do`, and anyone with the same data can rerun the whole analysis. Published papers that use Stata typically ship their do-files in the replication package.

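```stata
* my_analysis.do
set more off                     // don't pause output
capture log close                // close any log left open by an earlier run
log using analysis.log, replace  // start a new log, overwriting the old one
use bankrates.dta, clear         // load the data, discarding whatever is in memory
summarize                        // descriptive statistics for every variable
log close                        // close the log
```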
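A slightly fuller template might look like the sketch below. Two choices are assumptions rather than part of the example above: `version 16` pins the syntax rules the file was written under (set it to your own release; it will fail on releases older than 16), and the bundled auto dataset stands in for bankrates.dta so the file runs on any installation. The file name analysis_template.do is hypothetical.

```stata
* analysis_template.do  (hypothetical file name)
version 16                   // interpret this file under Stata 16 rules
clear all                    // start from an empty session
set more off                 // don't pause output
capture log close            // close any log left open by an earlier run
log using "auto_summary.log", replace text   // plain-text log of this run

sysuse auto, clear           // example dataset bundled with Stata
summarize price mpg weight   // descriptive statistics for three variables

log close
```

Save it and run `do analysis_template.do`: every command and its output are echoed to the Results window and written to auto_summary.log.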

Working directory

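Stata reads and writes files relative to the working directory unless you give a full path, so check it with `pwd` and set it with `cd` near the top of a do-file.

```stata
pwd                        // print working directory
cd "C:/projects/banking"   // change directory (forward slashes also work on Windows)
```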
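A common pattern is to make the project root the only machine-specific line and write every other path relative to it. In this sketch Stata's temporary directory, `c(tmpdir)`, stands in for the root so the example runs anywhere; the data sub-folder and auto_copy.dta are hypothetical names.

```stata
* Stand-in root; in a real project this would be e.g. "C:/projects/banking".
local root "`c(tmpdir)'"
cd "`root'"
pwd                                  // confirm the new working directory

capture mkdir "data"                 // sub-folder for datasets (ignored if it exists)
sysuse auto, clear
save "data/auto_copy.dta", replace   // relative to the working directory
use "data/auto_copy.dta", clear      // the same relative path reads it back
```

Moving the project to another machine then means editing one line instead of every `use` and `save`.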

The help system

`help command-name` opens that command's help file in the Viewer, with links to the full entry in Stata's PDF manuals. Stata's manuals are extraordinarily good: better than R's, better than Python's. When in doubt, `help [whatever]` is the first move.
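A few typical lookups are sketched below. `help` needs the exact command name; when you only know the topic, `search` looks it up in Stata's keyword index instead. The commands and topic words here are illustrative.

```stata
help summarize                  // help file for a command you already know
help regress postestimation     // what you can do after fitting a regression
search instrumental variables   // topic search when you don't know the command
```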

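Comments: three styles

  • `*` starts a line comment; it must be the first non-blank character on the line.
  • `//` starts an inline comment, typically after a command; leave a space before it.
  • `/* ... */` encloses a comment that can span several lines.

`//` and `/* */` are recognised only in do-files, not when typed into the Command window. Use comments generously: your future self, rereading the do-file, is its most common audience.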
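A short do-file fragment using all three styles is sketched below; it loads the auto dataset bundled with Stata so it runs as written. Save it as a do-file and run it with `do`.

```stata
* Line comment: the asterisk is the first non-blank character.
sysuse auto, clear    // inline comment, separated from the command by a space
/* Block comment:
   it can span as many lines as you need. */
summarize price mpg   // descriptive statistics for two variables
```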

Exercise

Open the Do-file Editor (type `doedit` in the Command window) and write a comment line and a `summarize` command. No actual data is needed yet.

Key takeaways

  • Do-files are the reproducible artefact; serious work belongs in them, not in the Command window
  • Stata's help system is unusually good; `help [command]` is the first move when stuck
  • Logging (`log using file.log, replace`) captures every command and result as a record of the run
  • Stata's command syntax has been stable for decades, and that stability is part of its value

LeadAfrik Public Economics Hub