Module 04 of 12 · 40 min read · Intermediate

Filtering, sorting, and the if/in qualifiers

keep, drop, sort, gsort, the if and in syntax, _n and _N — the verbs of subsetting Stata data.


Learning objectives

By the end of this module, you should be able to:

  • Use keep and drop to subset rows and columns
  • Apply the if and in qualifiers to limit any command to a subset
  • Use preserve/restore to operate on a snapshot without destroying the original data
  • Reference observations positionally with _n and _N (current and total observation numbers)

keep, drop, sort, and the if/in qualifiers are how you subset and order observations in Stata. They also explain the most common Stata gotcha: keep and drop change the dataset in memory for the rest of the session, and every later command runs on whatever is left in memory. The file on disk is untouched until you save over it.

keep and drop

stata
keep if year >= 2023 // keep matching rows
drop if missing(lending_rate) // drop rows with a missing lending_rate
drop deposit_rate // drop this column
keep month lending_rate // keep only these columns (run last: it also removes year)

keep/drop are permanent

Once you keep or drop, the removed rows or columns are gone from memory and there is no undo. Always protect the original by either: (1) saving a backup with save raw.dta first, (2) using preserve / restore around the analysis, or (3) computing on a copy, for example in a separate frame (Stata 16 or later).
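Options (1) and (3) can be sketched as follows. This is a minimal illustration, not the module's own data: it assumes Stata's bundled auto dataset as a stand-in, the variable foreign comes from that dataset, and the frame name work is hypothetical. The frame commands need Stata 16 or later, and the trailing // comments assume the code is run from a do-file.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption, not this module's data)
sysuse auto, clear
local n_full = _N

* Option 1: back up to disk before subsetting
* (tempfile is used here to avoid clutter; save raw.dta gives a lasting backup)
tempfile raw
save `raw'
keep if foreign == 1          // memory now holds only the subset
count
use `raw', clear              // reload the untouched backup
assert _N == `n_full'

* Option 3: work on a copy in a separate frame (Stata 16+)
frame copy default work       // "work" is a hypothetical frame name
frame work: keep if foreign == 1
frame work: count
assert _N == `n_full'         // the default frame is unchanged
frame drop work
```

Both routes leave the full dataset available afterwards; the frame route avoids a round trip to disk, while the saved file survives the session.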

preserve / restore

stata
preserve
keep if year == 2024
summarize lending_rate
restore
* data is back to its original state
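To check that restore really brings everything back, one option is to compare _N before and after. The sketch below assumes Stata's bundled auto dataset and its variables foreign and price as stand-ins for the module's data.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption)
sysuse auto, clear
local n_before = _N

preserve
    keep if foreign == 1
    display "inside preserve: " _N " observations"
    summarize price
restore

display "after restore: " _N " observations"
assert _N == `n_before'       // the full dataset is back
```

If a do-file ends or stops with an error between preserve and restore, Stata restores the data automatically. To keep the subset instead, restore, not cancels the pending restore.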

if and in qualifiers

stata
summarize lending_rate if year == 2024
summarize lending_rate in 1/10 // first 10 observations
list if lending_rate > 0.13 & year == 2024
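One point worth checking when you filter with if: Stata treats numeric missing values as larger than any number, so a condition such as lending_rate > 0.13 also matches rows where lending_rate is missing. The sketch below illustrates this, plus a few forms of in, using Stata's bundled auto dataset as a stand-in (rep78, make, price, and foreign are variables in that dataset, not in this module's data).

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption)
sysuse auto, clear

* Missing values sort above every number, so > also matches missing rows
count if rep78 > 4
local n_naive = r(N)
count if rep78 > 4 & !missing(rep78)
local n_clean = r(N)
count if missing(rep78)
local n_miss = r(N)
assert `n_naive' == `n_clean' + `n_miss'

* in takes positions: f is the first observation, l the last, negatives count from the end
list make price in 1/5        // first five observations
list make price in -5/l       // last five observations

* if and in can be combined on one command
summarize price if foreign == 1 in 1/60
```

Because in depends on the current row order, its result changes after a sort; if depends only on the values in each row.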

_n and _N

_n is the current observation number (the row index, after any sort); _N is the total number of observations. Combined with sort, they give you positional references.

stata
sort year month
generate first_obs = (_n == 1)
generate last_obs = (_n == _N)
generate prev_rate = lending_rate[_n - 1] // lag: previous row's value (missing in row 1)
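Two details of positional references are easy to miss: the lag is missing in the first row because observation 0 does not exist, and under a by prefix _n and _N are counted within each group. A minimal sketch, assuming Stata's bundled auto dataset as a stand-in (foreign and price are its variables; the generated names are hypothetical):

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption)
sysuse auto, clear
sort foreign price

* Observation 0 does not exist, so the lag is missing in row 1
generate prev_price = price[_n - 1]
assert missing(prev_price) in 1

* Under by, _n and _N restart within each group
bysort foreign (price): generate rank_in_group = _n
bysort foreign (price): generate group_size    = _N
bysort foreign (price): generate prev_in_group = price[_n - 1]

* The first row of every group has no previous row inside that group
assert missing(prev_in_group) if rank_in_group == 1

list foreign price rank_in_group group_size prev_in_group in 1/5
```

With the module's variables, the same pattern would look like bysort year (month): generate prev_rate = lending_rate[_n - 1], which stops the lag from reaching back into the previous year.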

sort and gsort

stata
sort year month
gsort -lending_rate // descending; - prefix means reverse
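gsort can mix directions across variables, and sort has a stable option that keeps the existing order among tied rows. The sketch below assumes Stata's bundled auto dataset as a stand-in (foreign, price, and make are its variables).

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption)
sysuse auto, clear

* foreign ascending, then price descending within each value of foreign
gsort foreign -price
list make foreign price in 1/3
assert price[1] >= price[2]

* stable keeps the current order among rows that tie on foreign,
* so the descending price order within each group survives
sort foreign, stable
assert price[1] >= price[2]
```

Without stable, sort places tied rows in an arbitrary order, so positional references such as _n or in 1/10 may differ between runs unless the sort variables identify each row uniquely.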

Exercise

Keep only observations where year is 2024.

Key takeaways

  • keep and drop permanently change the data in memory; always use preserve/restore around exploratory subsets
  • if applies a logical filter; in applies a position-based slice (in 1/10 means observations 1 through 10)
  • _n is the current observation; _N is the total. lending_rate[_n-1] gives the previous row's value
  • sort orders ascending only; gsort accepts a - prefix on a variable to sort it in descending order

Further reading

  1. Microeconometrics Using Stata (Revised Edition)

    A. Colin Cameron & Pravin K. Trivedi · Stata Press · 2010. The reference text on Stata microeconometric practice.
LeadAfrik Public Economics Hub