keep, drop, sort, and the if/in qualifiers are how you subset and order observations in Stata. They are also behind the most common Stata gotcha: keep and drop permanently change the dataset in memory, and every later command runs on whatever is left.
keep and drop
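* Independent examples. If you run them in this order, the last line fails because keep already removed deposit_rate.
keep if year >= 2023            // keep matching rows
drop if missing(lending_rate)   // drop rows with a missing lending rate
keep month lending_rate         // keep only these columns
drop deposit_rate               // drop this column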
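To see the row and column effects concretely, here is a minimal sketch on made-up data. The toy dataset (24 monthly observations for 2023 and 2024, with one missing lending rate) is a hypothetical assumption, not data from this lesson; the assert lines check the number of observations after each step.

```stata
* Hypothetical toy data: 24 monthly observations for 2023-2024
clear
set obs 24
generate year = 2023 + (_n > 12)
generate month = mod(_n - 1, 12) + 1
generate lending_rate = 0.12 + 0.001 * _n
generate deposit_rate = 0.05
replace lending_rate = . in 5        // one missing value, in 2023

drop if missing(lending_rate)        // removes 1 row, leaving 23
assert _N == 23
keep if year == 2024                 // only the 12 rows for 2024 survive
assert _N == 12
drop deposit_rate                    // remove a column
capture confirm variable deposit_rate
assert _rc != 0                      // the variable no longer exists
```

The 11 dropped rows for 2023 cannot be recovered from memory at this point; only reloading the data brings them back.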
keep/drop are permanent
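Once you keep or drop, the removed observations or variables are gone from memory; the file on disk stays unchanged until you save over it. Protect the original data in one of three ways: (1) save a backup first with save raw.dta, (2) wrap the analysis in preserve / restore, or (3) work on a copy, for example in a separate frame created with frame copy (Stata 16 and later).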
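A minimal sketch of options (1) and (3), assuming hypothetical toy data. For (1) it saves to a tempfile named backup instead of raw.dta, so nothing is left on disk afterwards; save raw.dta works the same way but persists. For (3) it copies the data into a frame called work (a hypothetical name), which requires Stata 16 or later.

```stata
* Hypothetical toy data: 24 monthly observations for 2023-2024
clear
set obs 24
generate year = 2023 + (_n > 12)
generate month = mod(_n - 1, 12) + 1
generate lending_rate = 0.12 + 0.001 * _n

* Option (1): save a backup, then reload it after the destructive step
tempfile backup                  // temporary file, deleted when the do-file ends
save `backup'
keep if year == 2024
summarize lending_rate
use `backup', clear              // all 24 observations are back
assert _N == 24

* Option (3): work on a copy in a separate frame (Stata 16 or later)
frame copy default work, replace
frame change work                // switch to the copy
keep if year == 2024
summarize lending_rate
assert _N == 12
frame change default             // back to the untouched original
frame drop work
```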
preserve / restore
preserve
keep if year == 2024
summarize lending_rate
restore
* data is back to its original state
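Results you compute between preserve and restore are easy to lose along with the temporary data. A safe pattern, sketched here on hypothetical toy data, is to copy what you need into a local macro (here the hypothetical name mean2024) before calling restore.

```stata
* Hypothetical toy data: 24 monthly observations for 2023-2024
clear
set obs 24
generate year = 2023 + (_n > 12)
generate month = mod(_n - 1, 12) + 1
generate lending_rate = 0.12 + 0.001 * _n

preserve
keep if year == 2024
summarize lending_rate
local mean2024 = r(mean)         // keep the result in a local before restoring
restore                          // all 24 observations are back

display "2024 mean lending rate: " `mean2024'
assert _N == 24
```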
if and in qualifiers
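summarize lending_rate if year == 2024
summarize lending_rate in 1/10   // first 10 observations
* Missing values count as larger than any number, so exclude them explicitly
list if lending_rate > 0.13 & !missing(lending_rate) & year == 2024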
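The !missing() guard matters because Stata sorts and compares a missing value (.) as larger than every number, so a condition such as lending_rate > 0.13 is true for missing observations. A minimal sketch on five made-up values; the locals naive and safe are hypothetical names.

```stata
* Hypothetical toy data: four rates and one missing value
clear
input lending_rate
0.11
0.12
0.14
0.15
.
end

count if lending_rate > 0.13                         // counts 0.14, 0.15 AND the missing value
local naive = r(N)
count if lending_rate > 0.13 & !missing(lending_rate) // counts only 0.14 and 0.15
local safe = r(N)
display "naive: `naive'   safe: `safe'"
```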
_n and _N
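_n is the current observation number (the row index, which changes whenever you sort) and _N is the total number of observations. Subscripting a variable with them, as in lending_rate[_n - 1], refers to another row's value, so after a sort you can flag the first and last observations or build lags.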
sort year month
generate first_obs = (_n == 1)
generate last_obs = (_n == _N)
generate prev_rate = lending_rate[_n - 1]   // lag; missing in the first observation
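Under a by prefix, _n and _N count within each group instead of over the whole dataset. The sketch below, on hypothetical data that crosses a year boundary, contrasts a plain lag (which reaches back into December of the previous year) with a within-year lag built using bysort year (month). All new variable names are hypothetical.

```stata
* Hypothetical toy data spanning a year boundary
clear
input year month lending_rate
2023 11 0.120
2023 12 0.125
2024  1 0.130
2024  2 0.128
end

sort year month
generate prev_rate = lending_rate[_n - 1]                         // crosses the year boundary
bysort year (month): generate first_in_year = (_n == 1)           // _n restarts in each year
bysort year (month): generate n_in_year = _N                      // _N is the group size
bysort year (month): generate prev_in_year = lending_rate[_n - 1] // missing in each year's first month
list
```

For January 2024, prev_rate holds the December 2023 value while prev_in_year is missing; which one you want depends on whether the lag should cross group boundaries.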
sort and gsort
sort year month              // ascending only
gsort -lending_rate          // descending; the - prefix reverses the order
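gsort also accepts mixed directions, one per variable. A minimal sketch on hypothetical unsorted data: the first gsort orders years ascending and months descending within each year, and the second puts the highest lending rate first. The checks compare year and month rather than the float lending_rate values, to avoid float-versus-double comparison issues.

```stata
* Hypothetical toy data in no particular order
clear
input year month lending_rate
2024  1 0.130
2023 12 0.125
2024  2 0.128
2023 11 0.120
end

gsort year -month            // year ascending, then month descending within year
assert year[1] == 2023 & month[1] == 12
list

gsort -lending_rate          // highest lending rate first
list
```

When several observations share the same sort key, plain sort does not guarantee their relative order; adding the stable option (sort year, stable) keeps tied observations in their existing order.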
Exercise
Keep only observations where year is 2024.