Reproducibility in Stata is built around the do-file (your script), the log file (the run record), and the export tools that turn estimates into publication tables.
Logging a run
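capture log close
* build a date stamp in YYYY-MM-DD form for the log file name
local today : display %tdCCYY-NN-DD date(c(current_date), "DMY")
log using "analysis_`today'.log", replace
* your analysis here
log close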
Locals and globals — parameterising do-files
local controls deposit_rate i.year
regress lending_rate `controls'

global CONTROLS deposit_rate i.year
regress lending_rate $CONTROLS
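One pitfall is worth a short sketch: a misspelled local expands to an empty string without raising an error, so the regression runs with no controls at all. The example below uses the auto dataset that ships with Stata as a stand-in (price, weight and foreign are that dataset's variables, not the lending data), and the closing assert is only one possible guard.

```stata
* Stand-in data: Stata's bundled auto dataset (an assumption for illustration)
sysuse auto, clear

local controls weight i.foreign

* Typo: the local named control (no s) was never defined, so it expands
* to nothing and Stata fits a constant-only model without complaint
quietly regress price `control'
display "Model df with the typo:   " e(df_m)

* Correct spelling: weight plus the foreign indicator enter the model
quietly regress price `controls'
display "Model df as intended:     " e(df_m)

* A cheap guard: stop the do-file if the macro is empty
assert "`controls'" != ""
```

Comparing the two model degrees of freedom makes the silent failure visible; the assert stops the do-file only when the correctly spelled macro is itself empty.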
Loops — foreach and forvalues
foreach var of varlist lending_rate deposit_rate spread {
    summarize `var'
    histogram `var', name(g_`var', replace)
}

forvalues y = 2020/2024 {
    summarize lending_rate if year == `y'
}
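Loops also combine naturally with stored estimates. As a minimal sketch, assuming the bundled auto dataset as stand-in data and the hypothetical name prefix m_, the same specification can be fitted for several outcomes and each fit stored under a name derived from the loop variable:

```stata
* Stand-in data and variable names (auto dataset), assumed for illustration
sysuse auto, clear

* Fit the same specification for each outcome and store it under a derived name
foreach y of varlist price mpg {
    quietly regress `y' weight i.foreign
    estimates store m_`y'
}

* List the stored results (m_price and m_mpg)
estimates dir
```

The stored names can then be passed to a table command in one go instead of being typed out by hand.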
esttab and outreg2 — publication tables
ssc install estout, replace

regress lending_rate deposit_rate
estimates store m1

regress lending_rate deposit_rate i.year
estimates store m2

* xtreg, fe needs the panel declared first, e.g. xtset bank_id year
xtreg lending_rate deposit_rate, fe
estimates store m3

esttab m1 m2 m3 using regressions.tex, ///
    cells(b(star fmt(3)) se(par fmt(3))) ///
    stats(N r2 r2_a, fmt(0 3 3) labels("Observations" "R-squared" "Adj R-squared")) ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    label replace
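The heading also names outreg2, so here is a minimal sketch of the equivalent workflow with that command. It assumes the auto dataset as stand-in data and a hypothetical output file regressions_outreg.doc; unlike esttab, outreg2 is called straight after each regression and builds the table with replace and then append.

```stata
* outreg2 is a community-contributed command; install it once from SSC
ssc install outreg2, replace

* Stand-in data (auto dataset), assumed for illustration
sysuse auto, clear

* First column: start a new table
regress price weight
outreg2 using regressions_outreg.doc, replace ctitle(OLS) dec(3) label

* Second column: append to the same table
regress price weight i.foreign
outreg2 using regressions_outreg.doc, append ctitle(With origin) dec(3) label
```

Which command to prefer is largely a matter of taste: esttab works from stored estimates, while outreg2 writes each column as the model is fitted.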
putexcel — custom Excel output
putexcel set output.xlsx, replace
putexcel A1 = "Bank"
putexcel B1 = "Mean rate"

levelsof bank_id, local(banks)
local row = 2
foreach b of local banks {
    summarize lending_rate if bank_id == `b', meanonly
    * write the mean first, while r(mean) from summarize is still in memory
    putexcel B`row' = r(mean)
    putexcel A`row' = `b'
    local ++row
}
Reproducibility checklist
(1) Everything in a do-file. (2) Log every run. (3) Locals/globals at the top for paths and parameters. (4) esttab for tables, graph export for figures, putexcel for custom outputs. (5) Save intermediate datasets at each major step. With those five, you can rerun a year-old analysis in five minutes.
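The checklist can be sketched as a single master do-file. Everything named here is a placeholder chosen for illustration: the globals ROOT and OUT, the output file names, and the auto dataset standing in for the real data. The sketch also assumes estout is already installed.

```stata
* master.do: a hypothetical skeleton; all paths and file names are placeholders
version 14          // pin the interpreter version; adjust to your installation
clear all
set more off

* (3) Paths and parameters at the top
global ROOT "."
global OUT  "$ROOT/output"
local controls weight i.foreign
capture mkdir "$OUT"

* (2) Log every run
capture log close
log using "$OUT/master.log", replace

* Stand-in data (auto dataset), assumed for illustration
sysuse auto, clear

* (5) Save an intermediate dataset after the cleaning step
generate log_price = ln(price)
save "$OUT/step1_clean.dta", replace

* (4) Export a table and a figure (esttab comes from the estout package)
regress log_price `controls'
estimates store m1
esttab m1 using "$OUT/table1.tex", se label replace

scatter log_price weight
graph export "$OUT/fig1.eps", replace

log close
```

Because every path hangs off one global, moving the project to another machine should only require changing the ROOT line.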
Exercise
Define a local named controls equal to deposit_rate i.year, then use it in a regress command.