Three real analyses on Kenyan data, each replicating an analysis from this site, end-to-end in R. If you can do these without copying, you have working applied-economist R.
Project 1 — bank-rates spread
library(dplyr)
library(ggplot2)

result <- bankrates |>
  mutate(spread = lending_rate - deposit_rate)

ggplot(result, aes(x = month, y = spread)) +
  geom_line() +
  labs(title = "Lending-deposit spread, Kenyan banks",
       y = "Spread (pp)")

# Linear time trend
result$t <- 1:nrow(result)
model <- lm(spread ~ t, data = result)
summary(model)
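The `bankrates` data frame is not defined in this section. As a minimal sketch, assuming it holds one row per month with columns `month`, `lending_rate`, and `deposit_rate`, the block below builds a synthetic stand-in and pulls the trend slope out of the fitted model. The numbers are invented for illustration and are not actual Kenyan bank rates.

```r
library(dplyr)

# Synthetic stand-in for `bankrates`; every value here is made up.
set.seed(1)
n <- 36
bankrates <- data.frame(
  month        = seq(as.Date("2021-01-01"), by = "month", length.out = n),
  deposit_rate = 7 + rnorm(n, sd = 0.1)
)
# Build the lending rate so the spread drifts up by about 0.05 pp per month.
bankrates$lending_rate <- bankrates$deposit_rate + 5 +
  0.05 * seq_len(n) + rnorm(n, sd = 0.1)

result <- bankrates |>
  mutate(spread = lending_rate - deposit_rate,
         t      = row_number())

model <- lm(spread ~ t, data = result)

# The coefficient on t is the average change in the spread (pp) per month.
trend <- coef(model)[["t"]]
trend
confint(model, "t")
```

Reading the slope with `coef()` and `confint()` is often more convenient than scanning the `summary()` printout when the number has to go into a write-up.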
Project 2 — pension allocation shift
pension_share <- pension |>
  mutate(govt_share = govt_securities / total)

ggplot(pension_share, aes(x = period, y = govt_share)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Govt securities as share of pension assets")
Project 3 — M-PESA growth rate
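library(dplyr)
library(ggplot2)

# lag(volume_bn, 12) is the year-earlier value only if the data are monthly,
# sorted by date, and have no missing months.
mpesa_growth <- mpesa |>
  mutate(yoy_growth = (volume_bn / lag(volume_bn, 12) - 1))

ggplot(mpesa_growth, aes(x = date, y = yoy_growth)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "M-PESA volume YoY growth")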
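To see what the 12-period lag does, here is a small sketch on a synthetic series, assuming `mpesa` is monthly with columns `date` and `volume_bn`. The series is invented: it grows exactly 2% per month, so year-on-year growth should come out as 1.02^12 - 1 (about 26.8%) in every month that has a year-earlier value.

```r
library(dplyr)

# Synthetic monthly series growing 2% per month; the values are invented.
mpesa <- data.frame(
  date      = seq(as.Date("2020-01-01"), by = "month", length.out = 36),
  volume_bn = 100 * 1.02^(0:35)
)

mpesa_growth <- mpesa |>
  arrange(date) |>   # lag() follows row order, so sort by date first
  mutate(yoy_growth = volume_bn / lag(volume_bn, 12) - 1)

# The first 12 months have no year-earlier value, so growth is NA there.
sum(is.na(mpesa_growth$yoy_growth))

# Every later month shows the same growth rate, 1.02^12 - 1.
range(mpesa_growth$yoy_growth, na.rm = TRUE)
```

The leading `NA` values are why `geom_line()` typically warns about removed rows on the first year of data; that warning is expected rather than a sign of a problem.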
What you've learned
If you can read in data, transform it with dplyr, plot it with ggplot, fit a regression with lm, and write up the answer in Quarto — you have working R. From here, the path is more domain (econometrics with the fixest package; time series with forecast; spatial analysis with sf; Bayesian inference with brms) and more depth, not more language.
The R community is your library
R has 20,000+ CRAN packages and a deep tradition of methodological packages tied to academic papers. Whatever obscure model or test you need, there's almost certainly a package for it. Search rOpenSci, METACRAN, and CRAN Task Views first.
Exercise
An applied-economics PhD researcher wants to publish their first solo paper. They have 50,000 observations from a Kenyan household-survey panel covering 2018-2024 and want to estimate the effect of M-PESA adoption on household consumption. Walk through the project as an R-and-tidyverse workflow: (1) Data-management steps from raw CSV to cleaned panel. (2) The right estimation strategy (likely difference-in-differences). (3) The R packages and code idiom for each step. (4) What goes in the reproducibility package they would share with referees, and why each piece matters.
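For steps (2) and (3), one possible starting point is a two-way fixed effects difference-in-differences regression. The sketch below is not a model answer: it runs on a small synthetic panel, and every name in it (`hh_id`, `year`, `adopter`, `post`, `treat_post`, `log_cons`) is hypothetical, as is the assumption that all adopters switch in 2021. It uses base R's `lm()` so it runs without extra packages.

```r
# Synthetic household panel; all values and column names are invented.
set.seed(42)
n_hh  <- 200
years <- 2018:2024

panel     <- expand.grid(hh_id = seq_len(n_hh), year = years)
hh_effect <- rnorm(n_hh)                  # time-invariant household heterogeneity
adopter   <- seq_len(n_hh) <= n_hh / 2    # assumption: half the households adopt

panel$adopter    <- adopter[panel$hh_id]
panel$post       <- panel$year >= 2021    # assumption: one common adoption year
panel$treat_post <- as.integer(panel$adopter & panel$post)

# Log consumption = household effect + common trend + treatment effect + noise.
true_effect <- 0.10
panel$log_cons <- 8 + hh_effect[panel$hh_id] + 0.03 * (panel$year - 2018) +
  true_effect * panel$treat_post + rnorm(nrow(panel), sd = 0.2)

# Household and year dummies absorb level differences and common shocks;
# the coefficient on treat_post is the difference-in-differences estimate.
did <- lm(log_cons ~ treat_post + factor(hh_id) + factor(year), data = panel)
est <- coef(did)[["treat_post"]]
est
```

On the full 50,000-observation panel, something like `fixest::feols(log_cons ~ treat_post | hh_id + year, data = panel, cluster = ~hh_id)` would likely be the more practical call, since it absorbs the fixed effects and clusters standard errors by household, which the plain `lm()` output above does not. If households adopt in different years, the simple two-way fixed effects estimate can be misleading, so the estimation strategy in a real answer would need to address staggered adoption.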