Three real analyses on Kenyan data — R Module 12

This module works through three real analyses on Kenyan data end-to-end in R, each replicating an analysis published elsewhere on this site. If you can reproduce them without copying the code, you have working R for applied economics.

Project 1 — bank-rates spread

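library(dplyr)
library(ggplot2)

# Assumes `bankrates` is already loaded (e.g. with readr::read_csv()) with one row
# per month and columns month (Date), lending_rate and deposit_rate (in %).
result <- bankrates |>
    arrange(month) |>                                  # t below depends on row order
    mutate(spread = lending_rate - deposit_rate,       # percentage points
           t = row_number())                           # 1, 2, 3, ... in month order

ggplot(result, aes(x = month, y = spread)) +
    geom_line() +
    labs(title = "Lending-deposit spread, Kenyan banks",
         x = NULL,
         y = "Spread (pp)")

# Linear time trend: the coefficient on t is the average change in the spread per month.
# Monthly spreads are typically autocorrelated, so the default standard errors
# reported by summary() are likely too small.
model <- lm(spread ~ t, data = result)
summary(model)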
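Because `bankrates` itself is not reproduced here, the sketch below runs the same spread-and-trend steps on simulated monthly rates with a known upward drift in the lending rate of 0.02 percentage points per month. The data frame `sim` and all its numbers are hypothetical; the point is to check that the slope on `t` recovers the drift and to show how to convert it into an annual figure.

```r
# Simulated stand-in for `bankrates`: all numbers below are made up.
set.seed(42)
n_months <- 60
sim <- data.frame(
    month = seq(as.Date("2019-01-01"), by = "month", length.out = n_months),
    t     = seq_len(n_months)
)
sim$lending_rate <- 12 + 0.02 * sim$t + rnorm(n_months, sd = 0.1)  # drifts up 0.02 pp per month
sim$deposit_rate <- 7 + rnorm(n_months, sd = 0.1)                  # no drift
sim$spread <- sim$lending_rate - sim$deposit_rate

fit <- lm(spread ~ t, data = sim)
slope_per_month <- unname(coef(fit)["t"])  # pp per month
slope_per_year  <- 12 * slope_per_month    # pp per year
slope_per_month
slope_per_year
```

With the real data, `coef(model)["t"]` gives the corresponding per-month figure for the Kenyan spread, and multiplying by 12 turns it into an annual drift.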

Project 2 — pension allocation shift

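library(dplyr)
library(ggplot2)

# Assumes `pension` has one row per reporting period, with government securities
# and total assets measured in the same currency units.
pension_share <- pension |>
    mutate(govt_share = govt_securities / total)

# group = 1 keeps a single connected line even if period is stored as text or a factor.
ggplot(pension_share, aes(x = period, y = govt_share, group = 1)) +
    geom_line() +
    geom_point() +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "Government securities as share of pension assets",
         x = NULL, y = "Share of total assets")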
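If the pension data arrive in long format, with one row per period and asset class, the same share can be computed for every asset class at once by dividing each value by its within-period total. A minimal sketch, assuming hypothetical column names `period`, `asset_class` and `value` and made-up figures:

```r
library(dplyr)

# Made-up holdings for two periods; not real pension data.
holdings <- data.frame(
    period      = rep(c("2022", "2023"), each = 3),
    asset_class = rep(c("govt_securities", "equities", "other"), times = 2),
    value       = c(450, 200, 350, 520, 180, 300)
)

shares <- holdings |>
    group_by(period) |>
    mutate(share = value / sum(value)) |>  # each class as a fraction of its period's total
    ungroup()

govt <- shares |> filter(asset_class == "govt_securities")
govt
```

In the wide layout used above, `govt_securities / total` gives the same number, provided `total` includes the government-securities holdings.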

Project 3 — M-PESA growth rate

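library(dplyr)
library(ggplot2)

# Assumes `mpesa` has one row per month with no missing months.
mpesa_growth <- mpesa |>
    arrange(date) |>                                        # lag() works on row order
    mutate(yoy_growth = volume_bn / lag(volume_bn, n = 12) - 1)  # NA for the first 12 months

ggplot(mpesa_growth, aes(x = date, y = yoy_growth)) +
    geom_line(na.rm = TRUE) +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "M-PESA volume YoY growth",
         x = NULL, y = "Year-on-year growth")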
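`lag(volume_bn, n = 12)` only means "the same month last year" if the rows are sorted by date and no month is missing. The sketch below defines a hypothetical helper, `yoy_from_monthly()`, that computes the same ratio in base R, sorts internally, and stops if it finds a gap; the helper and the made-up volumes (growing 1% a month) are illustrations, not part of the original analysis.

```r
# Hypothetical helper: year-on-year growth for a monthly series.
# Sorts by date, stops if months are missing or duplicated,
# and returns results in the caller's original row order.
yoy_from_monthly <- function(date, value) {
    ord <- order(date)
    date <- date[ord]
    value <- value[ord]
    month_index <- as.integer(format(date, "%Y")) * 12L + as.integer(format(date, "%m"))
    if (any(diff(month_index) != 1L)) {
        stop("series has missing or duplicated months")
    }
    lagged <- c(rep(NA_real_, 12), value)[seq_along(value)]  # value 12 months earlier
    growth <- value / lagged - 1
    growth[order(ord)]  # undo the sort
}

# Made-up volumes, +1% per month, passed in deliberately reversed order.
dates  <- seq(as.Date("2022-01-01"), by = "month", length.out = 24)
volume <- 100 * 1.01^(0:23)
g <- yoy_from_monthly(rev(dates), rev(volume))
rev(g)  # 12 NAs, then 1.01^12 - 1 (about 12.7%) for each later month
```

Inside a dplyr pipeline, `mutate(yoy_growth = yoy_from_monthly(date, volume_bn))` would apply the same check to the `mpesa` table.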

What you've learned

If you can read in data, transform it with dplyr, plot it with ggplot2, fit a regression with lm(), and write up the answer in Quarto, you have working R. From here, the path forward is more domain knowledge (econometrics with the fixest package, time series with forecast, spatial analysis with sf, Bayesian inference with brms) and more depth, not more language.

The R community is your library

R has 20,000+ CRAN packages and a deep tradition of methodological packages tied to academic papers. Whatever obscure model or test you need, there's almost certainly a package for it. Search rOpenSci, METACRAN, and CRAN Task Views first.

Exercise

An applied-economics PhD researcher wants to publish their first solo paper. They have 50,000 observations from a Kenyan household-survey panel covering 2018-2024 and want to estimate the effect of M-PESA adoption on household consumption. Walk through the project as an R-and-tidyverse workflow: (1) the data-management steps from raw CSV to a cleaned panel; (2) the right estimation strategy (likely a difference-in-differences design); (3) the R packages and code idiom for each step; (4) what goes in the reproducibility package they would share with referees, and why each piece matters.
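As a starting point for parts (1) to (3), here is a minimal sketch on a simulated panel, not the researcher's data. It assumes a hypothetical design in which half of 200 households adopt M-PESA in 2021 and none adopt before, with a true effect of 5 built in, so a two-way fixed-effects regression (household and year dummies) should recover roughly 5. It also runs two basic panel-integrity checks of the kind that belong in the data-management step. The column names `hh_id`, `year`, `adopt_post` and `consumption` are illustrative.

```r
# Simulated panel: every number here is made up for illustration.
set.seed(1)
n_hh  <- 200
years <- 2018:2024
panel <- expand.grid(hh_id = seq_len(n_hh), year = years)
panel$treated    <- panel$hh_id <= n_hh / 2        # hypothetical adopters
panel$post       <- panel$year >= 2021             # hypothetical adoption year
panel$adopt_post <- as.integer(panel$treated & panel$post)

hh_effect   <- rnorm(n_hh, mean = 100, sd = 10)
year_effect <- setNames(seq(0, 6, length.out = length(years)), years)
true_effect <- 5
panel$consumption <- hh_effect[panel$hh_id] +
    year_effect[as.character(panel$year)] +
    true_effect * panel$adopt_post +
    rnorm(nrow(panel), sd = 1)

# Panel-integrity checks: one row per household-year, and a balanced panel.
stopifnot(anyDuplicated(panel[c("hh_id", "year")]) == 0)
stopifnot(all(table(panel$hh_id) == length(years)))

# Difference-in-differences via two-way fixed effects (household and year dummies).
did <- lm(consumption ~ adopt_post + factor(hh_id) + factor(year), data = panel)
did_estimate <- unname(coef(did)["adopt_post"])
did_estimate
```

With real survey data, standard errors should be clustered by household; `fixest::feols(consumption ~ adopt_post | hh_id + year, data = panel, cluster = ~hh_id)` gives the same point estimate as the dummy regression with clustered standard errors. If adoption dates differ across households, the plain two-way fixed-effects estimate can be biased, and estimators built for staggered timing are worth comparing against it.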