Module 12 of 12 · 90 min read · Beginner

Three real analyses on Kenyan data

Replicate the bank-rates spread, the pension allocation shift, and the M-PESA growth curve in R.

Learning objectives

By the end of this module, you should be able to:

  • 01 Replicate three published Kenyan-data analyses end-to-end in R
  • 02 Combine dplyr, ggplot2, and lm() into a single coherent pipeline
  • 03 Apply lag(), percent-change calculations, and cumulative functions such as cumsum() to time-series transformations
  • 04 Produce publication-ready charts with scales::percent and theme_minimal()

This module walks through three real analyses on Kenyan data, each replicating an analysis from this site end-to-end in R. If you can do these without copying, you have working applied-economist R.

Project 1 — bank-rates spread

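r
library(dplyr)
library(ggplot2)

# Assumes `bankrates` is already loaded, in chronological order,
# with columns month, lending_rate, and deposit_rate
result <- bankrates |>
  mutate(spread = lending_rate - deposit_rate)  # spread in percentage points

ggplot(result, aes(x = month, y = spread)) +
  geom_line() +
  labs(title = "Lending-deposit spread, Kenyan banks",
       y = "Spread (pp)")

# Linear time trend: the coefficient on t is the average change in the spread per period
result$t <- seq_len(nrow(result))
model <- lm(spread ~ t, data = result)
summary(model)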
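
To read the trend regression without scanning the full summary() printout, you can pull out the slope and its confidence interval directly. The sketch below uses a made-up monthly series (the `toy` data frame and its numbers are illustrative assumptions, not Kenyan bank data) so that it runs on its own.

```r
# Hypothetical monthly spread series: illustrative numbers, not real bank data
toy <- data.frame(t = 1:24)
toy$spread <- 6 - 0.05 * toy$t + 0.1 * sin(toy$t)

trend_model <- lm(spread ~ t, data = toy)

# Slope: average change in the spread, in percentage points per month
slope <- unname(coef(trend_model)["t"])

# 95% confidence interval for the slope (a 1 x 2 matrix: lower, upper)
slope_ci <- confint(trend_model, "t", level = 0.95)

# Annualised drift implied by the monthly slope
annual_drift <- 12 * slope
```

If the confidence interval excludes zero, the data are consistent with a systematic narrowing or widening of the spread rather than noise around a flat line.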

Project 2 — pension allocation shift

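r
library(dplyr)
library(ggplot2)

# Assumes `pension` is already loaded with columns period, govt_securities, and total
pension_share <- pension |>
  mutate(govt_share = govt_securities / total)  # fraction between 0 and 1

ggplot(pension_share, aes(x = period, y = govt_share)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +  # show 0.45 as 45%
  labs(title = "Govt securities as share of pension assets")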
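
A chart shows the shift; a single number is often easier to cite. As a minimal sketch, the change in the share since the first period can be computed in percentage points alongside a formatted label. The `toy_pension` figures and the `change_pp` and `share_label` column names are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical figures for illustration only (not actual pension data)
toy_pension <- tibble(
  period = c("2019", "2020", "2021"),
  govt_securities = c(420, 480, 560),
  total = c(1000, 1100, 1200)
)

toy_share <- toy_pension |>
  mutate(
    govt_share = govt_securities / total,
    # Change relative to the first period, in percentage points
    change_pp = (govt_share - first(govt_share)) * 100,
    # Human-readable label with one decimal place
    share_label = scales::percent(govt_share, accuracy = 0.1)
  )
```

Reporting the change in percentage points, rather than as a percent change of the share itself, avoids a common ambiguity when writing up allocation shifts.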

Project 3 — M-PESA growth rate

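r
library(dplyr)
library(ggplot2)

# Assumes `mpesa` is a monthly series with columns date and volume_bn
mpesa_growth <- mpesa |>
  arrange(date) |>  # lag() uses row order, so sort chronologically first
  mutate(yoy_growth = volume_bn / lag(volume_bn, 12) - 1)  # same month, prior year

ggplot(mpesa_growth, aes(x = date, y = yoy_growth)) +
  geom_line() +  # the first 12 months are NA and are dropped with a warning
  scale_y_continuous(labels = scales::percent) +
  labs(title = "M-PESA volume YoY growth")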
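
To check that the lag-12 calculation does what you expect, it helps to run it on a series whose answer is known in advance. The sketch below uses a synthetic series growing exactly 2% per month (an assumption for illustration, not M-PESA data), so every year-on-year value should equal 1.02^12 - 1. It also adds a running total with cumsum() as an example of a cumulative transformation.

```r
library(dplyr)

# Synthetic monthly series growing exactly 2% per month (not M-PESA data)
toy_mpesa <- tibble(
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  volume_bn = 100 * 1.02^(0:23)
)

toy_growth <- toy_mpesa |>
  arrange(date) |>  # lag() works on row order, so sort first
  mutate(
    yoy_growth = volume_bn / lag(volume_bn, 12) - 1,  # vs. same month a year earlier
    cumulative_bn = cumsum(volume_bn)                 # running total of volume
  )

# The first 12 rows have no observation a year earlier, so yoy_growth is NA there
n_missing <- sum(is.na(toy_growth$yoy_growth))

# Known answer for this synthetic series: 1.02^12 - 1, about 26.8%
expected_yoy <- 1.02^12 - 1
```

If the data had gaps (a missing month), lag(volume_bn, 12) would silently compare the wrong months, so it is worth confirming the series is complete before trusting the result.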

What you've learned

If you can read in data, transform it with dplyr, plot it with ggplot2, fit a regression with lm(), and write up the answer in Quarto, you have working R. From here, the path is more domain (econometrics with the fixest package; time series with forecast; spatial analysis with sf; Bayesian inference with brms) and more depth, not more language.
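
The read, transform, plot, and model steps can be sketched in one short script. Everything here is an illustrative assumption: the CSV is written to a temporary file with made-up rates so the example runs on its own, and the object names (`rates`, `spread_plot`, `spread_fit`) are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Write a small made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    month = seq(as.Date("2023-01-01"), by = "month", length.out = 12),
    lending_rate = 12 + 0.1 * (1:12),
    deposit_rate = rep(7, 12)
  ),
  csv_path,
  row.names = FALSE
)

# 1. Read, then 2. transform
rates <- read.csv(csv_path) |>
  mutate(
    month = as.Date(month),                 # dates arrive as text from the CSV
    spread = lending_rate - deposit_rate,
    t = row_number()                        # time index for the trend regression
  )

# 3. Plot
spread_plot <- ggplot(rates, aes(x = month, y = spread)) +
  geom_line() +
  labs(title = "Toy lending-deposit spread", y = "Spread (pp)") +
  theme_minimal()

# 4. Model
spread_fit <- lm(spread ~ t, data = rates)
```

Keeping each stage as a named object makes it straightforward to drop the same script into a Quarto document, with the plot and the model summary printed in separate chunks.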

The R community is your library

R has 20,000+ CRAN packages and a deep tradition of methodological packages tied to academic papers. Whatever obscure model or test you need, there's almost certainly a package for it. Search rOpenSci, METACRAN, and CRAN Task Views first.

Exercise

An applied-economics PhD researcher wants to publish their first solo paper. They have 50,000 observations from a Kenyan household-survey panel covering 2018 to 2024 and want to estimate the effect of M-PESA adoption on household consumption. Walk through the project as an R-and-tidyverse workflow: (1) Data-management steps from raw CSV to cleaned panel. (2) The right estimation strategy (likely a difference-in-differences). (3) The R packages and code idiom for each step. (4) What goes in the reproducibility package they would share with referees, and why each piece matters.

Key takeaways

  • Three projects, three patterns: trend analysis (banking spread), structural shift (pension allocation), growth-rate computation (M-PESA)
  • If you can pipe data → transform → plot → model in one chain, you have working applied-researcher R
  • The next moves: fixest (high-dim FE), forecast (time series), sf (spatial), brms (Bayesian)
  • The R community is your library — 20,000+ CRAN packages cover virtually any model you need
LeadAfrik Public Economics Hub