Skip to content
Module 07 of 1250 min readBeginner

tidyr: pivot, separate, join

Wide vs long. pivot_longer, pivot_wider. Tidy data principles in practice.

58%

Listen along

Read “tidyr: pivot, separate, join” aloud

Plays in your browser using on-device text-to-speech — nothing leaves the page.

Learning objectives

By the end of this module, you should be able to:

  • 01Pivot between wide and long formats using pivot_longer() and pivot_wider()
  • 02Apply the tidy data principle: each variable in a column, each observation in a row
  • 03Separate compound columns and unite columns with tidyr verbs
  • 04Combine tables using dplyr's join functions (left_join, inner_join, full_join)

tidyr handles reshaping data between wide and long formats. The tidy data principle: each variable in a column, each observation in a row, each value in a cell. Most analysis is easier on tidy (long) data; most reporting is easier on wide data.

pivot_longer — wide to long

r
library(tidyr)
# Wide: bank as columns
wide <- data.frame(
month = c("2024-01", "2024-02"),
KCB = c(0.13, 0.14),
Equity = c(0.12, 0.13)
)
# Long: bank as a value
long <- wide |>
pivot_longer(
cols = c(KCB, Equity),
names_to = "bank",
values_to = "rate"
)

pivot_wider — long to wide

r
long |>
pivot_wider(
names_from = bank,
values_from = rate
)

separate — split a column

r
df <- data.frame(date = c("2024-01", "2024-02"))
df |> separate(date, into = c("year", "month"), sep = "-")

Joining tables

r
library(dplyr)
left_join(banks, deposits, by = "bank") # keep all banks rows
inner_join(banks, deposits, by = "bank") # only matches
full_join(banks, deposits, by = "bank") # everything

When to use long vs wide

Long format: easier for ggplot (faceting, grouping by variable), easier for group_by + summarise, more flexible. Wide format: easier for human reading, easier for spreadsheet exports, easier for matrix operations. Most pipelines: convert to long → analyse → convert to wide for output.

Exercise

Pivot a wide data frame with months as rows and banks (KCB, Equity) as columns into long format with bank and rate columns.

Key takeaways

  • Long format for analysis; wide format for reporting. Convert at the boundaries
  • pivot_longer turns columns into key-value pairs; pivot_wider does the inverse
  • Tidy data principles (Wickham 2014) underpin the entire tidyverse design
  • left_join keeps all rows from the left table; inner_join keeps only matches; full_join keeps everything

Further reading

  1. 01

    Tidy Data

    Hadley Wickham · Journal of Statistical Software · 2014

  2. 02
  3. 03
Loading progress…
LeadAfrikPublic Economics Hub