dplyr is the data-manipulation grammar of the tidyverse. Five verbs — filter, select, mutate, summarise, arrange — plus group_by cover most analysis work. The pipe operator chains them into readable pipelines.
The pipe |> (or %>%)
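The pipe takes the result of one expression and passes it as the first argument of the next. R 4.1+ has a built-in |>; the tidyverse %>% (from the magrittr package) predates it. For simple first-argument chaining like the examples here, the two behave the same, although they differ in details such as placeholder syntax. Either is fine; use whichever your team uses.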
library(dplyr)

# Without pipe: nested, hard to read
arrange(filter(bankrates, lending_rate > 0.13), desc(month))

# With pipe: top to bottom
bankrates |>
  filter(lending_rate > 0.13) |>
  arrange(desc(month))
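To check that the nested and piped forms really do the same thing, here is a minimal sketch on a small stand-in table. The data frame below is hypothetical: its values are made up for illustration, and only the column names follow the examples in this section.

```r
library(dplyr)

# Hypothetical stand-in for bankrates; the values are invented for illustration.
bankrates <- data.frame(
  month        = c("2023-11", "2023-12", "2024-01", "2024-02"),
  lending_rate = c(0.125, 0.135, 0.140, 0.128),
  deposit_rate = c(0.070, 0.075, 0.080, 0.072)
)

# Nested form: read from the inside out.
nested <- arrange(filter(bankrates, lending_rate > 0.13), desc(month))

# Piped form: read from top to bottom (needs R 4.1 or later for |>).
piped <- bankrates |>
  filter(lending_rate > 0.13) |>
  arrange(desc(month))

# Compare the two results.
same_result <- identical(nested, piped)
print(piped)
print(same_result)
```

Only the reading order changes between the two forms; the function calls and their arguments are the same.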
filter — select rows
bankrates |> filter(lending_rate > 0.13)
bankrates |> filter(lending_rate > 0.13, month >= "2024-01")  # AND
select — pick columns
bankrates |> select(month, lending_rate)
bankrates |> select(-deposit_rate)      # exclude
bankrates |> select(starts_with("l"))   # helpers
mutate — create new columns
bankrates |>
  mutate(
    spread = lending_rate - deposit_rate,
    spread_pct = spread * 100
  )
summarise + group_by — split-apply-combine
# Single summary
bankrates |> summarise(mean_lending = mean(lending_rate))

# Grouped
bankrates |>
  group_by(year) |>
  summarise(
    mean_lending = mean(lending_rate),
    mean_deposit = mean(deposit_rate),
    n = n()
  )
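The grouped summary is easier to follow with concrete numbers. The sketch below assumes a small, made-up table with a year column; the values are hypothetical and chosen so the group means are easy to verify by hand.

```r
library(dplyr)

# Hypothetical data: two observations per year, values invented for illustration.
bankrates <- data.frame(
  year         = c(2023, 2023, 2024, 2024),
  lending_rate = c(0.12, 0.14, 0.13, 0.15),
  deposit_rate = c(0.07, 0.09, 0.08, 0.10)
)

# Split by year, apply mean() and n() within each group, combine into one row per year.
by_year <- bankrates |>
  group_by(year) |>
  summarise(
    mean_lending = mean(lending_rate),
    mean_deposit = mean(deposit_rate),
    n = n()
  )

print(by_year)
```

The result has one row per distinct year. With a single grouping variable, summarise() returns an ungrouped table, so later verbs operate on the whole result.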
arrange — sort
bankrates |> arrange(lending_rate)              # ascending
bankrates |> arrange(desc(lending_rate))        # descending
bankrates |> arrange(year, desc(lending_rate))  # multi-key
The five-verb workflow
filter rows → select columns → mutate to create new columns → group_by + summarise to aggregate → arrange to sort. Most analysis pipelines fit this template, though not every pipeline needs every step.
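Put together, the template might look like the sketch below. The data frame is a hypothetical stand-in with made-up values, and the threshold and the output names (spread_by_year, mean_spread) are chosen only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for bankrates; the values are invented for illustration.
bankrates <- data.frame(
  year         = c(2023, 2023, 2024, 2024),
  month        = c("2023-06", "2023-12", "2024-01", "2024-02"),
  lending_rate = c(0.12, 0.14, 0.13, 0.15),
  deposit_rate = c(0.07, 0.08, 0.08, 0.09)
)

spread_by_year <- bankrates |>
  filter(lending_rate > 0.12) |>                    # keep rows above the threshold
  select(year, lending_rate, deposit_rate) |>       # keep only the columns needed
  mutate(spread = lending_rate - deposit_rate) |>   # create the new column
  group_by(year) |>                                 # split by year
  summarise(mean_spread = mean(spread), n = n()) |> # aggregate within each year
  arrange(desc(mean_spread))                        # sort, largest spread first

print(spread_by_year)
```

Each line does one job, so a step can be dropped or reordered without rewriting the rest. Filtering and selecting early keeps the later steps working on less data.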
Exercise
Using bankrates, compute the spread (lending_rate − deposit_rate) as a new column, then arrange the rows by spread in descending order.