Data frames are the central data structure for data analysis in R. A data frame is a list of equal-length vectors, displayed as a table. Most functions you'll meet (read.csv, lm, summary, ggplot) are built around data frames.
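The "list of equal-length vectors" description can be checked directly in base R. The sketch below uses a small made-up data frame (the name banks and its values are illustrative only) and shows that list functions work on it, and that columns of mismatched length are refused.

```r
# Illustrative data frame; names and values are made up for this sketch
banks <- data.frame(
  bank = c("KCB", "Equity", "Coop"),
  assets_bn = c(1500, 1700, 600)
)

is.list(banks)        # TRUE: a data frame is a list underneath
length(banks)         # 2: one list element per column
lengths(banks)        # each column has length 3
banks[["assets_bn"]]  # list-style extraction returns the column vector

# Columns of length 3 and 2 cannot be combined into one data frame;
# tryCatch captures the error message instead of stopping the script
bad <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
bad
```

Because each column is a separate vector, each column can hold its own type, while every row still lines up across columns.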
Creating a data frame
df <- data.frame(
  bank = c("KCB", "Equity", "Coop", "NCBA"),
  assets_bn = c(1500, 1700, 600, 700),
  tier = c(1, 1, 1, 1)
)
head(df)
nrow(df); ncol(df)
str(df)      # structure: types and a sample
summary(df)  # summary statistics
Selecting and filtering
df$bank                       # column as vector
df[, "bank"]                  # same
df[df$assets_bn > 1000, ]     # filter rows
df[1, ]                       # first row
df[, c("bank", "assets_bn")]  # subset columns
Tibbles — the tidyverse upgrade
A tibble is a modern data frame with more compact printing, more predictable subsetting ([ always returns a tibble, and $ never partially matches column names), and stricter type rules. Most tidyverse functions return tibbles. Tibbles are interchangeable with data.frame for almost all purposes.
library(tibble)
tb <- tibble(
  bank = c("KCB", "Equity"),
  assets = c(1500, 1700)
)
tb  # prints with column types and dimensions
# Tibbles never auto-convert characters to factors (data.frame did, until R 4.0)
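The subsetting difference between the two structures is easiest to see side by side. This is a minimal sketch, assuming the tibble package is installed; the object names df_col and tb_col are made up for illustration.

```r
library(tibble)

# Illustrative values only
df <- data.frame(bank = c("KCB", "Equity"), assets = c(1500, 1700))
tb <- tibble(bank = c("KCB", "Equity"), assets = c(1500, 1700))

df_col <- df[, "bank"]  # data.frame drops a single column to a vector
tb_col <- tb[, "bank"]  # tibble keeps a one-column tibble

is.character(df_col)    # TRUE
is_tibble(tb_col)       # TRUE

df$ass  # partial matching: silently returns the assets column
tb$ass  # NULL, with a warning about an unknown column
```

With a plain data frame, df[, "bank", drop = FALSE] gives the same keep-it-a-table behaviour when that is what you need.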
Reading and writing
df <- read.csv("rates.csv")
df <- read.csv("rates.csv", stringsAsFactors = FALSE)  # historical, not needed in R 4.0+
library(readr)
df <- read_csv("rates.csv")  # tidyverse version, faster, returns tibble
write_csv(df, "output.csv")
Always inspect after reading
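After read_csv, always run str(df), summary(df), and head(df). One stray text value in a numeric column can cause the whole column to be read as character, and the read itself raises no error. The problem only surfaces later, when numeric steps fail or give misleading results.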
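A small sketch of the stray-value problem follows. It uses base read.csv with its text argument so that it runs without a file on disk, and it assumes R 4.0 or later; the CSV content is made up. readr's read_csv is expected to type the column as character in the same situation.

```r
# Made-up CSV content; the "n/a" entry is the stray text value
csv_text <- "bank,rate\nKCB,12.5\nEquity,n/a\nCoop,13.1"

rates <- read.csv(text = csv_text)
str(rates)         # rate shows as chr, not num
class(rates$rate)  # "character" in R 4.0+

# Declaring the stray marker as missing restores the numeric column
rates_ok <- read.csv(text = csv_text, na.strings = "n/a")
class(rates_ok$rate)  # "numeric"
mean(rates_ok$rate, na.rm = TRUE)
```

The str() call is what exposes the issue: nothing at read time signals that rate is no longer numeric.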
Exercise
Print the first 5 rows of the pre-loaded bankrates data frame.