Module 09 of 12 · 55 min read · Beginner

Linear models with lm

lm, summary, broom::tidy. Reading coefficients, residuals, and confidence intervals the R way.

Learning objectives

By the end of this module, you should be able to:

  • 01 Fit OLS regressions with lm() using R's formula syntax
  • 02 Read the summary(model) output: coefficients, SEs, t-statistics, p-values, R-squared, F-test
  • 03 Use broom::tidy(), broom::glance(), broom::augment() to convert model output to tidy data frames
  • 04 Apply robust standard errors using the sandwich and lmtest packages

lm() is R's built-in function for linear regression. Its output is rich in information (coefficients, standard errors, R-squared, F-test, residual diagnostics), and the broom package converts it into tidy data frames you can pipe into further dplyr steps.

Fitting an OLS regression

r
model <- lm(lending_rate ~ deposit_rate, data = bankrates)
summary(model)

The formula syntax y ~ x is the heart of R's modelling interface. lm parses the formula, builds the design matrix, fits OLS, and returns an lm object.
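
To make the "builds the design matrix, fits OLS" step concrete, here is a minimal sketch. It assumes a simulated stand-in for bankrates (the data frame bankrates_sim and its values are hypothetical, not the course data) and reproduces lm's coefficients by solving the normal equations by hand.

```r
# Simulated stand-in for bankrates (hypothetical values, not the course data)
set.seed(1)
bankrates_sim <- data.frame(deposit_rate = runif(50, 0.02, 0.10))
bankrates_sim$lending_rate <- 0.05 + 1.2 * bankrates_sim$deposit_rate +
  rnorm(50, sd = 0.005)

model <- lm(lending_rate ~ deposit_rate, data = bankrates_sim)

# The design matrix built from the formula: an intercept column plus the predictor
X <- model.matrix(lending_rate ~ deposit_rate, data = bankrates_sim)
y <- bankrates_sim$lending_rate

# OLS by hand via the normal equations: solve (X'X) b = X'y
beta_manual <- drop(solve(crossprod(X), crossprod(X, y)))

class(model)   # the fitted object has class "lm"
colnames(X)    # "(Intercept)" and "deposit_rate"

# The hand-computed coefficients agree with lm() up to numerical tolerance
all.equal(unname(coef(model)), unname(beta_manual))
```

lm itself uses a QR decomposition rather than the normal equations, which is numerically more stable; the hand calculation is only there to show what is being estimated.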

Multiple predictors

r
model <- lm(lending_rate ~ deposit_rate + month, data = bankrates)
# Interactions: * means main effects + interaction; : means interaction only
lm(y ~ x * z, data = df) # x + z + x:z
lm(y ~ x:z, data = df) # x:z only
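
One way to check how a formula expands is to ask terms() for its term labels, which needs no data. The sketch below also fits both spellings on simulated data (the data frame df and its columns are hypothetical) to confirm that x * z and x + z + x:z describe the same model.

```r
# terms() expands a formula without needing any data
attr(terms(y ~ x * z), "term.labels")  # "x" "z" "x:z"
attr(terms(y ~ x:z), "term.labels")    # "x:z"

# Simulated data (hypothetical) with a true interaction effect
set.seed(2)
df <- data.frame(x = rnorm(40), z = rnorm(40))
df$y <- 1 + 2 * df$x - df$z + 0.5 * df$x * df$z + rnorm(40)

# The shorthand and the spelled-out formula give the same fit
fit_star <- lm(y ~ x * z, data = df)
fit_long <- lm(y ~ x + z + x:z, data = df)
all.equal(coef(fit_star), coef(fit_long))
```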

Reading summary(model)

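  • Coefficients: estimate, standard error, t value, and Pr(>|t|) (the two-sided p-value) for each term
  • Residual standard error: square root of the residual sum of squares divided by the residual degrees of freedom
  • Multiple R-squared: fraction of the response variance explained by the model
  • F-statistic: joint test that all slope coefficients (everything except the intercept) equal 0
  • Stars: significance codes that flag p-value thresholds; a convenient shorthand, but relying on them alone in published work is widely discouraged, so report estimates and intervals as well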
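
Each piece of the printed summary can also be pulled out of the summary object and recomputed by hand. The following is a sketch on simulated data (bankrates_sim is a hypothetical stand-in for bankrates); note that the residual standard error divides by the residual degrees of freedom, not by the number of observations.

```r
# Simulated stand-in for bankrates (hypothetical values)
set.seed(3)
bankrates_sim <- data.frame(deposit_rate = runif(60, 0.02, 0.10))
bankrates_sim$lending_rate <- 0.05 + 1.2 * bankrates_sim$deposit_rate +
  rnorm(60, sd = 0.005)

model <- lm(lending_rate ~ deposit_rate, data = bankrates_sim)
s <- summary(model)

s$coefficients  # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
s$sigma         # residual standard error
s$r.squared     # multiple R-squared
s$fstatistic    # named vector: value, numdf, dendf

# Residual standard error = sqrt(RSS / residual degrees of freedom)
rss <- sum(residuals(model)^2)
rse_manual <- sqrt(rss / df.residual(model))

# R-squared = 1 - RSS / TSS
y <- bankrates_sim$lending_rate
r2_manual <- 1 - rss / sum((y - mean(y))^2)

# p-value of the overall F-test, from the F distribution
f <- s$fstatistic
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

all.equal(s$sigma, rse_manual)
all.equal(s$r.squared, r2_manual)
```

With a single predictor, the F-statistic equals the square of the slope's t value, so the overall F-test and the slope's t-test give the same p-value.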

broom — tidying regression output

r
library(broom)
tidy(model) # data frame: term, estimate, std.error, statistic, p.value
glance(model) # one-row data frame: r.squared, adj.r.squared, p.value, etc.
augment(model) # data used in the fit + .fitted, .resid, and diagnostic columns
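
As a small illustration of why the tidy format is convenient, the sketch below asks tidy() for coefficient confidence intervals and compares them with base R's confint(). It assumes the broom package is installed and uses simulated data (bankrates_sim is a hypothetical stand-in for bankrates).

```r
library(broom)

# Simulated stand-in for bankrates (hypothetical values)
set.seed(4)
bankrates_sim <- data.frame(deposit_rate = runif(60, 0.02, 0.10))
bankrates_sim$lending_rate <- 0.05 + 1.2 * bankrates_sim$deposit_rate +
  rnorm(60, sd = 0.005)

model <- lm(lending_rate ~ deposit_rate, data = bankrates_sim)

# conf.int = TRUE adds conf.low and conf.high columns to the coefficient table
coefs <- tidy(model, conf.int = TRUE, conf.level = 0.95)
coefs[, c("term", "estimate", "conf.low", "conf.high")]

# The same intervals from base R, returned as a matrix instead of a data frame
ci_base <- confint(model, level = 0.95)

# Passing data = keeps every column of the supplied data frame
# alongside .fitted and .resid
aug <- augment(model, data = bankrates_sim)
head(aug[, c("deposit_rate", "lending_rate", ".fitted", ".resid")])
```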

Predict and confidence intervals

r
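# newdata needs a column for every predictor in the model (here: the one-predictor fit)
predict(model, newdata = data.frame(deposit_rate = 0.06))
predict(model, interval = "confidence") # CI for the mean response at the observed data
predict(model, interval = "prediction") # wider: intervals for new observations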
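
The difference between the two interval types is easiest to see side by side. This sketch uses simulated data (bankrates_sim and the new deposit rates are hypothetical) and compares interval widths at the same new points.

```r
# Simulated stand-in for bankrates (hypothetical values)
set.seed(5)
bankrates_sim <- data.frame(deposit_rate = runif(60, 0.02, 0.10))
bankrates_sim$lending_rate <- 0.05 + 1.2 * bankrates_sim$deposit_rate +
  rnorm(60, sd = 0.005)

model <- lm(lending_rate ~ deposit_rate, data = bankrates_sim)

# Hypothetical new deposit rates to predict at
new_obs <- data.frame(deposit_rate = c(0.04, 0.06, 0.08))

# Both calls return a matrix with columns fit, lwr, upr
conf_int <- predict(model, newdata = new_obs, interval = "confidence", level = 0.95)
pred_int <- predict(model, newdata = new_obs, interval = "prediction", level = 0.95)

# Same point predictions, different uncertainty
conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]

cbind(fit = conf_int[, "fit"], conf_width, pred_width)
all(pred_width > conf_width)
```

The confidence interval covers uncertainty about the mean response only; the prediction interval adds the residual variance of an individual new observation, which is why it is always the wider of the two.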

Robust standard errors

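Base lm reports classical standard errors, which assume constant error variance (homoskedasticity). For heteroskedasticity-consistent (HC) robust SEs, load the sandwich and lmtest packages and call coeftest(model, vcov = vcovHC(model, type = 'HC3')).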
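
A runnable sketch of that call, assuming the sandwich and lmtest packages are installed. The data are simulated so that the error spread grows with deposit_rate (bankrates_sim and all its values are hypothetical), which is the situation robust SEs are meant for.

```r
library(sandwich)
library(lmtest)

# Simulated heteroskedastic data (hypothetical): error sd grows with deposit_rate
set.seed(6)
n <- 200
x <- runif(n, 0.02, 0.10)
bankrates_sim <- data.frame(
  deposit_rate = x,
  lending_rate = 0.05 + 1.2 * x + rnorm(n, sd = 0.1 * x)
)

model <- lm(lending_rate ~ deposit_rate, data = bankrates_sim)

# Classical coefficient table (same numbers as summary(model))
classical <- coeftest(model)

# HC3 robust covariance matrix, then the coefficient table recomputed with it
vc_hc3 <- vcovHC(model, type = "HC3")
robust <- coeftest(model, vcov. = vc_hc3)

# Point estimates are unchanged; only SEs, t values, and p-values differ
all.equal(classical[, "Estimate"], robust[, "Estimate"])

# The robust SEs are the square roots of the diagonal of the HC3 matrix
cbind(classical = classical[, "Std. Error"], robust = sqrt(diag(vc_hc3)))
```

The argument is formally named vcov. (with a trailing dot) in coeftest; writing vcov = also works through R's partial argument matching.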

Exercise

Fit lm(lending_rate ~ deposit_rate, data = bankrates), store the result, and print its summary().

Key takeaways

  • lm(y ~ x, data = df) is the canonical form; formula syntax is the heart of R modelling
  • broom::tidy(model) gives a tidy data frame of coefficients you can pipe into further dplyr steps
  • For HC-robust SEs: coeftest(model, vcov = vcovHC(model, type = 'HC3'))
  • predict(model, newdata, interval = 'confidence') gives CIs for the mean response; interval = 'prediction' gives wider PIs for new observations

Further reading

  1. An R Companion to Applied Regression
     John Fox & Sanford Weisberg · Sage · 2018

  2. Linear Models with R (2nd Edition)
     Julian J. Faraway · CRC Press · 2014
