R has a small set of atomic types and a long list of rules about how they coerce into each other. Memorising the coercion order is the single most useful five-minute investment in your R fluency.
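The five common atomic types
- numeric (double): real numbers, the default for any number you type
- integer: whole numbers, marked explicitly with the L suffix, e.g. 5L
- character: strings, in double or single quotes
- logical: TRUE or FALSE (T and F also work, but they are ordinary variables that can be reassigned, so prefer the full names)
- complex: complex numbers, rare in applied work
A sixth atomic type, raw (bytes), exists but rarely appears in data analysis.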
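To check these types in a live session, here is a minimal sketch that uses typeof() on one value of each kind. The names `values` and `types` are illustrative only.

```r
# One example value for each atomic type in the list above.
values <- list(
  double    = 3.14,
  integer   = 5L,
  character = "hello",
  logical   = TRUE,
  complex   = 2 + 3i
)

# typeof() reports the storage type of each element.
types <- vapply(values, typeof, character(1))
print(types)
```

Note that a number typed without the L suffix is stored as a double even when it looks like a whole number.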
NA — the explicit missing value
NA is R's representation of missing data. It is contagious: almost any arithmetic involving NA produces NA. For example, mean(c(1, 2, NA)) is NA, not 1.5. To skip NAs, pass na.rm = TRUE: mean(c(1, 2, NA), na.rm = TRUE) gives 1.5.
    x <- c(1, 2, NA, 4, 5)
    mean(x)                # NA
    mean(x, na.rm = TRUE)  # 3
    is.na(x)               # FALSE FALSE TRUE FALSE FALSE
    sum(is.na(x))          # 1, the count of missing values
Implicit coercion
When R combines values of different types in a vector, it coerces them all to the most permissive type. The order, from most to least permissive: character > complex > numeric > integer > logical.
    c(1, 2, 3, "text")      # all coerced to character
    c(1, 2, TRUE, FALSE)    # logical -> numeric: 1 2 1 0

    # Explicit coercion
    as.numeric("123")       # 123
    as.character(456)       # "456"
    as.logical(c(0, 1, 2))  # FALSE TRUE TRUE
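One way to convince yourself of the coercion order is to combine neighbouring types and inspect the result. The sketch below does that one step at a time; the names `steps` and `bad` are illustrative.

```r
# Combine each pair of adjacent types; the result takes the more permissive type.
steps <- c(
  logical_integer   = typeof(c(TRUE, 1L)),
  integer_double    = typeof(c(1L, 2.5)),
  double_complex    = typeof(c(2.5, 1+0i)),
  complex_character = typeof(c(1+0i, "a"))
)
print(steps)

# Explicit coercion that cannot succeed yields NA (R also emits a warning,
# suppressed here to keep the output clean).
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)  # TRUE
```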
The coercion that bites
If a CSV has a stray text value in even one row of an otherwise numeric column, R reads the entire column as character (or, before R 4.0, as a factor by default). Any subsequent mean() on that column returns NA with a warning. Always check class() of every column after import.
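The trap can be reproduced without a file on disk. The sketch below is one possible illustration: the CSV content, the column names, and the "n/a" marker are all made up for the example.

```r
# Hypothetical CSV text: the third data row has "n/a" where a number belongs.
csv_text <- "id,score
1,10
2,20
3,n/a
4,40"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect every column's class right after import.
col_classes <- sapply(df, class)
print(col_classes)  # id is integer, score is character

# Convert explicitly; entries that are not numbers become NA (with a warning,
# suppressed here).
score_num <- suppressWarnings(as.numeric(df$score))
mean(score_num, na.rm = TRUE)

# Alternative: declare the missing-value marker at import time,
# so the column is read as numeric from the start.
df2 <- read.csv(text = csv_text, na.strings = "n/a")
class(df2$score)  # "integer"
```

Declaring the marker through na.strings is usually the cleaner fix when you know in advance how missing values are written in the file; converting afterwards is the fallback when you do not.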
typeof vs class
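typeof() returns the underlying storage type. class() returns the class attribute (possibly user-assigned) or, when none is set, an implicit class derived from the type. For most atomic vectors the two agree, with one common exception: typeof(1) is "double" while class(1) is "numeric". For objects such as data frames and fitted models, class is what method dispatch uses.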
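A short side-by-side comparison makes the difference concrete. This is a sketch only; the class name "my_record" is hypothetical and exists purely for the example.

```r
x <- 1.5
typeof(x)    # "double"
class(x)     # "numeric"

n <- 5L
typeof(n)    # "integer"
class(n)     # "integer"

df <- data.frame(a = 1:3)
typeof(df)   # "list"
class(df)    # "data.frame"

f <- factor(c("low", "high", "low"))
typeof(f)    # "integer"
class(f)     # "factor"

# A user-assigned class changes class() but not typeof().
obj <- structure(list(value = 42), class = "my_record")  # hypothetical class
typeof(obj)  # "list"
class(obj)   # "my_record"
```

A data frame is stored as a list and a factor as an integer vector; it is the class attribute that tells functions such as print() and summary() which method to use.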
Exercise
Compute the mean of c(10, 20, NA, 30, 40), removing NAs.