Applying R to Lifestyle and Brain Health Research
University of Kansas Medical Center
August 19, 2026
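There are two types of vectors: atomic vectors, whose elements all have the same type, and lists, whose elements can have different types.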
NULL is closely related to vectors and often serves the role of a generic zero length vector.
Atomic Vector Types: the four primary types are logical, integer, double, and character.
Both integer and double vectors are known as numeric vectors.
Complex and raw vectors are atomic vectors that are rarely used. Raw vectors are used in handling binary data.
You can test whether a vector is a certain type with an is.*() function such as is.logical(), is.integer(), is.double(), or is.character().
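A minimal base R sketch of the type tests described above. The variable names are illustrative only.

```r
# One vector of each of the four common atomic types
lgl <- c(TRUE, FALSE, NA)
int <- c(1L, 6L, 10L)
dbl <- c(1, 2.5, 4.5)
chr <- c("these are", "some strings")

typeof(lgl)  # "logical"
typeof(int)  # "integer"
typeof(dbl)  # "double"
typeof(chr)  # "character"

# Integer and double vectors both count as numeric
is.integer(int)  # TRUE
is.double(int)   # FALSE
is.numeric(int)  # TRUE
is.numeric(dbl)  # TRUE

# NULL behaves like a generic zero-length vector
length(NULL)     # 0
is.null(c())     # TRUE
```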
Combining elements of different types coerces them in a fixed order:
character → double → integer → logical
For example, combining a character and an integer yields a character.
Every vector can have attributes: name-value pairs, stored as a list, that attach metadata to an object.
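The coercion order can be checked directly with c() and typeof(). This is a small sketch; the values are arbitrary.

```r
# Combining types with c() coerces to the most flexible type present
typeof(c("a", 1L))   # "character"
typeof(c(1.5, 1L))   # "double"
typeof(c(1L, TRUE))  # "integer"

# Logical vectors coerce to 0/1, which makes sum() and mean() useful
x <- c(FALSE, TRUE, TRUE)
as.numeric(x)  # 0 1 1
sum(x)         # 2 (number of TRUE values)

# Coercion that fails produces NA (and a warning, suppressed here)
suppressWarnings(as.integer(c("1", "a")))  # 1 NA
```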
Dimension and class attributes are among the most important:
Individual attributes can be retrieved and modified with attr().
All attributes can be retrieved at once with attributes(), and several can be set at once with structure().
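A short sketch of getting and setting attributes. The attribute names "x" and "y" are made up for illustration.

```r
a <- 1:3

# Set and read a single attribute
attr(a, "x") <- "abcdef"
attr(a, "x")        # "abcdef"

# Add a second attribute, then inspect all of them as a list
attr(a, "y") <- 4:6
str(attributes(a))

# structure() sets several attributes in one call
b <- structure(1:3, x = "abcdef", y = 4:6)
identical(a, b)     # TRUE
```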
Attributes are lost by most operations.
Only two attributes are routinely preserved: names and dim
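The following sketch shows a custom attribute (the hypothetical "x") being dropped while names survive subsetting.

```r
a <- structure(1:3, names = c("one", "two", "three"), x = "abcdef")

# Subsetting keeps names but drops the custom attribute
attributes(a[1])

# Summarising drops everything
attributes(sum(a))  # NULL
```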
The names attribute is a character vector giving each element a name. There are different ways to name a vector:
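Three common approaches in base R, shown as a brief sketch:

```r
# 1. When creating the vector
x <- c(a = 1, b = 2, c = 3)

# 2. By assigning a character vector to names()
y <- 1:3
names(y) <- c("a", "b", "c")

# 3. Inline, with setNames()
z <- setNames(1:3, c("a", "b", "c"))

names(x)    # "a" "b" "c"
unname(x)   # removes the names again
```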
The dim attribute (short for dimensions) is an integer vector that turns a vector into a matrix or array.
Adding a dim attribute to a vector turns it into a 2-dimensional matrix or a multi-dimensional array. Matrices and arrays are primarily used with mathematical and statistical tools. You can create them with matrix() or array(), or by assigning to dim() on an existing vector.
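The three construction routes can be sketched as follows. The values are arbitrary.

```r
# A 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# A 2 x 3 x 2 array
arr <- array(1:12, dim = c(2, 3, 2))

# Modify an existing vector in place by setting dim()
v <- 1:6
dim(v) <- c(3, 2)

dim(m)        # 2 3
dim(arr)      # 2 3 2
is.matrix(v)  # TRUE
```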
The attribute class turns an object into an S3 object, changing how it is handled when passed to a generic function.
Every S3 object is built on a base type. Four important S3 vectors used in base R are factors, Dates, date-times (POSIXct), and difftimes.
A factor is an integer vector restricted to predefined values and is used to store categorical data.
Factors have two attributes: class ("factor") and levels (the set of allowed values).
Factors are useful when you know the set of possible values but not all of them are present in the dataset. Supplying labels replaces the values stored in the levels attribute with those labels.
sex_char <- c("m", "m", "m")
sex_factor <- factor(
sex_char,
levels = c("m", "f"),
labels = c("Male", "Female")
)
table(sex_char)
#> sex_char
#> m
#> 3
table(sex_factor)
#> sex_factor
#> Male Female
#> 3 0
str(attributes(sex_factor))
#> List of 2
#> $ levels: chr [1:2] "Male" "Female"
#> $ class : chr "factor"
Date vectors are built on top of double vectors.
Removing the class shows the number of days since 1970-01-01 (the Unix Epoch)
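A small sketch of what a Date looks like underneath. The dates are arbitrary examples.

```r
visit_date <- as.Date("2025-03-15")

typeof(visit_date)      # "double"
attributes(visit_date)  # only a class attribute: "Date"

# Stripping the class reveals days since 1970-01-01
unclass(as.Date("1970-02-01"))  # 31
unclass(visit_date)
```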
Date-time information can be stored as POSIXct or POSIXlt
POSIXct is the simplest, built on top of double vectors, and used most in data frames. The value represents the number of seconds since 1970-01-01 (the Unix Epoch).
The tzone attribute, set through the tz argument, controls only how the date-time is formatted, not the instant in time that the vector represents.
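The sketch below changes only the displayed time zone of a POSIXct value. It assumes the "America/Chicago" zone is available in the system's time zone database.

```r
visit <- as.POSIXct("2025-03-15 09:00:00", tz = "UTC")

typeof(visit)       # "double"
attributes(visit)   # class and tzone

# Copy the value and change only how it is displayed
visit_chicago <- visit
attr(visit_chicago, "tzone") <- "America/Chicago"

format(visit)          # shown in UTC
format(visit_chicago)  # shown in local Chicago time

# The underlying number of seconds is unchanged
as.numeric(visit) == as.numeric(visit_chicago)  # TRUE
```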
Difftimes represent the amount of time between pairs of dates or date-times. They are built on top of double vectors and carry a units attribute.
birth_date_time <- as.POSIXct("1992-04-01 10:25:00", tz = "UTC")
visit_date_time <- as.POSIXct("2025-03-15 09:00:00", tz = "UTC")
difftime(visit_date_time, birth_date_time, units = "secs")
#> Time difference of 1039905300 secs
difftime(visit_date_time, birth_date_time, units = "days")
#> Time difference of 12035.94 days
We can divide by 365.25 to calculate a measure like age, applying as.numeric() to unclass the result and remove the units attribute.
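One way to sketch the age calculation, reusing the two date-times from the example above. The names age_days and age_years are illustrative.

```r
birth_date_time <- as.POSIXct("1992-04-01 10:25:00", tz = "UTC")
visit_date_time <- as.POSIXct("2025-03-15 09:00:00", tz = "UTC")

# Difference in days, still a difftime with a units attribute
age_days <- difftime(visit_date_time, birth_date_time, units = "days")

# as.numeric() drops the class and units, leaving a plain double
age_years <- as.numeric(age_days) / 365.25

round(age_years, 2)  # 32.95
floor(age_years)     # 32 (age in completed years)
```

Dividing by 365.25 is an approximation that averages over leap years, so it can be off by a day close to a birthday.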
Lists are more complex than atomic vectors because each element of a list can be any type, not only an atomic vector.
You can construct a list with list().
Lists are sometimes referred to as recursive vectors since they can contain other lists.
Testing and Coercion: Use is.list() to test for a list and as.list() to coerce to a list. You can also turn a list into an atomic vector with unlist().
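A brief sketch of building, testing, and flattening lists. The contents are arbitrary.

```r
# Each element can have a different type and length
l1 <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
typeof(l1)  # "list"
str(l1)

# Lists can contain other lists (hence "recursive vectors")
l2 <- list(list(list(1)))
str(l2)

is.list(l1)      # TRUE
as.list(1:3)     # a list of three length-one integer vectors

# unlist() flattens to an atomic vector, applying the usual coercion rules
unlist(list(1:2, "a"))  # "1" "2" "a"
```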
A data frame is a named list of vectors with attributes for (column) names, row.names, and the data.frame class.
Data frames share properties of both matrices and lists but differ from lists in that the length of each of its vectors must be the same.
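The list-like and matrix-like sides of a data frame can be inspected as in this sketch. The column names id and group are made up.

```r
df <- data.frame(id = 1:3, group = c("a", "b", "c"))

typeof(df)      # "list"
attributes(df)  # names, class, and row.names

# Matrix-like accessors
rownames(df)
colnames(df)
nrow(df)        # 3
ncol(df)        # 2

# List-like accessors: names and length refer to columns
names(df)       # same as colnames(df)
length(df)      # 2
```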
A data frame has rownames() and colnames(); the names() of a data frame are its column names.
A data frame has nrow() rows and ncol() columns; the length() of a data frame gives the number of columns.
Tibbles share the same structure as data frames but have some key differences:
Tibbles have the additional classes tbl_df and tbl.
tibble() allows non-syntactic column names without needing to set check.names to FALSE.
Tibbles only recycle vectors of length one, while data frames recycle any column whose length evenly divides the length of the longest column.
tibble() evaluates its inputs left to right, so you can refer to earlier columns during construction.
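The differences listed above can be sketched side by side. This assumes the tibble package is installed; the column names and the recycle_result variable are illustrative.

```r
library(tibble)

# Non-syntactic names: data.frame() rewrites them, tibble() keeps them
df <- data.frame(`1` = 1)
tb <- tibble(`1` = 1)
names(df)  # "X1"
names(tb)  # "1"

# Recycling: data.frame() recycles y to length 4
data.frame(x = 1:4, y = 1:2)

# tibble() recycles only length-one vectors; length two is an error
tibble(x = 1:4, y = 1)
recycle_result <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
recycle_result  # "error"

# Left-to-right evaluation: y can refer to x
tibble(x = 1:3, y = x * 2)

class(tb)  # "tbl_df" "tbl" "data.frame"
```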
The print function is an example of a generic function and handles tibbles differently from data frames because of the tbl_df and tbl classes. Differences in how tibbles print include:
dplyr::starwars
#> # A tibble: 87 × 14
#> name height mass hair_color skin_color eye_color birth_year
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
#> 1 Luke… 172 77 blond fair blue 19
#> 2 C-3PO 167 75 NA gold yellow 112
#> 3 R2-D2 96 32 NA white, bl… red 33
#> 4 Dart… 202 136 none white yellow 41.9
#> 5 Leia… 150 49 brown light brown 19
#> 6 Owen… 178 120 brown, gr… light blue 52
#> 7 Beru… 165 75 brown light blue 47
#> 8 R5-D4 97 32 NA white, red red NA
#> 9 Bigg… 183 84 black light brown 24
#> 10 Obi-… 182 77 auburn, w… fair blue-gray 57
#> # ℹ 77 more rows
#> # ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>,
#> # species <chr>, films <list>, vehicles <list>, starships <list>
#> # ℹ Use `print(n = ...)` to see more rows
Tibbles do not return a vector when subsetting a single column with [; the result is always a tibble.
You can use drop = TRUE or [[ ]] to achieve the default data frame behavior.
Tibbles do not partially match column names when using $, whereas data frames return the column whose name uniquely starts with the text supplied, increasing the chance of selecting the wrong variable.
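A sketch comparing the two subsetting behaviours. It assumes the tibble package is installed; the column name xyz is made up so that $x can partially match it.

```r
library(tibble)

df <- data.frame(xyz = c("a", "b"), n = 1:2)
tb <- as_tibble(df)

# Selecting one column with [ : vector from a data frame, tibble from a tibble
df[, "xyz"]               # character vector
tb[, "xyz"]               # 2 x 1 tibble

# Two ways to get the vector from a tibble
tb[, "xyz", drop = TRUE]
tb[["xyz"]]

# Partial matching with $
df$x                      # matches xyz: "a" "b"
suppressWarnings(tb$x)    # NULL (tibble warns about the unknown column)
```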
is.data.frame() and is_tibble() (the older is.tibble() is deprecated) can be used to check whether an object is a data frame or a tibble. All tibbles are data frames, but not all data frames are tibbles.
Coercion between a data frame and a tibble is done with as.data.frame() and as_tibble(), but you can also change the class attribute.
List- and matrix-columns are allowed in data frames. Including lists in a data frame allows you to put any object into the data frame and keep related objects together in a row.
I() is short for identity and indicates that the input should be left as is and not automatically transformed. It is only needed when passing a list-column to data.frame(), not to tibble().
Matrices and data frames can also be included as columns as long as their number of rows matches that of the data frame.
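Two ways to create a list-column in a base data frame, shown as a minimal sketch with arbitrary values:

```r
# Option 1: add the list-column after creation
df <- data.frame(x = 1:3)
df$y <- list(1:2, 1:3, 1:4)

# Option 2: wrap the list in I() inside data.frame().
# Without I(), data.frame() would try to treat each list element
# as its own column and fail because the lengths differ.
df2 <- data.frame(
  x = 1:3,
  y = I(list(1:2, 1:3, 1:4))
)

is.list(df2$y)   # TRUE
lengths(df2$y)   # 2 3 4
```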
df <- data.frame(x = 1:3 * 10)
df$y <- matrix(1:9, nrow = 3)
df$z <- data.frame(a = 3:1, b = letters[3:1])
str(df)
#> 'data.frame': 3 obs. of 3 variables:
#> $ x: num 10 20 30
#> $ y: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
#> $ z:'data.frame': 3 obs. of 2 variables:
#> ..$ a: int 3 2 1
#> ..$ b: chr "c" "b" "a"
NULL is a unique type that is always length zero and cannot have any attributes.
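NULL is commonly used to represent an empty vector of arbitrary type or an absent vector, such as an unset function argument.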
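A short sketch of NULL's properties and of two typical uses: standing in for an empty vector and marking an absent argument. The function describe() and its label argument are hypothetical, written only for this illustration.

```r
typeof(NULL)   # "NULL"
length(NULL)   # 0
is.null(c())   # TRUE: c() with no arguments returns NULL

# Setting an attribute on NULL is an error
attr_result <- tryCatch(
  {
    x <- NULL
    attr(x, "y") <- 1
    "ok"
  },
  error = function(e) "error"
)
attr_result    # "error"

# NULL as the default for an optional argument (hypothetical helper)
describe <- function(x, label = NULL) {
  if (is.null(label)) {
    label <- "unlabeled"
  }
  paste(label, "has", length(x), "elements")
}

describe(1:3)                   # "unlabeled has 3 elements"
describe(1:3, label = "visit")  # "visit has 3 elements"
```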
R for Lifestyle and Brain Health (R-LAB)