Applying R to Lifestyle and Brain Health Research
University of Kansas Medical Center
October 21, 2026
The first big idea behind metaprogramming is that code is data. You can capture and compute on code like any other type of data. For example, the rlang::expr function can return exactly what you pass in.
Captured code is called an expression, a collective term for any of four types (call, symbol, constant, or pairlist). The rlang::expr function captures code that you type directly, but you need rlang::enexpr to capture code passed as an argument to a function.
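A minimal sketch of the difference, assuming the rlang package is installed. The two helper functions (capture_expr and capture_enexpr) are hypothetical names used only for illustration:

```r
# Hypothetical helpers that differ only in the capturing function they use
capture_expr <- function(x) rlang::expr(x)
capture_enexpr <- function(x) rlang::enexpr(x)

# expr() captures what the function author typed, which is just `x`
capture_expr(a + b + c)
#> x

# enexpr() captures what the caller supplied
capture_enexpr(a + b + c)
#> a + b + c
```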
Once you capture an expression, you can inspect and modify it much like a list using [[ and $.
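As a small illustration (the call to f and its arguments are made up for the example), a captured call can be extended and trimmed with list-style assignment:

```r
# Capture a call, then modify it in place like a list
f <- rlang::expr(f(x = 1, y = 2))

# Add a new argument by name
f$z <- 3
f
#> f(x = 1, y = 2, z = 3)

# Remove the first argument (position 1 holds the function name)
f[[2]] <- NULL
f
#> f(y = 2, z = 3)
```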
More complex manipulations with expressions require an understanding of their structure. Nearly every programming language represents code as an abstract syntax tree (AST). In R, you can inspect and manipulate the AST.
You can use rlang::call2 and unquoting to create new trees from code. The rlang::call2 function constructs a function call from its components: the function to call and the arguments to call it with.
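A brief sketch of rlang::call2, where the function name f is a placeholder that does not need to exist because the call is only built, not run:

```r
# Build a call from a function name and its arguments
rlang::call2("f", 1, 2, 3)
#> f(1, 2, 3)

# Calls can be nested to build larger trees
nested <- rlang::call2("+", 1, rlang::call2("*", 2, 3))
nested
#> 1 + 2 * 3

# The constructed call is ordinary code and can be evaluated
eval(nested)
#> [1] 7
```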
You can build complex code trees by combining simpler ones with a template. The expr and enexpr functions have built-in support for this idea via !!, the unquote operator (pronounced bang-bang).
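For instance, two small captured expressions (xx and yy are illustrative names) can be inserted into a template with !!:

```r
# Two simple expressions to combine
xx <- rlang::expr(x + x)
yy <- rlang::expr(y + y)

# The template divides one by the other; !! inserts each tree in place
ratio <- rlang::expr(!!xx / !!yy)
ratio
#> (x + x)/(y + y)

# The result evaluates like hand-written code
eval(ratio, list(x = 1, y = 2))
#> [1] 0.5
```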
Unquoting is even more useful inside a function. For example, here is an expression to calculate the coefficient of variation.
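One way to write such a function is sketched below. The name cv is an assumption made for this example; the function returns the expression rather than computing the value:

```r
# Generate an expression for the coefficient of variation of `var`
cv <- function(var) {
  var <- rlang::enexpr(var)               # capture the caller's code
  rlang::expr(sd(!!var) / mean(!!var))    # insert it into a template
}

cv(x)
#> sd(x)/mean(x)
cv(x + y)
#> sd(x + y)/mean(x + y)

# The generated expression can be evaluated later with data
eval(cv(x), list(x = c(2, 4, 6)))
#> [1] 0.5
```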
You can evaluate (i.e., execute or run) an expression with base::eval. The eval function takes two arguments: the expression and environment. Omitting the environment causes eval to use the current one.
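A short sketch of both forms, with x and y chosen arbitrarily for the example:

```r
x <- 10
y <- 100

# Without an environment, eval() looks up x and y where it is called
eval(rlang::expr(x + y))
#> [1] 110

# With an explicit environment, the values come from there instead
eval(rlang::expr(x + y), rlang::env(x = 1, y = 10))
#> [1] 11
```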
You can also bind names to functions when evaluating, which lets you override the behavior of existing functions. For example, we can override the * and + functions to work with strings.
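A sketch of this idea follows. The function name string_math is an assumption for the example; it evaluates the caller's code in an environment where + pastes strings and * repeats them:

```r
string_math <- function(x) {
  # New environment whose parent is the caller, with + and * rebound
  e <- rlang::env(
    rlang::caller_env(),
    `+` = function(x, y) paste0(x, y),
    `*` = function(x, y) strrep(x, y)
  )
  eval(rlang::enexpr(x), e)
}

name <- "Hadley"
string_math("Hello " + name)
#> [1] "Hello Hadley"
string_math(("x" * 2 + "-y") * 3)
#> [1] "xx-yxx-yxx-y"
```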
The dplyr package uses this idea to run code in an environment to generate SQL for execution in a remote database.
con <- DBI::dbConnect(RSQLite::SQLite(), filename = ":memory:")
mtcars_db <- dplyr::copy_to(con, mtcars)
mtcars_db |>
  dplyr::filter(cyl > 2) |>
  dplyr::select(mpg:hp) |>
  head(10) |>
  dplyr::show_query()
#> <SQL>
#> SELECT `mpg`, `cyl`, `disp`, `hp`
#> FROM `mtcars`
#> WHERE (`cyl` > 2.0)
#> LIMIT 10
DBI::dbDisconnect(con)
You can modify evaluation to look for variables in a data frame instead of an environment. This idea powers the base subset and transform functions, as well as many tidyverse functions like ggplot2::aes and dplyr::mutate. We can use rlang::eval_tidy which takes an expression, environment, and a data mask.
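A minimal sketch of evaluation with a data mask. The data frame and the variable z are made up for the example:

```r
df <- data.frame(x = 1:5, y = 5:1)

# x and y are found in the data frame, not in an environment
rlang::eval_tidy(rlang::expr(x + y), df)
#> [1] 6 6 6 6 6

# Names missing from the data frame fall back to the environment
z <- 100
rlang::eval_tidy(rlang::expr(x + z), df)
#> [1] 101 102 103 104 105
```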
We can expand upon this idea by wrapping this pattern into a function using enexpr to get a similar function to base::with.
A problem arises with with2 in that the expression is evaluated inside of the function and not the environment where it was written. We can solve this problem by using a new data structure called a quosure which bundles the expression with an environment.
with2 <- function(df, expr, method) {
  a <- 1000
  if (method == "enexpr") {
    rlang::eval_tidy(rlang::enexpr(expr), df)
  } else if (method == "enquo") {
    rlang::eval_tidy(rlang::enquo(expr), df)
  }
}
df <- data.frame(x = 1:5)
a <- 10
with2(df, x + a, method = "enexpr")
#> [1] 1001 1002 1003 1004 1005
with2(df, x + a, method = "enquo")
#> [1] 11 12 13 14 15
Expressions are also called abstract syntax trees (ASTs) because the structure of code is hierarchical and can be represented as a tree. We can use lobstr::ast to better understand this structure.
ASTs are abstract because they only capture important structural details of the code and not whitespace or comments.
Expressions are the data structures in ASTs. They are created by parsing code (or by capturing it with functions such as expr) and can be constants, symbols, function calls, or pairlists.
Scalar constants are the simplest component of the AST. A constant is either NULL or an atomic vector of length one like TRUE, 1L, 2.5 or "x". You can test for a constant with rlang::is_syntactic_literal. Constants are self-quoting: the expression used to represent a constant is the same as the constant itself.
A symbol represents the name of an object like x, mtcars, or mean. In R, the terms symbol and name are used interchangeably (i.e., is.symbol is identical to is.name). You can create a symbol by capturing code that references an object with rlang::expr or by turning a string into a symbol with rlang::sym. You can turn a symbol back into a string with as.character or rlang::as_string.
The symbol type is not vectorized and is always of length one. If you want multiple symbols, you need to put them in a list, which rlang::syms creates from a character vector.
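The following sketch checks these properties of constants and symbols; the object names are arbitrary:

```r
# Constants are self-quoting: the captured code is the value itself
rlang::is_syntactic_literal(2.5)
#> [1] TRUE
identical(rlang::expr("x"), "x")
#> [1] TRUE

# A symbol is not a constant
rlang::is_syntactic_literal(rlang::expr(x))
#> [1] FALSE

# Convert between strings and symbols
s <- rlang::sym("mtcars")
is.symbol(s)
#> [1] TRUE
rlang::as_string(s)
#> [1] "mtcars"

# Several symbols are held in a list, one symbol per element
vars <- rlang::syms(c("mpg", "cyl"))
length(vars)
#> [1] 2
```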
A call object represents a captured function call. Call objects are a special type of list (internally similar to a pairlist) where the first component gives the function to call and the remaining elements are the function arguments.
lobstr::ast(read.table("important.csv", row.names = FALSE))
#> █─read.table
#> ├─"important.csv"
#> └─row.names = FALSE
x <- rlang::expr(read.table("important.csv", row.names = FALSE))
is.call(x)
#> [1] TRUE
# typeof and str print language for function calls
typeof(x)
#> [1] "language"
str(x)
#> language read.table("important.csv", row.names = FALSE)
Calls generally behave like lists and standard subsetting tools can be used to extract the function call and its arguments.
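A sketch of list-style subsetting on the same read.table call (nothing is read from disk because the call is never evaluated):

```r
x <- rlang::expr(read.table("important.csv", row.names = FALSE))

# The first element is the function being called
x[[1]]
#> read.table
is.symbol(x[[1]])
#> [1] TRUE

# The remaining elements are the arguments, by position or by name
x[[2]]
#> [1] "important.csv"
x$row.names
#> [1] FALSE

# Number of arguments
length(x) - 1
#> [1] 2

# Calls can be modified in the same way as lists
x$header <- TRUE
x
#> read.table("important.csv", row.names = FALSE, header = TRUE)
```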
You can construct a call object from its components using rlang::call2. The first argument is the name of the function to call and the remaining arguments will be passed along to the call.
Parsing is the process by which a computer language takes a string and constructs an expression. It is governed by a set of rules known as a grammar.
Operator precedence is an example of the grammar used by programming languages. Predicting the precedence of arithmetic operations is easy because of how much time is spent on it in school; however, predicting other operators can be more challenging.
Most operators are left-associative (i.e., the operations are evaluated from left to right). However, exponentiation and assignment are always evaluated right to left.
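The grouping can be checked directly in base R, both by value and by inspecting the parsed call:

```r
# Subtraction is left-associative: (1 - 2) - 3
1 - 2 - 3
#> [1] -4

# Exponentiation is right-associative: 2^(2^3), not (2^2)^3
2^2^3
#> [1] 256
(2^2)^3
#> [1] 64

# The parsed structure shows the same grouping
quote(1 - 2 - 3)[[2]]
#> 1 - 2
quote(2^2^3)[[3]]
#> 2^3
```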
You can manually parse and deparse code using rlang::parse_expr and rlang::expr_text.
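A small round trip as an illustration; the string being parsed is arbitrary:

```r
code <- "y <- x + 10"

# Parse: string to expression
parsed <- rlang::parse_expr(code)
is.call(parsed)
#> [1] TRUE

# Deparse: expression back to string
rlang::expr_text(parsed)
#> [1] "y <- x + 10"
```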
Quotation is the act of capturing an unevaluated expression and unquotation is the ability to selectively evaluate parts of an otherwise quoted expression. Together, quasiquotation makes it easy to create functions combining code written by the function’s author and user. Quasiquotation is one of the three pillars of tidy evaluation (quasiquotation, quosures, and the data mask).
Imagine you are creating many strings by joining words together with the paste function and are tired of writing all those quotes. You can write a function that quotes all of its inputs.
A problem occurs when we want to use variables within cement, as every input is automatically quoted. We can use !! (pronounced bang-bang) to tell a function to drop the implicit quotes.
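One possible definition of cement is sketched below. The body is an assumption for illustration (it uses rlang::ensyms to quote the inputs and base vapply to convert them to strings):

```r
cement <- function(...) {
  args <- rlang::ensyms(...)   # quote every input as a symbol
  paste(vapply(args, rlang::as_string, character(1)), collapse = " ")
}

# No quotes needed around the words
cement(Good, morning, Hadley)
#> [1] "Good morning Hadley"

# !! unquotes, so the values of the variables are used instead
name <- "Hadley"
time <- "afternoon"
cement(Good, !!time, !!name)
#> [1] "Good afternoon Hadley"
```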
An evaluated argument obeys R’s usual evaluation rules and a quoted argument is captured by the function and is processed in a custom way.
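Quotation, the first part of quasiquotation, is capturing an expression without evaluating it. There are four important quoting functions: expr and exprs capture one or many expressions that you type directly, while enexpr and enexprs capture one or many expressions supplied by the user as function arguments.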
Many of the rlang functions described have a base R equivalent. For example, quote is the base R equivalent of expr, substitute is the base R equivalent of enexpr, and alist is the base R equivalent of exprs.
The substitute function is most often used to capture unevaluated arguments. It also does substitution, taking an expression and substituting the values of the symbols defined in the current environment.
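The base R equivalents can be sketched as follows; the function names f3 and f4 are placeholders for this example:

```r
# quote() captures code that you type
quote(x + y)
#> x + y

# substitute() captures the code supplied to an argument
f3 <- function(x) substitute(x)
f3(a + b)
#> a + b

# substitute() also fills the argument into a larger expression
f4 <- function(x) substitute(x * 2)
f4(a + b + c)
#> (a + b + c) * 2

# With an explicit list, symbols are replaced by the listed values
substitute(x * y * z, list(x = 10, y = quote(a + b)))
#> 10 * (a + b) * z

# alist() captures several expressions at once
alist(a + b, c)[[1]]
#> a + b
```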
The rlang functions differ from their base R equivalents in that they are quasiquoting functions, which means that they can also unquote. Unquoting allows you to selectively evaluate parts of the expression that would otherwise be quoted.
Unquoting one argument can be accomplished with !!. The diagram on the right shows what happens when expr(-1) is assigned to x and then unquoted in expr(f(!!x, y)). The !! operator also works with symbols, constants, and function calls. Using !! with a function call evaluates the call and inserts the result into the expression, and unquoting preserves operator precedence.
Unquoting a function and missing arguments are less common in practice, but the !! operator can handle both situations.
# Unquoting a function needs an extra pair of parentheses
f <- rlang::expr(foo)
rlang::expr((!!f)(x, y))
#> foo(x, y)
# You can also use rlang::call2 which makes it clearer
rlang::call2(f, rlang::expr(x), rlang::expr(y))
#> foo(x, y)
# Use maybe_missing to unquote a missing argument
arg <- rlang::missing_arg()
rlang::expr(foo(!!rlang::maybe_missing(arg), !!rlang::maybe_missing(arg)))
#> foo(, )
The !! operator is a one-to-one replacement. The unquote-splice !!! (pronounced bang-bang-bang) is a one-to-many replacement. It takes a list of expressions and inserts them at the location of !!!.
The !!! operator can be used in any rlang function that takes ... regardless of whether ... is quoted or evaluated.
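A short sketch of unquote-splicing, first in a quoting function and then in rlang::call2, which evaluates its dots. The function name f is a placeholder:

```r
xs <- rlang::exprs(1, a, -b)

# Splice a list of expressions into a quoted call
rlang::expr(f(!!!xs, y))
#> f(1, a, -b, y)

# Names in the list become argument names
ys <- rlang::set_names(xs, c("a", "b", "c"))
rlang::expr(f(!!!ys, d = 4))
#> f(a = 1, b = a, c = -b, d = 4)

# !!! also works where ... is evaluated rather than quoted
rlang::call2("f", !!!xs, rlang::expr(y))
#> f(1, a, -b, y)
```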
Base R has one function that implements quasiquotation. The bquote function is for quoting and it uses .() for unquoting.
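For example, using only base R (the variable names are arbitrary):

```r
# .() marks the part of the expression to evaluate
n <- 5
bquote(x + .(n))
#> x + 5

# An already-quoted expression can be inserted the same way
xyz <- bquote((x + y + z))
bquote(-.(xyz) / 2)
#> -(x + y + z)/2
```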
The bquote function is not used by any other function in base R, and it has several limitations: it is difficult to use with code supplied by a user, it lacks an unquote-splice operator for multiple expressions (in versions of R before 4.0.0, which added a splice argument), and it cannot handle code that is accompanied by an environment. Instead of unquoting, base R functions typically use non-quoting techniques that selectively turn off quoting. These come in four basic forms:
A pair of quoting and non-quoting functions
A pair of quoting and non-quoting arguments
An argument that controls whether a different argument is quoting or non-quoting
Quoting if evaluation fails
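The four forms can be illustrated with base R functions. This is a sketch; the object names (scratch_a, scratch_b, to_remove) are made up, and the fourth form is shown only as a comment because running it opens a help page:

```r
# 1. A pair of functions: $ quotes its argument, [[ evaluates it
x <- list(var = 1, y = 2)
var <- "y"
x$var
#> [1] 1
x[[var]]
#> [1] 2

# 2. A pair of arguments: rm() quotes ..., but evaluates list
scratch_a <- 1
scratch_b <- 2
to_remove <- "scratch_b"
rm(scratch_a)
rm(list = to_remove)

# 3. An argument that controls whether another argument is quoted
pkg <- "stats"
library(pkg, character.only = TRUE)

# 4. Quoting if evaluation fails: help(var) shows help for `var` itself
#    unless var holds a string, in which case that string is the topic
```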
The base modelling and plotting functions also use quoting and non-quoting. For example, lm quotes the weights and subset arguments, and plot quotes the aesthetic arguments (e.g., col, pch, cex) when used with a formula.
We can use !!! on a list of expressions that we want to insert into a call. This could be useful if the elements you want to put in ... are already stored as a list.
We can also use !! to supply the argument name indirectly. We use := (pronounced colon-equals) to allow expressions as argument names. It looks like = but allows expressions on either side, making it a more flexible alternative; the same operator is commonly used in the data.table package.
We say that functions supporting these tools, without quoting arguments, have tidy dots. You can use tidy dots behavior in your own functions using rlang::list2.
We could use list2 to create a wrapper around attributes that allows us to set them flexibly.
set_attr <- function(.x, ...) {
  # list2() collects ... with support for !!! and :=
  attr <- rlang::list2(...)
  attributes(.x) <- attr
  .x
}
attrs <- list(x = 1, y = 2)
attr_name <- "z"
x <- 1:10
str(set_attr(x, w = 0, !!!attrs, !!attr_name := 3))
#> int [1:10] 1 2 3 4 5 6 7 8 9 10
#> - attr(*, "w")= num 0
#> - attr(*, "x")= num 1
#> - attr(*, "y")= num 2
#> - attr(*, "z")= num 3
You can use rlang::exec to use the tidy dots technique with a function that does not have tidy dots. With the exec function, you call a function with some arguments provided directly (in ...) and others indirectly (in a list).
You can also provide argument names indirectly or call different functions with the same arguments.
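A sketch of these uses of rlang::exec with base functions. The input vectors are arbitrary:

```r
# All arguments supplied indirectly through a list
args <- list(x = 1:10, na.rm = TRUE, trim = 0.1)
rlang::exec("mean", !!!args)
#> [1] 5.5

# Some arguments supplied directly, others through a list
params <- list(na.rm = TRUE, trim = 0.1)
rlang::exec("mean", x = 1:10, !!!params)
#> [1] 5.5

# Argument name supplied indirectly with :=
arg_name <- "na.rm"
rlang::exec("mean", c(1, 2, NA), !!arg_name := TRUE)
#> [1] 1.5

# Different functions called with the same arguments
values <- c(1, 2, 3, 4, NA)
funs <- c("mean", "median")
results <- vapply(
  funs,
  function(f) rlang::exec(f, values, na.rm = TRUE),
  numeric(1)
)
```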
The rlang::list2 function is a wrapper around rlang::dots_list. The dots_list function gives users more control via additional arguments like .ignore_empty, .homonyms, and .preserve_empty. See the help page of rlang::dots_list for more details.
The do.call function has two main arguments: what gives a function to call and args provides a list of arguments to pass to that function.
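For example, do.call can bind a list of data frames and can set an argument name indirectly (the data below is made up for the illustration):

```r
# Equivalent to rbind(a = dfs$a, b = dfs$b)
dfs <- list(
  a = data.frame(x = 1, y = 2),
  b = data.frame(x = 3, y = 4)
)
combined <- do.call("rbind", dfs)
combined
#>   x y
#> a 1 2
#> b 3 4

# Supplying an argument name indirectly by naming the list
var <- "score"
val <- c(4, 3, 9)
args <- list(val)
names(args) <- var
named <- do.call("data.frame", args)
names(named)
#> [1] "score"
```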
You can use the techniques learned with quasiquotation and apply that with functions in the purrr package to print out a linear equation.
print_linear_equation <- function(formula, data) {
  # Fit the model and keep the names of the non-intercept coefficients
  fit <- lm(formula, data = data)
  vars <- names(coef(fit)[-1])
  # Build one quoted term, such as (-3.19 * wt), per coefficient
  summands <- purrr::map2(
    round(coef(fit)[-1], 2),
    vars,
    ~ rlang::expr((!!.x * !!rlang::sym(.y)))
  )
  # Prepend the intercept, then join all terms with +
  summands <- c(round(coef(fit)[[1]], 2), summands)
  purrr::reduce(summands, ~ rlang::expr(!!.x + !!.y))
}
print_linear_equation(mpg ~ wt + cyl, data = mtcars)
#> 39.69 + (-3.19 * wt) + (-1.51 * cyl)
print_linear_equation(mpg ~ wt + cyl + hp, data = mtcars)
#> 38.75 + (-3.17 * wt) + (-0.94 * cyl) + (-0.02 * hp)
Evaluation gives the developer the ability to evaluate quoted expressions in custom environments to achieve specific goals. In addition to quasiquotation, quosures and data masks make up the other two big ideas in tidy evaluation.
The eval function has two key arguments: expr and envir. The expr argument is the object to evaluate and envir provides the environment in which the expression should be evaluated.
local()
A calculation sometimes creates intermediate variables that have no long-term use and may be quite large, which can waste memory. You can clean them up each time with rm or wrap the calculation in local.
The local function captures the input expression and creates a new environment to evaluate it. This simulates running expr inside an environment. The exact implementation of base::local uses eval and substitute in complex ways.
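A simplified sketch of this idea, built with rlang helpers. The name local2 and the variable names are assumptions; the real base::local is implemented differently:

```r
local2 <- function(expr) {
  # Fresh environment whose parent is where local2() was called
  env <- rlang::env(rlang::caller_env())
  eval(rlang::enexpr(expr), env)
}

total <- local2({
  tmp_a <- 10
  tmp_b <- 200
  tmp_a + tmp_b
})
total
#> [1] 210

# The intermediate variables did not leak into the calling environment
exists("tmp_a")
#> [1] FALSE
```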
source()
A simple version of source can be created by combining eval and rlang::parse_exprs. The source2 function below reads the file from disk, uses parse_exprs to parse the string into a list of expressions, and then uses eval to evaluate each element.
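One way such a function might look is sketched here. It uses rlang::parse_exprs (the plural form), since a file usually holds several expressions; the temporary file and its contents are made up for the demonstration:

```r
source2 <- function(path, env = rlang::caller_env()) {
  # Read the whole file into a single string
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Parse the string into a list of expressions
  exprs <- rlang::parse_exprs(file)
  # Evaluate each expression in turn, keeping the last result
  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }
  invisible(res)
}

# Demonstration with a temporary script
tmp <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "b <- a * 21", "b"), tmp)
out <- source2(tmp, env = rlang::env())
out
#> [1] 42
```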
A quosure is an object that contains an expression and environment. There are three ways to create quosures:
Use enquo and enquos to capture user-supplied expressions (most quosures should be created this way).
Use quo and quos which exist to match expr and exprs and are only included for completeness. If you use them frequently, consider whether expr and careful unquoting can eliminate the need to capture the environment.
Use new_quosure to create a quosure from its components: an expression and its environment. It is rarely needed, but useful for learning purposes.
Quosures are paired with a new evaluation function called rlang::eval_tidy that takes a single quosure instead of an expression-environment pair.
Quosures are often just a convenience: they make code cleaner because you pass around one object instead of an expression and an environment. They become essential when working with ... because each argument passed can have a different environment.
Quosures are a subclass of formulas which means that quosures are call objects. You can extract the expression and environment from a quosure using get_expr and get_env from the rlang package.
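The points above can be sketched as follows. The helper capture is a hypothetical name, and the environment contents are arbitrary:

```r
# enquo() captures the caller's expression together with its environment
capture <- function(x) rlang::enquo(x)
q1 <- capture(a + b)
rlang::is_quosure(q1)
#> [1] TRUE

# new_quosure() builds a quosure from an expression and an environment
q2 <- rlang::new_quosure(rlang::expr(x + y), rlang::env(x = 1, y = 10))

# eval_tidy() needs only the quosure, not a separate environment
rlang::eval_tidy(q2)
#> [1] 11

# Quosures are formulas, and therefore calls
class(q2)
#> [1] "quosure" "formula"
is.call(q2)
#> [1] TRUE

# Extract the two components
rlang::get_expr(q2)
#> x + y
rlang::get_env(q2)$x
#> [1] 1
```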
You can embed a quosure into an expression. This is an advanced tool, but useful to know in case you need it or encounter this strategy in another developer’s code.
Printing with rlang::expr_print in the console provides quosures that are colored based on their environment.
The data mask is a data frame in which the evaluated code looks first for variable definitions. It allows you to mix variables from an environment and a data frame in a single expression; it powers many base functions (e.g., with, subset, transform) and is used throughout the tidyverse packages.
We can combine this into a wrapper function to create something similar to the base::with function.
Using a data mask can make it challenging to know whether variables will come from the data frame or the environment. To resolve this, the data mask provides the .data and .env pronouns.
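A minimal sketch of the pronouns, with x deliberately defined in both places:

```r
x <- 1                    # x in the environment
df <- data.frame(x = 2)   # x in the data frame

# Ambiguous: the data mask takes priority
rlang::eval_tidy(rlang::quo(x), df)
#> [1] 2

# Explicit: .data looks only in the data frame
rlang::eval_tidy(rlang::quo(.data$x), df)
#> [1] 2

# Explicit: .env looks only in the environment
rlang::eval_tidy(rlang::quo(.env$x), df)
#> [1] 1
```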
The base::subset function is like dplyr::filter and provides an option to add a select argument that is similar to dplyr::select. The basic implementation for these ideas can be found in the functions below.
subset2 <- function(data, rows) {
  # Capture the condition, then evaluate it with the data as a mask
  rows <- rlang::enquo(rows)
  rows_val <- rlang::eval_tidy(rows, data)
  data[rows_val, , drop = FALSE]
}
subset2(mtcars, mpg > 33)
#>                 mpg cyl disp hp drat    wt qsec vs am gear carb
#> Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.9  1  1    4    1
select2 <- function(data, ...) {
  dots <- rlang::enquos(...)
  # Map each column name to its position so that mpg:wt becomes 1:6
  vars <- as.list(rlang::set_names(seq_along(data), names(data)))
  cols <- unlist(purrr::map(dots, rlang::eval_tidy, vars))
  data[, cols, drop = FALSE]
}
mtcars |>
  subset2(mpg > 33) |>
  select2(mpg:wt)
#>                 mpg cyl disp hp drat    wt
#> Toyota Corolla 33.9   4 71.1 65 4.22 1.835
The base transform function acts like dplyr::mutate, allowing you to add new variables or modify existing ones in a data frame.
transform2 <- function(.data, ...) {
  dots <- rlang::enquos(...)
  # Evaluate each named expression in turn and store it as a column
  for (i in seq_along(dots)) {
    name <- names(dots)[[i]]
    dot <- dots[[i]]
    .data[[name]] <- rlang::eval_tidy(dot, .data)
  }
  .data
}
df <- data.frame(x = 1:5, y = 6:10)
transform2(df, x = -x, z = 11:15)
#>    x  y  z
#> 1 -1  6 11
#> 2 -2  7 12
#> 3 -3  8 13
#> 4 -4  9 14
#> 5 -5 10 15
Most of the time you will use tidy evaluation indirectly by calling a function that uses rlang::eval_tidy. It is common to quote and unquote arguments from the user whenever you call a quoting function. For example, cond is quoted in subsample before being unquoted in subset2 in this function that resamples and subsets a dataset.
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}
subsample <- function(df, cond, n = nrow(df)) {
  # Quote the user's condition, then unquote it inside subset2
  cond <- rlang::enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}
set.seed(12)
subsample(df, x == 1)
#>     x y
#> 2   1 2
#> 2.1 1 2
#> 3   1 3
Even a simple wrapper around a quoting function using tidy evaluation can be problematic when there are no arguments to quote. This wrapper around subset2 can return incorrect results when x exists in the calling environment and not in the data frame or when val exists in the data frame.
Instead, we need to be more specific and use pronouns.
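The contrast can be sketched as follows. The wrapper names (threshold_x_unsafe and threshold_x) are assumptions for this example, and subset2 is repeated so that the block runs on its own:

```r
subset2 <- function(data, rows) {
  rows <- rlang::enquo(rows)
  rows_val <- rlang::eval_tidy(rows, data)
  data[rows_val, , drop = FALSE]
}

# Ambiguous: x and val may come from the data frame or the environment
threshold_x_unsafe <- function(df, val) {
  subset2(df, x >= val)
}

# Explicit: x must be a column and val must be the function argument
threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

# A data frame that happens to contain a column named val
df <- data.frame(x = 1:3, val = 9:11)

# The column val masks the argument, so no rows are returned
nrow(threshold_x_unsafe(df, 2))
#> [1] 0

# The pronouns give the intended result
threshold_x(df, 2)
#>   x val
#> 2 2  10
#> 3 3  11
```

With the pronouns, a data frame that lacks an x column raises an error instead of silently picking up an x from the calling environment.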
We may also need to handle both quoting and ambiguity as in this threshold_var function that allows the user to choose the variable and the threshold.
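A possible shape for such a function is sketched below, with subset2 repeated so that the block is self-contained. The body is an assumption for illustration: the variable is captured as a symbol, converted to a string, and looked up through the .data pronoun, while the threshold is unquoted with !!:

```r
subset2 <- function(data, rows) {
  rows <- rlang::enquo(rows)
  rows_val <- rlang::eval_tidy(rows, data)
  data[rows_val, , drop = FALSE]
}

threshold_var <- function(df, var, val) {
  # Quote the variable name and turn it into a string
  var <- rlang::as_string(rlang::ensym(var))
  # .data[[var]] can only refer to a column; !!val inserts the value
  subset2(df, .data[[var]] >= !!val)
}

df <- data.frame(x = 1:10)
threshold_var(df, x, 8)
#>     x
#> 8   8
#> 9   9
#> 10 10
```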
It is not always the role of the developer to avoid ambiguity. It is the user’s responsibility to avoid ambiguity in any expressions they create, as in the example below, which allows arbitrary expressions to be evaluated.
R for Lifestyle and Brain Health (R-LAB)