Functions

Applying R to Lifestyle and Brain Health Research

Ashlyn Barry

University of Wisconsin - Madison

September 9, 2026

Functions

Functions include three components: arguments, body, and environment.

  • The arguments, or formals(), are the list of arguments that controls how you call the function
  • The body() is the code inside the function
  • The environment() determines how the function finds the values associated with names
formals(sd)
#> $x
#>
#>
#> $na.rm
#> [1] FALSE

body(sd)
#> sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
#>     na.rm = na.rm))

environment(sd)
#> <environment: namespace:stats>

Unlike the formals and body, the environment is specified implicitly based on where you define the function. The function’s environment always exists.
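The same three accessors work on a user-defined function. The sketch below is illustrative only: add_one, make_adder, and add_two are hypothetical names that do not appear elsewhere in these slides. It shows that the environment comes from where the function was created rather than from anything written in its definition.

```r
# Hypothetical functions, used only to illustrate the three components.
add_one <- function(x, by = 1) {
  x + by
}

names(formals(add_one))  # argument names: "x" and "by"
body(add_one)            # the code between the braces
environment(add_one)     # the environment in which add_one was defined

# A function created inside another function gets that call's
# execution environment, which is where `by` lives.
make_adder <- function(by) {
  function(x) x + by
}
add_two <- make_adder(2)

add_two(1)
get("by", envir = environment(add_two))
```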

Attributes

Functions can also carry attributes. One example is srcref, short for source reference, which points to the source code used to create the function. It is used for printing because, unlike body(), it retains code comments and other formatting.

f02 <- function(x, y) {
  # A comment
  x + y
}

attr(f02, "srcref")

#> function(x, y) {
#>   # A comment
#>   x + y
#> }

Primitive functions

Primitive functions exist in the base package and call C code directly. These functions have either type builtin or special. In this case, the functions exist primarily in C so their formals, body, and environment are NULL.

sum
#> function (..., na.rm = FALSE)  .Primitive("sum")
typeof(sum)
#> [1] "builtin"

`[`
#> .Primitive("[")
typeof(`[`)
#> [1] "special"
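The claim that primitives have no formals, body, or environment can be checked directly. This is a small sketch using only base R accessors; sd is included purely as a contrast with an ordinary R function.

```r
# sum is a primitive: all three components are NULL.
is.primitive(sum)   # TRUE
formals(sum)        # NULL
body(sum)           # NULL
environment(sum)    # NULL

# For comparison, sd is an ordinary R function (a closure).
is.primitive(sd)    # FALSE
typeof(sd)          # "closure"
```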

First-class functions

R functions are objects in their own right, and there is no special syntax for defining and naming one. You create a function object with function and bind it to a name with <-.

f01 <- function(x) {
  sin(1 / x^2)
}

Functions are usually bound to a name, but sometimes anonymous functions or lists of functions are preferable.

unlist(lapply(mtcars, function(x) length(unique(x))))
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#>   25    3   27   22   22   29   30    2    2    3    6

funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)

funs$double(10)
#> [1] 20
funs$half(10)
#> [1] 5

Invoking a function

A function is usually called by placing the arguments inside parentheses next to the function name. In some instances, the arguments are contained within a data structure and it is easier to use do.call and pass a list containing the function arguments.

args <- list(1:10, na.rm = TRUE)

do.call(mean, args)
#> [1] 5.5

Function composition

Base R provides two ways to compose multiple function calls. You can either save the intermediate results as variables or nest the function calls.

square <- function(x) x^2
deviation <- function(x) x - mean(x)

x <- 1:20

# Saving intermediate results
out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)

# Nested
sqrt(mean(square(deviation(x))))

The magrittr package provides a third option: the binary operator %>%, which is called a pipe and pronounced “and then”. Base R added a comparable native pipe, |>, in version 4.1.0; however, it lacks some of the advanced features of %>%, such as using the placeholder . in multiple locations.

library(magrittr)

x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  {
    paste0("Square root of ", ., " is ", round(sqrt(.), 2))
  }
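For comparison, the same composition can be written with the native pipe. This is a sketch that assumes R 4.1.0 or later; it redefines square and deviation so that it runs on its own, and the names piped and nested are introduced here only for the comparison.

```r
# Requires R 4.1.0 or later for the native pipe |>.
square <- function(x) x^2
deviation <- function(x) x - mean(x)

x <- 1:20

# Each step receives the previous result as its first argument.
piped <- x |>
  deviation() |>
  square() |>
  mean() |>
  sqrt()

# The nested form computes the same value.
nested <- sqrt(mean(square(deviation(x))))
all.equal(piped, nested)
```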

Lexical scoping

R looks up the values of names based on how a function is defined, not how it is called. Lexical is a technical term from computer science: it tells us that the scoping rules use a parse-time structure rather than a run-time structure. Lexical scoping in R follows four primary rules: (1) name masking, (2) functions versus variables, (3) a fresh start, and (4) dynamic lookup.

Name masking

Names defined inside a function mask names defined outside it.

x <- 10
y <- 20

g01 <- function() {
  x <- 1
  y <- 2
  c(x, y)
}

g01()
#> [1] 1 2

R looks one level up if a name is not defined inside of a function.

x <- 2

g02 <- function() {
  y <- 1
  c(x, y)
}

g02()
#> [1] 2 1

The same rules apply if a function is defined inside another function. R first looks inside the current function, then where that function was defined, and so on, all the way up to the global environment and finally in other loaded packages.

x <- 1

g03 <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}

g03()
#> [1] 1 2 3

Functions versus variables

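The name masking rules also apply to functions, since functions are ordinary objects. Things get more complicated when a function and a non-function share the same name (they must reside in different environments): when a name is used in a function call, R ignores non-function objects while looking up that name.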

g04 <- function(x) x + 1
g05 <- function(x) x * 2

g06 <- function() {
  g04 <- function(x) x + 100
  g05 <- 10
  c(g04(10), g05(g05))
}

g06()
#> [1] 110  20

A fresh start

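Each time a function is called, a new environment is created to host its execution. A function therefore has no way to tell what happened the last time it was run, and every call is independent.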

g07 <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}

g07()
#> [1] 1

g07()
#> [1] 1

Dynamic lookup

R looks up values when the function is run, not when it is created. The output can therefore differ depending on the objects outside the function’s environment.

g08 <- function() x + 1

x <- 15

g08()
#> [1] 16

x <- 20

g08()
#> [1] 21

R relies on lexical scoping to find everything, and on dynamic lookup to decide when to look. As a result, problems such as a misspelled name are not caught when you create the function, and an error may never be raised if a variable with that name happens to exist in the user’s global environment.

The codetools::findGlobals function lists a function’s external dependencies. Setting the function’s environment to one whose parent is the empty environment, created with emptyenv(), makes any missing dependency fail immediately. Of course, you then need to add each object reported by findGlobals to that environment manually before calling the function.

codetools::findGlobals(g08)
#> [1] "+" "x"

my_env <- new.env(parent = emptyenv())
environment(g08) <- my_env

g08()
#> Error in `x + 1`: could not find function "+"

my_env$x <- 15
my_env$`+` <- base::`+`
my_env$`{` <- base::`{`

g08()
#> [1] 16

Lazy evaluation

Function arguments are lazily evaluated, meaning that they are only evaluated if accessed. This lets you pass potentially expensive computations as arguments, which are only carried out if needed.

h01 <- function(x) {
  10
}
# No error is generated because x is never used.
h01(stop("This is an error!"))
#> [1] 10
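The "expensive computation" case can be sketched with a counter. Everything here is hypothetical: slow_default stands in for a costly call, calls records how often it runs, and h_lazy is an illustrative function that only touches its second argument on one branch.

```r
calls <- 0

# Hypothetical stand-in for an expensive computation; it counts its own calls.
slow_default <- function() {
  calls <<- calls + 1
  42
}

h_lazy <- function(use_extra, extra = slow_default()) {
  if (use_extra) extra else 0
}

h_lazy(FALSE)  # `extra` is never accessed, so slow_default() does not run
calls
h_lazy(TRUE)   # `extra` is accessed, so slow_default() runs once
calls
```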

Promises

Lazy evaluation is powered by a data structure called a promise. A promise has three components:

  1. An expression, such as x + y, whose evaluation is deferred until the value is needed.
  2. An environment in which the expression is evaluated, namely the environment where the function is called.
y <- 10

h02 <- function(x) {
  y <- 100
  x + 1
}

# Returns 11 not 101 since x is assigned a value of 10
# from y outside of the function.

h02(y)
#> [1] 11

# Assignment inside a function call binds the
# variable outside the function
h02(y <- 1000)
#> [1] 1001

print(y)
#> [1] 1000
  3. A value that is computed and cached the first time the promise is accessed.
# Ensures that Calculating... is only printed once

double <- function(x) {
  message("Calculating...")
  x * 2
}

h03 <- function(x) {
  c(x, x)
}

h03(double(20))
#> Calculating...
#> [1] 40 40

Default arguments

Default values can be defined in terms of other arguments, or even in terms of variables defined later in the function. Many base R functions use this technique, but it makes the code harder to read and the return value harder to predict.

h04 <- function(x = 1, y = x * 2, z = a + b) {
  a <- 10
  b <- 100
  c(x, y, z)
}

h04()
#> [1]   1   2 110

The evaluation environment is slightly different for default and user-supplied arguments. Default arguments are evaluated inside the function, whereas user-supplied arguments are evaluated in the calling environment.

h05 <- function(x = ls()) {
  a <- 1
  x
}

# ls() evaluated inside h05
h05()
#> [1] "a" "x"

# ls() evaluated in the global environment
# (output from a fresh session in which only h05 is defined)
h05(x = ls())
#> [1] "h05"

Missing arguments

We can determine if an argument’s value comes from the user or from a default using the missing function. The sample function uses this technique.

h06 <- function(x = 10) list(missing(x), x)

str(h06())
#> List of 2
#>  $ : logi TRUE
#>  $ : num 10

str(h06(10))
#> List of 2
#>  $ : logi FALSE
#>  $ : num 10

An Example with Sample

This code is from base::sample which takes a sample of the specified size from the elements of x with or without replacement.

  • The missing function checks whether size was supplied; if it was not, size is set to the length of x (or to x itself when x is a single number).
  • The function signature makes it appear as if both x and size are required arguments, although size is optional.
sample <- function(x, size, replace = FALSE, prob = NULL) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
    if (missing(size)) {
      size <- x
    }
    sample.int(x, size, replace, prob)
  } else {
    if (missing(size)) {
      size <- length(x)
    }
    x[sample.int(length(x), size, replace, prob)]
  }
}

An alternative way to write sample is to set size = NULL in the function arguments, signalling that it is optional, and then check for NULL in the body. A simplified version, which omits the special case where x is a single number, looks like this:

sample <- function(x, size = NULL, replace = FALSE, prob = NULL) {
  if (is.null(size)) {
    size <- length(x)
  }
  # Or use the `%||%` operator (in base R from version 4.4.0), which
  # returns the left side if it is not NULL and the right side otherwise.
  # size <- size %||% length(x)
  x[sample.int(length(x), size, replace, prob)]
}
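The NULL-default pattern can be tried in isolation. This is a sketch under stated assumptions: %||% is taken from base R when available (version 4.4.0 or later) and otherwise defined here as a fallback with the same behaviour, and resolve_size is a hypothetical helper that is not part of base::sample.

```r
# Assumption: `%||%` ships with base R only from version 4.4.0, so define a
# fallback with the same behaviour when it is not already available.
if (!exists("%||%", mode = "function")) {
  `%||%` <- function(x, y) if (is.null(x)) y else x
}

# Hypothetical helper mirroring the size logic of the simplified sample().
resolve_size <- function(x, size = NULL) {
  size %||% length(x)
}

resolve_size(letters)     # size not supplied: falls back to length(x)
resolve_size(letters, 5)  # size supplied: used as is
```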

The … (dot-dot-dot)

A special argument ... (pronounced dot-dot-dot) lets a function take any number of additional arguments. These arguments can either be used inside the function or passed on to another function.

i01 <- function(y, z) {
  list(y = y, z = z)
}

i02 <- function(x, ...) {
  i01(...)
}

# x is matched by i02; only y and z are passed on to i01
str(i02(x = 1, y = 2, z = 3))

#> List of 2
#>  $ y: num 2
#>  $ z: num 3

It can be useful to pass these extra arguments along to a different function. This is how the lapply function works in R: any additional arguments captured by ... are forwarded to FUN for each element of X.

mylapply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  if (!is.vector(X) || is.object(X)) {
    X <- as.list(X)
  }
  .Internal(lapply(X, FUN))
}

x <- list(c(1, 3, NA), c(4, NA, 6))

str(mylapply(x, mean, na.rm = TRUE))

#> List of 2
#>  $ : num 2
#>  $ : num 5
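When the extra arguments need to be inspected before they are forwarded, list(...) captures them as an ordinary list. A minimal sketch follows; summarise_with is a hypothetical helper written for this illustration and is not how lapply itself is implemented.

```r
# Hypothetical helper: capture `...` in a list, inspect it, then forward it.
summarise_with <- function(x, FUN, ...) {
  extra <- list(...)
  message("Extra arguments: ", paste(names(extra), collapse = ", "))
  # do.call() passes x first, followed by the captured arguments.
  do.call(FUN, c(list(x), extra))
}

summarise_with(c(1, 3, NA), mean, na.rm = TRUE)
summarise_with(c(1, 3, NA), mean)
```

Capturing the arguments first makes it possible to validate or log them, at the cost of evaluating them all immediately.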

S3 generic functions can use ... to allow their methods to take additional arguments. Without it, a generic like print would need an argument for every option of every type of object it can display.

myprint <- function(x, ...) {
  print(x, ...)
}

myprint(factor(letters), max.levels = 4)
#>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
#> 26 Levels: a b c ... z

print("Greetings!", quote = FALSE)
#> [1] Greetings!

The downsides of using ... are that more documentation may be needed to help users understand which arguments they can pass and where those arguments go. Additionally, misspelled arguments go unnoticed: they are silently swallowed and do not raise an error.

str(mylapply(x, mean, na_rm = TRUE))

#> List of 2
#>  $ : num NA
#>  $ : num NA

Exiting a function

Functions exit by returning a value or throwing an error. Returns can be implicit where the last evaluated expression is returned or explicit by calling return.

# Implicit
j01 <- function(x) {
  if (x < 10) {
    0
  } else {
    10
  }
}

j01(9)
#> [1] 0

# Explicit
j02 <- function(x) {
  if (x < 10) {
    return(0)
  } else {
    return(10)
  }
}

Most functions return visibly, but the invisible function prevents automatic printing of the last value. The most common function that returns invisibly is <-.

a <- 2

# You can display visibly with print or by wrapping in ()
print(a <- 2)
#> [1] 2

(a <- 2)
#> [1] 2
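You can return invisibly from your own functions by wrapping the return value in invisible. In this small sketch, j04 is a hypothetical function name; withVisible is the base R function that reports a value together with its visibility flag.

```r
j04 <- function() invisible(1)

j04()          # nothing is printed
print(j04())   # printing explicitly shows the value
(j04())        # so does wrapping the call in parentheses

# withVisible() returns the value together with its visibility flag
str(withVisible(j04()))
```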

Errors

Errors occur when a function cannot complete its assigned task. The function should then call stop, which terminates its execution immediately and signals the error.

double <- function(x) {
  if (!is.numeric(x)) {
    stop(sprintf("%s is not numeric", x))
  }
  return(x * 2)
}

double(10)
#> [1] 20

double("10")
#> Error in double("10") : 10 is not numeric

Exit handlers

A function may need to make a temporary change to the global state. However, cleaning up those changes may be problematic if there is an error. Using on.exit ensures those changes are undone and the global state restored.

pretty_print <- function(x) {
  old_digits <- getOption("digits")
  options(digits = 2)
  on.exit(options(digits = old_digits), add = TRUE)
  print(x / 3)
}

pretty_print(1:5)
#> [1] 0.33 0.67 1.00 1.33 1.67

print(1 / 3)
#> [1] 0.3333333

You can use after = TRUE or after = FALSE to control the order of on.exit within a function if you need some actions to be performed in a specific order.
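The effect of after can be seen by recording the order in which handlers run. This is a sketch that assumes R 3.5.0 or later, where on.exit gained the after argument; handler_order and seen are hypothetical names used only for this illustration.

```r
# Hypothetical helper that records the order in which exit handlers run.
handler_order <- function(after) {
  seen <- character()
  f <- function() {
    on.exit(seen <<- c(seen, "first"), add = TRUE, after = after)
    on.exit(seen <<- c(seen, "second"), add = TRUE, after = after)
    invisible(NULL)
  }
  f()
  seen
}

handler_order(after = TRUE)   # handlers run in the order they were registered
handler_order(after = FALSE)  # handlers run in reverse order
```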

Function forms

There are four varieties of functions in R:

  • Prefix: The function name comes before its arguments (e.g., print(x))
  • Infix: The function name comes in between its arguments (e.g., x + y)
  • Replacement: The function replaces values by assignment (e.g., names(x) <- c("a", "b", "c"))
  • Special: Functions like [[, if, and for

An interesting property of R is that all function varieties can be rewritten to prefix form.

x <- 1:5
y <- 2

`+`(x, y)
#> [1] 3 4 5 6 7

str(`names<-`(data.frame(x, y), c("a", "b")))
#> 'data.frame':   5 obs. of  2 variables:
#>  $ a: int  1 2 3 4 5
#>  $ b: num  2 2 2 2 2

`for`(i, x, print(i))
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5

Knowing the name of a non-prefix function allows you to override its behavior.

`+` <- function(x, y) {
  return(x * y)
}

5 + 10
#> [1] 50

rm(`+`)

5 + 10
#> [1] 15

Prefix functions

Prefix functions are the most common form in R. Their arguments can be specified in three ways:

  • Position (e.g., help(mean))
  • Partial Matching (e.g., help(top = mean))
  • Name (e.g., help(topic = mean))
get_structure <- function(age_years, weight_kg, amyloid_level) {
  str(list(age = age_years, weight = weight_kg, amyloid = amyloid_level))
}

# By Position

get_structure(30, 70, 19)

#> List of 3
#>  $ age    : num 30
#>  $ weight : num 70
#>  $ amyloid: num 19

# By name (full name or partial-matching)

get_structure(weight = 70, 19, age = 30)

#> List of 3
#>  $ age    : num 30
#>  $ weight : num 70
#>  $ amyloid: num 19

# Raises error as a matches both age_years and amyloid_level

get_structure(30, 175, a = 19)
#> Error in get_structure(30, 175, a = 19) :
#>   argument 3 matches multiple formal arguments

Infix functions

Infix functions have two arguments, with the function name placed in between them.

R has many built-in infix operators, including:

:, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, and <<-

You can also create your own infix functions that start and end with %. Base R uses this pattern to define %%, %*%, %/%, %in%, %o%, and %x%.

`%+%` <- function(a, b) paste(a, b)

"new" %+% "string"
#> [1] "new string"

Names of infix functions are more flexible than those of regular R functions: they can contain any sequence of characters except for %. Special characters need to be escaped in the string used to define the function, but not when you call it. Infix operators are composed from left to right.

`% %` <- function(a, b) paste(a, b)

"another" % % "new" % % "string"
#> [1] "another new string"

`%/\\%` <- function(a, b) paste(a, b)

"and" %/\% "one" %/\% "with" %/\% "special" %/\% "characters"
#> [1] "and one with special characters"

`%-%` <- function(a, b) paste0("(", a, " %-% ", b, ")")

"a" %-% "b" %-% "c"
#> [1] "((a %-% b) %-% c)"

Replacement functions

Replacement functions act as if they modify their arguments in place and have the special name xxx<-. They must have arguments named x and value, and must return the modified object. If additional arguments are needed, place them between x and value.

`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- 1:5
second(x) <- 5L
x
#> [1] 1 5 3 4 5

`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

modify(x, 1) <- 10
x
#> [1] 10  5  3  4  5

# R interprets modify as "x <- `modify<-`(x, 1, 10)".

Combining replacement with other functions requires more complex translations.

x <- c(a = 1, b = 2, c = 3)

names(x)
#> [1] "a" "b" "c"

names(x)[2] <- "two"
names(x)
#> [1] "a"   "two" "c"

# R interprets this as the following:
`*tmp*` <- x
x <- `names<-`(`*tmp*`, `[<-`(names(`*tmp*`), 2, "two"))
rm(`*tmp*`)
names(x)
#> [1] "a"   "two" "c"

Special forms

There are many language features in R that are written in special ways but also have prefix forms.

Special form                         Prefix form
(x)                                  `(`(x)
{x}                                  `{`(x)
x[i]                                 `[`(x, i)
x[[i]]                               `[[`(x, i)
if (cond) true                       `if`(cond, true)
if (cond) true else false            `if`(cond, true, false)
for(var in seq) action               `for`(var, seq, action)
while(cond) action                   `while`(cond, action)
repeat expr                          `repeat`(expr)
next                                 `next`()
break                                `break`()
function(arg1, arg2) {body}          `function`(alist(arg1, arg2), body, env)
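A few of these equivalences can be checked side by side. The sketch below uses only base R; the objects x and lst are hypothetical examples created for the comparison.

```r
x <- c(10, 20, 30)
lst <- list(a = 1, b = "two")

# Subsetting with [ and its prefix form
x[2]
`[`(x, 2)

# Extracting with [[ and its prefix form
lst[["b"]]
`[[`(lst, "b")

# if/else and its prefix form
if (x[1] > 5) "big" else "small"
`if`(x[1] > 5, "big", "small")
```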