Object-oriented programming (OOP) focuses on objects (such as data frames or models). The same function can behave differently depending on the type of object it receives; this is called polymorphism. It means you can use one function name for many kinds of input, and R works out the right behavior.
Polymorphism
Polymorphism is what allows summary to produce different outputs for numeric and factor variables.
class(ggplot2::diamonds$carat)
#> [1] "numeric"
summary(ggplot2::diamonds$carat)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  0.2000  0.4000  0.7000  0.7979  1.0400  5.0100
class(ggplot2::diamonds$cut)
#> [1] "ordered" "factor"
summary(ggplot2::diamonds$cut)
#>      Fair      Good Very Good   Premium     Ideal
#>      1610      4906     12082     13791     21551
An OOP system makes it possible for any developer to extend the interface by adding implementations for new types of input. In OOP, the type of an object is its class, and an implementation for a specific class is a method. There are two main paradigms for OOP, which differ in how methods and classes are related:
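Encapsulated OOP: Methods belong to objects or classes (e.g., object.method(arg1, arg2))
Functional OOP: Methods belong to generic functions, and method calls look like ordinary function calls (e.g., generic(object, arg2, arg3))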
Base R has three OOP systems: S3, S4, and RC (reference classes). Several other OOP systems are available from CRAN packages, including R6, R.oo, and proto.
The sloop package provides tools to help you interactively explore and understand object oriented programming in R. For example, sloop::otype() makes it easy to find the type of OOP system being used.
While everything in R is an object, not everything is object-oriented. Base objects in R come from S, and were developed before anyone thought that S needed an OOP system.
Base and object-oriented objects can be distinguished with sloop::otype(). The main difference between them is that object-oriented objects have a "class" attribute.
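If sloop isn't available, base R can draw the same line. The sketch below uses is.object(), which reports whether an object has a "class" attribute, and attr(x, "class") to inspect that attribute directly; class() is shown for contrast, since it returns an implicit class even for base objects.

```r
# A base object: no "class" attribute
base_obj <- 1:10
# An object-oriented (S3) object: it carries a "class" attribute
oo_obj <- factor(c("a", "b"))

is.object(base_obj)      # FALSE
is.object(oo_obj)        # TRUE
attr(base_obj, "class")  # NULL
attr(oo_obj, "class")    # "factor"

# class() still reports an implicit class for base objects
class(base_obj)          # "integer"
```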
A generic function defines the interface and uses method dispatch to find the right method for the object's class. You can use sloop::s3_dispatch() to see the dispatch process.
S3 methods are always named generic.class(), but you should never call a method directly; instead, rely on the generic function to find it for you.
You can use sloop::s3_get_method() to see the source code of S3 methods, which are often not exported from their packages.
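Base R offers a similar lookup through utils::getS3method(). As a sketch, stats registers t.test.default() as a method without exporting it, yet it can still be retrieved by generic and class name:

```r
# t.test.default() is registered by stats but not exported,
# so getS3method() looks it up by generic and class
m <- utils::getS3method("t.test", "default")
is.function(m)  # TRUE

# With optional = TRUE, a missing method gives NULL instead of an error
utils::getS3method("print", "no_such_class", optional = TRUE)  # NULL
```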
S3 has no formal definition of a class. Instead, to make an object an instance of a class, you set its class attribute with structure() or class<-(). You can view the class with class() and check whether an object is an instance of a class with inherits(x, "classname").
# Create and assign class in a single step
x <- structure(list(), class = "myclass")

# Create and then assign class
x <- list()
class(x) <- "myclass"

class(x)
#> [1] "myclass"
inherits(x, "myclass")
#> [1] TRUE
Making Your Own Classes
It is helpful to provide a constructor, a validator, and a helper function when creating your own classes. This makes it easy for others to create objects of your class.
Constructor: Efficiently creates new objects with the correct structure (e.g., new_myclass())
Validator: Performs more computationally expensive checks to ensure the object has correct values (e.g., validate_myclass())
Helper: Provides a convenient way for others to create objects of your class (e.g., myclass())
Constructors
There is no built-in way to ensure that all objects of a class have the same structure (e.g., same base type and attributes). A constructor can help enforce a consistent structure and should follow three principles:
Be called new_myclass()
Have one argument for the base object and one for each attribute
Check the type of base object and the types of each attribute
More complicated classes require additional checks for validity. A constructor only checks that types are correct, making it possible to create malformed objects.
new_factor <- function(x = integer(), levels = character()) {
  stopifnot(is.integer(x))
  stopifnot(is.character(levels))

  structure(x, levels = levels, class = "factor")
}

new_factor(1:5, "a")
#> Error in `as.character.factor()`: malformed factor
It is better to add checks to a validator rather than including them in the constructor. This allows you to create new objects quickly when you know the values are correct, and re-use the validation checks in other places.
validate_factor <- function(x) {
  values <- unclass(x)
  levels <- attr(x, "levels")

  if (!all(!is.na(values) & values > 0)) {
    stop(
      "All `x` values must be non-missing and greater than zero",
      call. = FALSE
    )
  }

  if (length(levels) < max(values)) {
    stop(
      "There must be at least as many `levels` as possible values in `x`",
      call. = FALSE
    )
  }

  x
}

validate_factor(new_factor(1:5, "a"))
#> Error: There must be at least as many `levels` as possible values in `x`
validate_factor(new_factor(0:1, "a"))
#> Error: All `x` values must be non-missing and greater than zero
The validator is called primarily for its side effect (throwing an error if the object is not valid), but it is useful for it to visibly return its input, so that a call like validate_factor(new_factor(...)) yields the object itself.
Helpers
A helper makes constructing objects of your class simple. It should always:
Have the same name as the class (e.g., myclass())
Finish by calling the constructor and validator
Create error messages that are helpful to the end user
Have a thoughtfully crafted user interface with carefully chosen default values and conversions
At times, the helper only needs to coerce its inputs to the desired type. For example, the new_difftime constructor below is strict and violates the usual convention that an integer vector can be substituted for a double. A helper function can be created to coerce the input.
# Constructor
new_difftime <- function(x = double(), units = "secs") {
  stopifnot(is.double(x))
  units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))

  structure(x, class = "difftime", units = units)
}

new_difftime(1:10)
#> Error in `new_difftime()`: is.double(x) is not TRUE

# Helper
difftime <- function(x = double(), units = "secs") {
  x <- as.double(x)
  new_difftime(x, units = units)
}

difftime(1:10)
#> Time differences in secs
#>  [1]  1  2  3  4  5  6  7  8  9 10
Often the most natural way for a user to specify a complex object is as a string. For example, a factor can be created from a character vector, with a helper that sets the levels from the unique values.
factor <- function(x = character(), levels = unique(x)) {
  ind <- match(x, levels)
  validate_factor(new_factor(ind, levels))
}

factor(c("a", "a", "b"))
#> [1] a a b
#> Levels: a b
Some complex objects are best built from simple parts. For example, a datetime is naturally created from year, month, and day, which makes construction easier for users.
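The datetime case might look like the sketch below. The helper name datetime() and its argument names are hypothetical; the work is delegated to base R's ISOdatetime(), which builds a POSIXct value from its components.

```r
# Hypothetical helper: build a POSIXct date-time from simple parts
datetime <- function(year = integer(), month = integer(), day = integer(),
                     hour = 0L, minute = 0L, sec = 0, tzone = "") {
  ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
}

x <- datetime(2020, 1, 1, hour = 12, tzone = "UTC")
format(x, "%Y-%m-%d %H:%M")  # "2020-01-01 12:00"
```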
An S3 generic chooses the right method for an object’s class (method dispatch). This is done by UseMethod(), which takes the generic’s name and, optionally, the argument to dispatch on. If the second argument is left out (the usual case), it dispatches based on the first argument. Most S3 generics are nothing more than a call to UseMethod().
mean
#> function (x, ...)
#> UseMethod("mean")
#> <bytecode: 0x12f6fb108>
#> <environment: namespace:base>

# Creating your own S3 generic is simple
myNewGeneric <- function(x) {
  UseMethod("myNewGeneric")
}
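To see UseMethod() in action, the sketch below defines a hypothetical generic describe() with a method for character vectors and a default fallback. A method is just a function named generic.class:

```r
# A generic is nothing more than a call to UseMethod()
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method for character vectors
describe.character <- function(x, ...) {
  paste("character vector of length", length(x))
}

# Fallback used when no more specific method exists
describe.default <- function(x, ...) {
  "some other object"
}

describe(c("a", "b"))  # "character vector of length 2"
describe(1:3)          # "some other object"
```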
Method Dispatch
UseMethod() creates a vector of candidate method names and looks for each one in turn. We can see this with sloop::s3_dispatch().
x <- Sys.Date()
sloop::s3_dispatch(print(x))
#> => print.Date
#>  * print.default
=> indicates the method that is called
* indicates a method that is defined, but not called
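As a simplified sketch, for an object that has a class attribute the candidate names are the generic's name pasted onto each element of the class vector, followed by "default". Objects without a class attribute dispatch on their implicit class, which adds further candidates, as in the matrix example that follows.

```r
x <- Sys.Date()

# Candidate method names, in the order they are tried
candidates <- paste0("print", ".", c(class(x), "default"))
candidates  # "print.Date" "print.default"
```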
The default class is a special fallback, not a real class, that provides a method when no other match is found. While basic method dispatch is simple, it becomes more complex when inheritance, base types, internal generics, and group generics are involved.
x <- matrix(1:10, nrow = 2)

sloop::s3_dispatch(mean(x))
#>    mean.matrix
#>    mean.integer
#>    mean.numeric
#> => mean.default

sloop::s3_dispatch(sum(x))
#>    sum.matrix
#>    sum.integer
#>    sum.numeric
#>    sum.default
#>    Summary.matrix
#>    Summary.integer
#>    Summary.numeric
#>    Summary.default
#> => sum (internal)
Finding Methods
We can use sloop::s3_dispatch to find the specific method for a single call. Finding all the possible methods defined for a generic or associated with a class can be done with sloop::s3_methods_generic and sloop::s3_methods_class.
sloop::s3_methods_generic("mean")
#> 1 mean Date       TRUE  base
#> 2 mean default    TRUE  base
#> 3 mean difftime   TRUE  base
#> 4 mean POSIXct    TRUE  base
#> 5 mean POSIXlt    TRUE  base
#> 6 mean quosure    FALSE registered S3method
#> 7 mean vctrs_vctr FALSE registered S3method

sloop::s3_methods_class("ordered")
#> 1 as.data.frame ordered TRUE  base
#> 2 Ops           ordered TRUE  base
#> 3 relevel       ordered FALSE registered S3method
#> 4 Summary       ordered TRUE  base
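Base R's utils::methods() produces a similar listing, by generic or by class, without sloop. A minimal sketch:

```r
# Methods registered for the mean() generic
m_generic <- methods("mean")
# Methods associated with the "ordered" class
m_class <- methods(class = "ordered")

"mean.Date" %in% as.character(m_generic)            # TRUE
"as.data.frame.ordered" %in% as.character(m_class)  # TRUE
```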
Creating Methods
When writing a new method, watch out for two common pitfalls:
Ownership: Only define a method if you own the generic or the class. While R lets you define methods on anything, it’s best practice to work with the original author to avoid conflicts.
Arguments: A method must have the same arguments as its generic. The only exception is when the generic has ..., in which case the method can include extra arguments.
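As a sketch of the arguments rule: print() is defined as function(x, ...), so a method for a hypothetical tagged class can keep x and ... and add its own prefix argument.

```r
# Hypothetical S3 class holding a character vector
new_tagged <- function(x = character()) {
  stopifnot(is.character(x))
  structure(x, class = "tagged")
}

# The method keeps the generic's x and ..., then adds its own argument
print.tagged <- function(x, ..., prefix = ">> ") {
  cat(paste0(prefix, unclass(x), "\n"), sep = "")
  invisible(x)
}

y <- new_tagged(c("first", "second"))
print(y)                 # prints ">> first" and ">> second"
print(y, prefix = "* ")  # prints "* first" and "* second"
```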
Object Styles
Record-style objects (e.g., POSIXlt date-times), data frames, and scalar objects (e.g., a linear model) are examples of object styles where the length of the underlying list does not equal the number of observations.
Record style objects use a list of equal-length vectors to represent individual components of the object. The best example is POSIXlt which is a list of 11 date-time components.
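A short sketch of the record style using POSIXlt: the object is a list of component vectors (year, month, and so on), each holding one entry per date-time. In current R versions, length() has a POSIXlt method that counts date-times rather than list components.

```r
x <- as.POSIXlt(c("2020-01-01 12:00:00", "2021-06-15 08:30:00"), tz = "UTC")

# Underneath, POSIXlt is a list of component vectors
fields <- unclass(x)
is.list(fields)  # TRUE
fields$year      # years since 1900: 120 121
fields$mon       # months counted from 0: 0 5

# length() counts date-times, not list components
length(x)        # 2
```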
The ordered class is a subclass of factor because "ordered" appears before "factor" in the class vector; conversely, factor is a superclass of ordered. S3 doesn't enforce rules on subclasses and superclasses, but it's good practice to keep base types and attributes consistent.
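The inheritance relationship can be checked directly with inherits(). A brief sketch with an ordered factor:

```r
f <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)

class(f)                              # "ordered" "factor"
inherits(f, "factor")                 # TRUE: an ordered factor is also a factor
inherits(f, "ordered", which = TRUE)  # 1: position in the class vector

# "ordered" comes first, so its methods win; comparisons use Ops.ordered
f[1] < f[2]                           # TRUE
```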
NextMethod
NextMethod tells R to keep looking for the next method up the class hierarchy and run it.
The most common use case is [. To illustrate, we create a secret class that hides its values when printed.
# Add a new constructor
new_secret <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "secret")
}

# Add a print method for the secret class
print.secret <- function(x, ...) {
  print(strrep("x", nchar(x)))
  invisible(x)
}

x <- new_secret(c(15, 1, 456))
x
#> [1] "xx"  "x"   "xxx"

sloop::s3_dispatch(x[1])
#>    [.secret
#>    [.default
#> => [ (internal)
x[1]
#> [1] 15
This works, but the default [ method does not preserve the class, so the result is no longer hidden when printed. Providing a [.secret method solves this problem; however, the naive approach causes an infinite loop.
# Naive approach: infinite loop, because x[i] calls [.secret again
`[.secret` <- function(x, i) {
  new_secret(x[i])
}

# Works, but inefficient because unclass() creates a copy of x
`[.secret` <- function(x, i) {
  x <- unclass(x)
  new_secret(x[i])
}
x[1]
#> [1] "xx"

# Best way: use NextMethod()
`[.secret` <- function(x, i) {
  new_secret(NextMethod())
}
x[1]
#> [1] "xx"

# Shows that [.secret is called but the work is delegated to the internal method
sloop::s3_dispatch(x[1])
#> => [.secret
#>    [.default
#> -> [ (internal)
Allowing Subclassing
If you want to allow subclasses, the parent constructor needs ... and class arguments. The subclass constructor can then simply call the parent constructor, passing additional arguments as needed.
new_secret <- function(x, ..., class = character()) {
  stopifnot(is.double(x))

  structure(x, ..., class = c(class, "secret"))
}

# Create a new supersecret class that also hides the number of characters
new_supersecret <- function(x) {
  new_secret(x, class = "supersecret")
}

print.supersecret <- function(x, ...) {
  print(rep("xxxxx", length(x)))
  invisible(x)
}

x2 <- new_supersecret(c(15, 1, 456))
x2
#> [1] "xxxxx" "xxxxx" "xxxxx"
Calling the constructor inside a method breaks inheritance: the result always has the parent class, even when the input is a subclass. The vctrs::vec_restore() function addresses this by restoring the original class after operations like subsetting.
If you build your class with vctrs, this behavior comes automatically, and you only need a custom [ method in special cases. The vctrs package is worth exploring for its tools for creating and working with vector-like objects in R.
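vctrs::vec_restore() is the package-level fix. As a plain base-R sketch of the underlying problem, the code below reuses the subclass-aware new_secret() and compares a constructor-based [.secret with one that copies the input's class onto the result. Copying only the class attribute is a simplifying assumption; it would not restore any other attributes.

```r
new_secret <- function(x, ..., class = character()) {
  stopifnot(is.double(x))
  structure(x, ..., class = c(class, "secret"))
}
new_supersecret <- function(x) new_secret(x, class = "supersecret")

x2 <- new_supersecret(c(15, 1, 456))

# Constructor-based method: the subclass is lost
`[.secret` <- function(x, ...) new_secret(NextMethod())
before <- class(x2[1])
before  # "secret"

# Base-R alternative (an assumption, not vctrs): copy the input's class
`[.secret` <- function(x, ...) {
  out <- NextMethod()
  class(out) <- class(x)
  out
}
after <- class(x2[1])
after   # "supersecret" "secret"
```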
R6 Basics
The R6 OOP system has two special properties:
It uses the encapsulated OOP paradigm, meaning that methods belong to objects (not generics) and are called with object$method()
Objects are mutable, which means they have reference semantics and are modified in place
Classes and Methods
R6::R6Class() has two important arguments:
classname: Improves error messages and makes it possible to use R6 objects with S3 generics
public: Provides a list of methods (functions) and fields (anything other than functions) that make up the public interface of the object.
Use UpperCamelCase for R6 classes and snake_case for methods and fields. Always assign the result of R6Class() to a variable with the same name as the class. Methods access the fields and methods of the current object via self$.
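The Accumulator class used in the next example is not defined in the text. A minimal definition consistent with how it is used (a numeric sum field and an add() method that returns self invisibly) might be:

```r
Accumulator <- R6::R6Class("Accumulator", public = list(
  # Field: anything that is not a function
  sum = 0,
  # Method: modifies the field in place through self$
  add = function(x = 1) {
    self$sum <- self$sum + x
    # Return self invisibly so calls can be chained
    invisible(self)
  }
))
```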
You can construct a new object from the class by calling the new() method and then call methods and access fields with $.
x <- Accumulator$new()

# Call the method to add 4 to the starting sum of 0
x$add(4)

# Access the field to get the sum
x$sum
#> [1] 4
$add() is called primarily for its side effect of updating $sum. Side-effect methods should always return self invisibly; this returns the current object and makes method chaining possible.
x$add(10)$add(10)$sum
#> [1] 24
Important Methods
The $initialize() and $print() methods should be defined for most classes, as they make your class easier to use. $initialize() overrides the default behavior of $new(), and $print() overrides the default printing behavior.
Person <- R6::R6Class(
  classname = "Person",
  public = list(
    name = NULL,
    age = NA,
    initialize = function(name, age = NA) {
      self$name <- name
      self$age <- age
    },
    print = function(...) {
      cat("Person: \n")
      cat("  Name: ", self$name, "\n", sep = "")
      cat("  Age:  ", self$age, "\n", sep = "")
      invisible(self)
    }
  )
)

Brian <- Person$new("Brian", age = 36)
Brian
#> Person:
#>   Name: Brian
#>   Age:  36
Adding Methods After Creation
The fields and methods of an R6 class can be modified after creation. This is useful when exploring interactively or breaking up a class with many functions into smaller pieces. You can use $set() to add new elements to an existing class.
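A sketch of building an Accumulator class incrementally with $set(), whose first argument names the section ("public", "private", or "active"). Members added this way are available to objects created afterwards, not to existing ones.

```r
# Start from an empty class definition
Accumulator <- R6::R6Class("Accumulator")

# Add a public field and a public method after creation
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
  self$sum <- self$sum + x
  invisible(self)
})

x <- Accumulator$new()
x$add(5)
x$sum  # 5
```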
Behavior from an existing class can be reused by passing the class object to the inherit argument. A method defined in the subclass, such as $add(), overrides the superclass implementation; calling super$add() inside it delegates to the superclass implementation (much like NextMethod() in S3).
Every R6 object has an S3 class that reflects its hierarchy of R6 classes. The class vector ends with the base class "R6", which provides common behavior such as print.R6(). You can list all methods and fields of an R6 object with names().
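A sketch of inheritance: the block defines a small Accumulator class and a hypothetical AccumulatorChatty subclass that overrides $add() to print a message and then delegates to the parent with super$add().

```r
Accumulator <- R6::R6Class("Accumulator", public = list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  }
))

# Subclass: overrides $add() but delegates the real work via super$
AccumulatorChatty <- R6::R6Class("AccumulatorChatty",
  inherit = Accumulator,
  public = list(
    add = function(x = 1) {
      cat("Adding ", x, "\n", sep = "")
      super$add(x = x)
    }
  )
)

x2 <- AccumulatorChatty$new()
x2$add(10)$add(1)$sum
#> Adding 10
#> Adding 1
#> [1] 11
```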
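A quick sketch with a hypothetical Counter class shows both the S3 class vector and names():

```r
Counter <- R6::R6Class("Counter", public = list(
  count = 0,
  increment = function() {
    self$count <- self$count + 1
    invisible(self)
  }
))

cnt <- Counter$new()

# The S3 class vector ends with the base "R6" class
class(cnt)  # "Counter" "R6"

# names() lists fields and methods, plus internal entries
# such as .__enclos_env__ and the default clone() method
names(cnt)
```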
R6Class has two other arguments that work similarly to public:
private: Allows you to create fields and methods that are only available from within the class
active: Allows you to use accessor functions to define dynamic (i.e., active) fields
Private
The private argument to R6Class works the same way as the public argument. It accepts a named list of methods (functions) and fields. Fields and methods are available within the class by using private$ instead of self$.
Person <- R6::R6Class(
  classname = "Person",
  public = list(
    initialize = function(name, age = NA) {
      private$name <- name
      private$age <- age
    },
    print = function(...) {
      cat("Person: \n")
      cat("  Name: ", private$name, "\n", sep = "")
      cat("  Age:  ", private$age, "\n", sep = "")
    }
  ),
  private = list(
    age = NA,
    name = NULL
  )
)

Brian <- Person$new("Brian", age = 36)
Brian
#> Person:
#>   Name: Brian
#>   Age:  36
Brian$name
#> NULL
Active
Active fields allow you to define components that look like fields, but are defined with functions. They are implemented using active bindings and each active binding is a function that takes a single argument (i.e., value). If the argument is missing(), the value is being retrieved, otherwise it is being modified.
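R6 builds active fields on base R's makeActiveBinding(). The sketch below uses that function directly, outside R6, to show the missing(value) convention: reading env$doubled calls the function with no argument, and assigning calls it with the new value.

```r
env <- new.env()
store <- 0

# Every read or write of env$doubled calls this function
makeActiveBinding("doubled", function(value) {
  if (missing(value)) {
    store * 2            # read: compute the value on the fly
  } else {
    store <<- value / 2  # write: update the underlying state
  }
}, env)

env$doubled        # 0
env$doubled <- 10
store              # 5
env$doubled        # 10
```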
Active fields are particularly useful in combination with private fields, because they let you implement components that look like fields from the outside but provide additional checks. For example, we can create a read-only age field and ensure that name is a character vector of length one.
Person <- R6::R6Class(
  classname = "Person",
  private = list(
    .age = NA,
    .name = NULL
  ),
  active = list(
    age = function(value) {
      if (missing(value)) {
        private$.age
      } else {
        stop("`$age` is read only", call. = FALSE)
      }
    },
    name = function(value) {
      if (missing(value)) {
        private$.name
      } else {
        stopifnot(is.character(value), length(value) == 1)
        private$.name <- value
        self
      }
    }
  ),
  public = list(
    initialize = function(name, age = NA) {
      private$.name <- name
      private$.age <- age
    }
  )
)

Brian <- Person$new("Brian", age = 36)
Brian$name
#> [1] "Brian"
Brian$name <- 10
#> Error: is.character(value) is not TRUE
Brian$age <- 35
#> Error: `$age` is read only
Reference Semantics
R6 objects differ from most other R objects in that they have reference semantics: they are not copied when modified. If you want a copy, you need to use $clone(). For recursive cloning of nested R6 objects, use $clone(deep = TRUE).
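A minimal sketch of reference semantics, using a small Accumulator class: assigning an R6 object to a new name does not copy it, while $clone() does.

```r
Accumulator <- R6::R6Class("Accumulator", public = list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  }
))

y1 <- Accumulator$new()
y2 <- y1          # not a copy: both names refer to the same object
y1$add(10)
y2$sum            # 10

y3 <- y1$clone()  # an independent copy
y1$add(10)
y1$sum            # 20
y3$sum            # 10
```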
There are some less obvious consequences of reference semantics:
The $finalize() method should be used to clean up the resources created by the initializer
An R6 object used as the default value of a field is shared across all instances of the class
Because R6 objects are not copied on modify, they are deleted only once. This makes $finalize() a natural complement to $initialize(): together they play a role similar to on.exit() in a function, with the finalizer cleaning up any resources created by the initializer.
If you use an R6 object as the default value of a field, it will be shared across all instances of the class. For example, suppose we want to create a new temporary database every time we call TemporaryDatabase$new(), but give its file field the default TemporaryFile$new(); every instance then ends up using the same path.
This happens because a field default such as TemporaryFile$new() is evaluated only once, when TemporaryDatabase is defined. Moving the call into $initialize() creates a new file each time an object is created.
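A sketch of the pattern with a hypothetical TemporaryFile class: $initialize() acquires a resource (a temporary file) and $finalize() releases it. The finalizer runs when the object is garbage collected, so the explicit gc() call only makes that happen sooner. Newer R6 releases may prefer finalize() in the private list; check the documentation for your installed version.

```r
TemporaryFile <- R6::R6Class("TemporaryFile", public = list(
  path = NULL,
  initialize = function() {
    # Acquire the resource: create an empty temporary file
    self$path <- tempfile()
    file.create(self$path)
  },
  finalize = function() {
    # Release the resource when the object is garbage collected
    message("Cleaning up ", self$path)
    unlink(self$path)
  }
))

tf <- TemporaryFile$new()
p <- tf$path
file.exists(p)   # TRUE

rm(tf)
invisible(gc())  # the finalizer typically runs during this collection
```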
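A simplified sketch of the pitfall and the fix. To keep it self-contained, the database is replaced by its backing file: TemporaryFile only records a tempfile() path, and SharedDB is a hypothetical class that puts TemporaryFile$new() in a field default.

```r
TemporaryFile <- R6::R6Class("TemporaryFile", public = list(
  path = NULL,
  initialize = function() {
    self$path <- tempfile()
  }
))

# Pitfall: the default is evaluated once, when the class is defined,
# so every instance shares the same TemporaryFile object
SharedDB <- R6::R6Class("SharedDB", public = list(
  file = TemporaryFile$new()
))
shared_same <- identical(SharedDB$new()$file$path, SharedDB$new()$file$path)
shared_same  # TRUE

# Fix: create the object in $initialize(), once per instance
TemporaryDatabase <- R6::R6Class("TemporaryDatabase", public = list(
  file = NULL,
  initialize = function() {
    self$file <- TemporaryFile$new()
  }
))
fixed_same <- identical(
  TemporaryDatabase$new()$file$path,
  TemporaryDatabase$new()$file$path
)
fixed_same   # FALSE
```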
S4 provides a more formal approach to functional OOP. The underlying ideas are similar to S3, but the implementation is stricter and uses specialized functions for creating classes (setClass()), generics (setGeneric()), and methods (setMethod()). S4 also provides multiple inheritance (i.e., a class can have multiple parents, section 15.5.2) and multiple dispatch (i.e., method dispatch can use the class of multiple arguments, section 15.5.3).
Defining Classes and Setting Generics
You can define an S4 class with the methods package by calling setClass() with the class name and a definition of its slots. A slot is a named component of the object, accessed with the @ operator. Once the class is defined, use new() to construct objects.
# The methods package is always available when running R interactively,
# but calling library(methods) lets readers know you are using S4
library(methods)

setClass("Person",
  slots = c(
    name = "character",
    age = "numeric"
  )
)

Brian <- new("Person", name = "Brian Helsel", age = 36)

# Check the class with is()
is(Brian)
#> [1] "Person"

# Access slots with @ or slot()
# @ is equivalent to $
# slot() is equivalent to [[
Brian@name
#> [1] "Brian Helsel"
slot(Brian, "age")
#> [1] 36
Accessor functions should be used to allow you to safely set and get slot values.
Calling setClass() registers the class definition in a hidden global variable. As with any function that modifies global state, use it with care: redefining a class after objects of that class already exist can leave those objects invalid.
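A sketch of the redefinition problem, using a hypothetical class named Versioned: after the class is redefined with a different slot, an object created earlier no longer passes validObject().

```r
library(methods)

setClass("Versioned", slots = c(x = "numeric"))
obj <- new("Versioned", x = 10)

# Redefine the class with a different slot; obj keeps its old layout
setClass("Versioned", slots = c(y = "numeric"))

# The existing object no longer matches the registered definition
still_valid <- tryCatch({
  validObject(obj)
  TRUE
}, error = function(e) FALSE)
still_valid  # FALSE
```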
Helper
The new() function is a low-level constructor suitable for use by the developer, but user-facing classes should be paired with a helper which always:
Has the same name as the class
Uses a thoughtfully crafted user interface with carefully chosen defaults and conversions
Finishes by calling methods::new()
Person <- function(name, age = NA) {
  age <- as.double(age)

  new("Person", name = name, age = age)
}

str(Person("Brian"))
#> Formal class 'Person' [package ".GlobalEnv"] with 2 slots
#>   ..@ name: chr "Brian"
#>   ..@ age : num NA
Validator
The constructor automatically checks that the slots have the correct classes, but you will need to implement more complicated checks. We can do this with the setValidity function.
setValidity(Class = "Person", method = function(object) {
  if (length(object@name) != length(object@age)) {
    "@name and @age must be the same length"
  } else {
    TRUE
  }
})

Person("Brian", age = c(36, 37))
#> Error in `validObject()`:
#> invalid class “Person” object: @name and @age must be the same length

# Check the validity with validObject()
validObject(Person("Brian", age = 36))
#> [1] TRUE
Generics and Methods
The role of a generic is to perform method dispatch (i.e., find the right implementation for the defined classes). We can use setGeneric with a function that calls standardGeneric to create a new S4 generic.
It is best practice to use lowerCamelCase for S4 generic names and to avoid using {} in the generic's function, as it triggers a special case that is more computationally expensive.
The signature argument controls which arguments are used for method dispatch. If signature is not provided, all arguments except ... are used. It is occasionally useful to remove arguments from dispatch: this lets you require that methods provide arguments such as verbose = TRUE without those arguments taking part in dispatch.
Methods are added with setMethod(), which takes three important arguments: the name of the generic, the name of the class, and the method itself. The second argument is the signature, which can include more than one class when the generic dispatches on multiple arguments.
setMethod("myGeneric", "Person", function(x) {
  # Add your method implementation here
  ...
})
You can list all the methods that belong to a generic with methods("generic"), list the methods associated with a class with methods(class = "class"), and find the implementation of a specific method with selectMethod("generic", "class").
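A sketch of removing an argument from dispatch, with a hypothetical generic describeIt(): only x is in the signature, so the value of verbose never affects which method is chosen, although the method's arguments still match the generic's.

```r
library(methods)

# Only x is in the signature, so verbose never affects dispatch
setGeneric("describeIt",
  function(x, ..., verbose = TRUE) standardGeneric("describeIt"),
  signature = "x"
)

# The method's arguments match the generic's, including verbose
setMethod("describeIt", "numeric", function(x, ..., verbose = TRUE) {
  if (verbose) paste("numeric of length", length(x)) else "numeric"
})

describeIt(c(1, 2, 3))                   # "numeric of length 3"
describeIt(c(1, 2, 3), verbose = FALSE)  # "numeric"
```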
Show Method
The show method is the most commonly defined S4 method that controls how the object appears when it is printed. To define a method for an existing generic, first retrieve all its arguments with the args function.
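A sketch of a show() method for the Person class, following that advice: args(getGeneric("show")) reveals that the generic's argument is called object, so the method uses the same name.

```r
library(methods)

setClass("Person", slots = c(name = "character", age = "numeric"))

# Inspect the existing generic's arguments before writing a method
args(getGeneric("show"))

setMethod("show", "Person", function(object) {
  cat(is(object)[[1]], "\n",
      "  Name: ", object@name, "\n",
      "  Age:  ", object@age, "\n",
      sep = "")
})

Brian <- new("Person", name = "Brian Helsel", age = 36)
Brian
#> Person
#>   Name: Brian Helsel
#>   Age:  36
```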
Slots should be considered an internal implementation detail, and every user-accessible slot should be accompanied by a pair of accessors.
setGeneric("name", function(x) standardGeneric("name"))
setMethod("name", "Person", function(x) x@name)

name(Brian)
#> [1] "Brian Helsel"
If the slot is writeable, you should provide a setter function and always include validObject() to prevent the user from creating invalid objects.
setGeneric("name<-", function(x, value) standardGeneric("name<-"))
setMethod("name<-", "Person", function(x, value) {
  x@name <- value
  validObject(x)
  x
})

name(Brian) <- "Brian C. Helsel"
name(Brian)
#> [1] "Brian C. Helsel"

name(Brian) <- letters
#> Error in `validObject()`:
#> invalid class “Person” object: @name and @age must be the same length
S4 and S3 Classes and Generics
When writing S4 code, you will often need to interact with existing S3 classes and generics. When using setClass, you can include S4 classes, S3 classes, or the implicit class of base type. To use an S3 class, you must first register it with setOldClass, but it is usually better to provide a full S4 definition with slots and a prototype.
If an S4 object inherits from an S3 class or a base type, it will have a special virtual slot called .Data containing the underlying base type or S3 object.
RangedNumeric <- setClass(
  "RangedNumeric",
  contains = "numeric",
  slots = c(min = "numeric", max = "numeric"),
  prototype = structure(numeric(), min = NA_real_, max = NA_real_)
)
rn <- RangedNumeric(1:10, min = 1, max = 10)
str(rn)
#> Formal class 'RangedNumeric' [package ".GlobalEnv"] with 3 slots
#>   ..@ .Data: int [1:10] 1 2 3 4 5 6 7 8 9 10
#>   ..@ min  : num 1
#>   ..@ max  : num 10
It is also possible to convert an existing S3 generic to an S4 generic.
selectMethod("mean", "ANY")
#> Error in `getGeneric()`: no generic function found for ‘mean’

setGeneric("mean")
selectMethod("mean", "ANY")
#> Method Definition (Class "derivedDefaultMethod"):
#> function (x, ...)
#> UseMethod("mean")
#> <bytecode: 0x11e0ad228>
#> <environment: namespace:base>
#> Signatures:
#>         x
#> target  "ANY"
#> defined "ANY"
Trade-Offs
Read Advanced R Chapter 16 for a full description of the trade-offs between S3, S4, and R6.