Data Visualization

An Introduction to ggplot2

Ashlyn Barry

University of Wisconsin-Madison

July 8, 2026

What is ggplot2?

ggplot2 is a popular R package for creating data visualizations

“Grammar of Graphics”: build visualizations by combining independent components

Key Strengths:

  • Versatile (scatterplots, boxplots, barcharts…)
  • Customizable: easily change formatting, fonts, colors…
  • Publication-quality graphics
  • Open source and free

7 components of a chart in ggplot2

  • Essential components: data, mapping, and layers
  • Optional components: scales, facets, coordinates, and themes
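As an illustration only (this example is not from the original slides), the sketch below touches all seven components in a single plot, using the penguins data introduced later. The specific scale, facet, coordinate, and theme choices are arbitrary picks meant to show where each component goes.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  data = penguins,                                       # 1. data
  mapping = aes(x = flipper_length_mm, y = body_mass_g)  # 2. mapping
) +
  geom_point() +                                         # 3. layer
  scale_y_continuous(labels = scales::label_comma()) +   # 4. scale (comma-formatted y labels)
  facet_wrap(~species) +                                 # 5. facets (one panel per species)
  coord_cartesian(xlim = c(170, 235)) +                  # 6. coordinates (zoom the x-axis)
  theme_bw()                                             # 7. theme

p
```

Only the first three components are required; dropping lines 4 through 7 still gives a valid plot that uses ggplot2's defaults.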

Essential Components

Data: Specifies the dataset for the plot

ggplot(data = datasetname)

Mapping: instructions for how to “map” variables in the data onto visual properties of the plot

  • In other words, it defines which variables to include and how
  • Uses aes() to define aesthetics of plot
ggplot(data = datasetname, mapping = aes(x = height, y = weight))

Layers: defines how to display the mapped data

  • geom_*(): defines the geometry, i.e. how the data are displayed (points, lines, etc.). More on this soon!
ggplot(
  data = datasetname,
  mapping = aes(x = height, y = weight)
) +
  geom_point() # layers are added outside ggplot() with +

Getting Started

Installation Requirements:

  • ggplot2 is a core package of the tidyverse
  • tidyverse must be installed and loaded
  • You only need to install the package once, but need to load it every time you start a session.
install.packages("tidyverse") # install package (once)
library(tidyverse) # load tidyverse (every new session)
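An optional pattern (not part of the original instructions): install tidyverse only when it is missing, so the same script can be rerun safely at the start of every session.

```r
# Install tidyverse only if it is not already available
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load it (this also attaches ggplot2)
library(tidyverse)
```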

💡 Tip: Update R and packages regularly for latest features

Palmer Penguin Package

The penguins dataset from the palmerpenguins package contains body measurements of penguins on three islands in the Palmer Archipelago, Antarctica. We will use this dataset as an example of how to build figures using ggplot2.

library(palmerpenguins) # load palmer penguins dataset
penguins # print a preview of the data
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

You may need to run install.packages("palmerpenguins") first if the package is not already installed on your system.
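Before plotting, it can help to check the size of the data and where values are missing. Row 4 in the preview above is entirely NA, and rows like it are why ggplot2 later prints warnings about removed rows containing missing values. A quick check, using only base R:

```r
library(palmerpenguins)

dim(penguins)             # number of rows and columns
colSums(is.na(penguins))  # count of missing values in each column
```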

Creating a ggplot with penguins

What is the relationship between flipper length and body mass of penguins?

Step 1: begin with the function ggplot(), telling it what data to use

Step 2: add the mapping with aes(), which defines the aesthetics of your plot, including which variables go on the x and y axes

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
)

Base plot
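The base plot draws axes but no data, because it has no layers yet. A small sketch (the object names base and scatter are illustrative) showing that a plot can be stored and then extended with +, which returns a new plot object:

```r
library(ggplot2)
library(palmerpenguins)

# Data + mapping only: drawing this shows empty axes
base <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
)

# Adding a geom layer produces a new plot that actually draws the points
scatter <- base + geom_point()
scatter
```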

Geometrical object layer (geom_)

The geom_*() layer defines your plot type. We will create a scatterplot with penguins.

Plot type      Function
Bar chart      geom_bar()
Line chart     geom_line()
Boxplot        geom_boxplot()
Scatterplot    geom_point()
Violin plot    geom_violin()
Histogram      geom_histogram()

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point()

Plot data
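Switching plot type is mostly a matter of swapping the geom and adjusting the mapping to fit it. The sketch below (an illustration, not from the original slides) uses two other geoms from the table: a boxplot of body mass by species and a histogram of body mass. The bin count of 30 is an arbitrary choice.

```r
library(ggplot2)
library(palmerpenguins)

# Boxplot: a categorical x and a numeric y
box <- ggplot(
  data = penguins,
  mapping = aes(x = species, y = body_mass_g)
) +
  geom_boxplot()

# Histogram: needs only an x aesthetic
histo <- ggplot(data = penguins, mapping = aes(x = body_mass_g)) +
  geom_histogram(bins = 30)

box
histo
```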

Categorizing by species

Does the relationship between flipper length and body mass differ by species?

names(penguins) # view variable names in dataset
[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"             

Modify aesthetics to categorize species by color.

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point()

Categorize species by color

Adding lines of best fit

Next we will add lines of best fit showing the relationship between body mass and flipper length for each species.

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm")

# geom_smooth(): geometric object representing data with a smooth line
# method = "lm": line of best fit based on a linear model

Linear models
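Because color = species is in the global mapping, geom_smooth(method = "lm") fits one linear model per species. A sketch that fits the same models directly with base R's lm(), so the intercepts and slopes behind the three lines can be inspected:

```r
library(palmerpenguins)

# One linear model of body mass on flipper length per species
fits <- lapply(
  split(penguins, penguins$species),
  function(d) lm(body_mass_g ~ flipper_length_mm, data = d)
)

# Intercept and slope (grams per mm of flipper length), one column per species
sapply(fits, coef)
```

geom_smooth() also prints a message like "`geom_smooth()` using formula = 'y ~ x'"; passing formula = y ~ x explicitly silences it.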

Single line of best fit

What if we want a single line of best fit, but still want to categorize by species?

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species)) +
  geom_smooth(method = "lm")

# color = species inside geom_point(): color applies only to the points
# geom_smooth() inherits only the global mapping (x and y), so one line is fit to all penguins

Single line of best fit
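One way to confirm which layer received which aesthetic is ggplot2's layer_data(), which returns the computed data for a single layer. In this sketch, the point layer has three distinct colours while the smooth layer contains a single group, meaning a single fitted line:

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species)) +
  geom_smooth(method = "lm", formula = y ~ x)

points_data <- layer_data(p, 1)  # layer 1: geom_point()
smooth_data <- layer_data(p, 2)  # layer 2: geom_smooth()

length(unique(points_data$colour))  # distinct point colours
length(unique(smooth_data$group))   # number of fitted lines
```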

Categorizing by shape

People perceive colors differently (for example, readers with color vision deficiency), so it is generally not a good idea to distinguish groups by color alone. In addition to color, we can also map species to point shape.

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species, shape = species)) +
  geom_smooth(method = "lm")

Categorizing by shape
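Another option is to choose a palette designed to remain distinguishable for most readers. ggplot2 ships the viridis scales, and scale_color_viridis_d() applies one to a discrete variable. This is offered as a suggestion, not as part of the original example. end = 0.8 stops the palette before its lightest yellow, which can be hard to see on a white background.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species, shape = species)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_color_viridis_d(end = 0.8)  # viridis palette for a discrete variable

p
```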

Specify Titles and Axes

We usually do not want the raw variable names from our dataset as axis titles. We can specify the “labels” of our plot by adding labs() with +. This function also lets us add a title and subtitle.

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species",
    shape = "Species"
  )

Clean titles and axes
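labs() accepts a few more text elements than the ones used above. caption adds a note at the bottom right (often a data source), and tag adds a label at the top left (useful for multi-panel figures). The caption text and tag below are placeholders:

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    caption = "Data: palmerpenguins R package", # placeholder source note
    tag = "A"                                   # placeholder panel tag
  )

p
```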

Saving a plot as an object

Once you are satisfied with the content of your figure, you can save it as an object with <-. Printing the object (here, my.plot) redraws the figure, and you can keep adding components to it with +.

my.plot <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species",
    shape = "Species"
  )

my.plot

my.plot
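A stored plot can also be written to a file with ggsave(), which infers the format from the file extension. In this sketch, the file name, size, and resolution are arbitrary examples, and a shortened version of my.plot is rebuilt so the block runs on its own:

```r
library(ggplot2)
library(palmerpenguins)

my.plot <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species))

# Save as a 7 x 5 inch PNG at 300 dpi in the working directory
ggsave(
  "penguins_scatter.png",
  plot = my.plot,
  width = 7, height = 5, units = "in", dpi = 300
)
```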

Theme Component

Optional component: controls visuals of the plot that are not controlled by the data

  • Used to specify look and feel of your plot beyond ggplot default
  • Many options, from changing locations of titles and legends to changing the background color

Built-in themes: theme_*() functions

theme_*()          Description
theme_grey()       Default theme: light grey background with white gridlines
theme_bw()         Black and white theme: white background, grey gridlines, and a dark panel border
theme_minimal()    Minimalist: white background and light grey gridlines, no panel border
theme_classic()    Classic R look: white background with solid axis lines and no gridlines
theme_void()       Empty theme: removes backgrounds, axes, gridlines, and axis labels

my.plot +
  theme_minimal()

theme_minimal()
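Rather than adding a built-in theme to each plot, you can also make it the session default with theme_set(). It returns the previous theme, so you can restore the original default afterwards:

```r
library(ggplot2)

# Make theme_minimal() the default for every plot drawn afterwards
old_theme <- theme_set(theme_minimal())

# ... build and print plots here ...

# Restore the previous default theme
theme_set(old_theme)
```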

Manually modifying theme

theme() allows you to manually adjust your theme based on your preferences. For example, you can adjust the locations of the title, subtitle, and legend.

my.plot +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5), # center title horizontally
    plot.subtitle = element_text(hjust = 0.5), # center subtitle horizontally
    legend.position = "inside", # draw legend inside the panel (ggplot2 >= 3.5.0)
    legend.position.inside = c(0.85, 0.2) # x, y from bottom-left corner on scale 0-1
  )

Manually adjust text location

Remove gridlines

The function element_blank() removes an element entirely. Setting panel.grid.major and panel.grid.minor to element_blank() will remove them from your figure.

my.plot +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "inside",
    legend.position.inside = c(0.85, 0.2),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

Blank gridlines
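In ggplot2's theme hierarchy, panel.grid is the parent element of panel.grid.major and panel.grid.minor. Blanking the parent therefore removes both in one line. A short sketch:

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  theme_minimal() +
  # Removes both major and minor gridlines via their parent element
  theme(panel.grid = element_blank())

p
```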

Adding axes

What if you don’t want gridlines but still want solid lines marking your x and y axes? You can tell ggplot to draw axis lines using axis.line = element_line()

my.plot +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "inside",
    legend.position.inside = c(0.85, 0.2),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(color = "black")
  )

Adding axes

Theme as an object

Just like you saved your graph as an object, you can also save your theme as an object. This allows you to easily add the same theme to other graphs.

my.theme <- theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "inside",
    legend.position.inside = c(0.85, 0.2),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(color = "black")
  )
my.plot +
  my.theme
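my.theme is a fixed theme object. If you want to tweak it per figure, for example to enlarge text for a poster, you can wrap the same settings in an actual R function. A minimal sketch; the name my_theme and its base_size argument are illustrative choices, and the legend settings are left out so the function works for any plot:

```r
library(ggplot2)

# Hypothetical theme-building function; base_size scales all text
my_theme <- function(base_size = 11) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(hjust = 0.5),
      plot.subtitle = element_text(hjust = 0.5),
      panel.grid = element_blank(),
      axis.line = element_line(color = "black")
    )
}

# Usage: my.plot + my_theme(base_size = 14)
```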

Facets

Facets split a plot into small multiples: separate panels, each showing the subset of the data defined by one or more variables. This is a quick and powerful way to show patterns and trends within subsets of data.

facet_wrap(~ species) allows us to look at the relationship between body mass and flipper length for different species in separate figures.

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point() +
  facet_wrap(~species) +
  labs(
    title = "Penguin body mass by flipper length",
    subtitle = "Faceted by species",
    x = "Flipper length (mm)",
    y = "Body Mass (g)"
  )

Facets of body mass by flipper length, by species
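facet_wrap() splits the data by one variable. To cross two variables, facet_grid(rows ~ cols) arranges the panels in a grid. In this sketch, rows with missing sex are dropped first so that no extra "NA" row of panels appears. This example is an extension, not part of the original slides.

```r
library(ggplot2)
library(palmerpenguins)

# Keep only penguins with a recorded sex
penguins_sexed <- subset(penguins, !is.na(sex))

# One row of panels per sex, one column per species
p <- ggplot(
  data = penguins_sexed,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point() +
  facet_grid(sex ~ species)

p
```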

Facets with my.theme

Apply my.theme to your new figure!

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point() +
  facet_wrap(~species) +
  labs(
    title = "Penguin body mass by flipper length",
    subtitle = "Faceted by species",
    x = "Flipper length (mm)",
    y = "Body Mass (g)"
  ) +
  my.theme

Facets with my.theme

Extensions: ggpubr

  • ggpubr is a package that creates publication ready plots
  • Requires less manual formatting, but is still flexible to preferences
# Install and load ggpubr package
install.packages("ggpubr")
library(ggpubr)

# Density plot
ggdensity(
  penguins,
  x = "body_mass_g",
  add = "mean",
  color = "species",
  fill = "species"
)

# Boxplot
ggboxplot(
  penguins,
  x = "species", # grouping variable: one box per species
  y = "body_mass_g",
  add = "jitter",
  color = "species"
)

# Lollipop chart
ggdotchart(
  penguins,
  x = "flipper_length_mm",
  y = "body_mass_g",
  color = "species",
  sorting = "ascending", # sorts data in ascending order
  add = "segments", # draws lines from y=0 to data point
  ggtheme = theme_pubr()
)

ggpubr example plots

Complete description and additional plots: rpkgs.datanovia.com/ggpubr/
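ggpubr also provides ggarrange() for combining several plots into one figure. The sketch below places a density plot and a boxplot side by side with a shared legend. The particular plots and layout are illustrative choices; see the ggpubr documentation linked above for the full set of arguments.

```r
library(ggpubr)
library(palmerpenguins)

p1 <- ggdensity(penguins, x = "body_mass_g", color = "species", fill = "species")
p2 <- ggboxplot(penguins, x = "species", y = "body_mass_g", color = "species")

# Two plots in one row, sharing a single legend at the bottom
combined <- ggarrange(p1, p2, ncol = 2, common.legend = TRUE, legend = "bottom")
combined
```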

ggplot2 Hands-On Practice

Work through the activity for your level — and continue to the next if you finish early!

Open visualization_activities.qmd in RStudio to get started

Level              Group        Activity
🟢 Beginner        Activity 1   Fill in the blanks
🟡 Intermediate    Activity 2   Update the facet plots
🔴 Advanced        Activity 3   Create figure using new dataset

Finished early? Try the bonus question or move to the next level!