ggplot2 - I
Lecture 1
University of Arizona
INFO 526 - Fall 2023
Reading Quiz #1 is due Monday, by 3:30pm.
A note on readings for next week: some of it is review, so feel free to skim those parts.
In this lecture, we will:
Explore the grammar of graphics
Map data to aesthetics
Understand layer components
Interpret ggplot2 documentation
Create a layered plot
Introduce function and syntax of visual elements
Grammar: “The fundamental principles or rules of an art or science” - Oxford English Dictionary
A grammar of graphics can:
Reveal the composition of complicated graphics
Provide a strong foundation for understanding a range of graphics
Guide us toward well-formed or correct graphics
Note
See “The Grammar of Graphics” by Leland Wilkinson (2005) and “A Layered Grammar of Graphics” by Hadley Wickham (2010)
ggplot2 builds complex plots iteratively, one layer at a time.
What are the necessary components of a plot?
What are the necessary components of a layer?
A plot contains:
Data and aesthetic mapping
Layer(s) containing geometric object(s) and statistical transformation(s)
Scales
Coordinate system
(Optional) facets or themes
A layer contains:
Data with aesthetic mapping
A statistical transformation, or stat
A geometric object, or geom
A position adjustment
Data can be added to either the entire ggplot object or a particular layer.
Input data must be a dataframe in ‘tidy’ format:
every column is a variable
every row is an observation
every cell is a single value
Note
See “Tidy Data” by Wickham (2014) and the associated vignette
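To make the three tidy rules concrete, here is a minimal sketch using a small made-up table (the `site`, `2022`, and `2023` columns and their counts are hypothetical, not from the lecture). The wide version hides one variable (year) in its column names; `tidyr::pivot_longer()` reshapes it so each column is a variable and each row is one observation.

```r
library(tibble)
library(tidyr)

# Hypothetical counts in "wide" form: the year variable is stored in the column names
wide <- tibble(
  site   = c("A", "B"),
  `2022` = c(10, 14),
  `2023` = c(12, 9)
)

# Tidy form: one column per variable (site, year, count), one row per site-year observation
tidy <- pivot_longer(
  wide,
  cols = c(`2022`, `2023`),
  names_to = "year",
  values_to = "count"
)
```

The tidy version has 4 rows (2 sites × 2 years), which is the shape ggplot2 expects when `year` is to be mapped to an aesthetic.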
# A tibble: 6 × 4
species bill_length_mm bill_depth_mm body_mass_g
<fct> <dbl> <dbl> <int>
1 Adelie 39.1 18.7 3750
2 Adelie 39.5 17.4 3800
3 Gentoo 46.7 15.3 5200
4 Gentoo 43.3 13.4 4400
5 Chinstrap 46.1 18.2 3250
6 Chinstrap 51.3 18.2 3750
# A tibble: 6 × 4
Color x y Size
<fct> <dbl> <dbl> <int>
1 Adelie 39.1 18.7 3750
2 Adelie 39.5 17.4 3800
3 Gentoo 46.7 15.3 5200
4 Gentoo 43.3 13.4 4400
5 Chinstrap 46.1 18.2 3250
6 Chinstrap 51.3 18.2 3750
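The second table simply relabels each column with the aesthetic it is mapped to. A minimal sketch of the corresponding `aes()` call, assuming `penguins` comes from the palmerpenguins package (the object names `mapping` and `p` are illustrative):

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Each variable (column) is mapped to one aesthetic, mirroring the relabeled table
mapping <- aes(
  x = bill_length_mm,
  y = bill_depth_mm,
  color = species,
  size = body_mass_g
)

p <- ggplot(penguins, mapping) +
  geom_point()
```

Note that `aes()` standardizes `color` to the British spelling `colour` internally.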
Data and aesthetics can be supplied in the initial ggplot() call, in individual layers, or a combination of both
Data and aesthetics set in ggplot() are inherited by every layer, but can be overridden
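library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

ggplot(penguins, aes(x = body_mass_g,
                     y = flipper_length_mm,
                     color = species)) +  # mappings set here are inherited by every layer
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # one linear fit per species, no confidence band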
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).
ggplot(penguins, aes(x = body_mass_g,
                     y = flipper_length_mm)) +
  geom_point(aes(color = species)) +  # color mapping applies to this layer only
  geom_smooth(method = "lm",
              se = FALSE)             # so a single line is fit to all penguins
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).
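Data can be overridden the same way as aesthetics. A minimal sketch, assuming `penguins` comes from the palmerpenguins package: the second layer replaces the inherited data with a Gentoo-only subset (the `gentoo` name and the colors are illustrative choices), while still inheriting the x and y mappings from ggplot().

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Hypothetical subset used only by the second layer
gentoo <- subset(penguins, species == "Gentoo")

p <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(color = "gray70") +                   # inherits the full penguins data
  geom_point(data = gentoo, color = "darkorange")  # overrides the data, inherits x and y
```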
Specifying a constant inside aes() with quotes creates a legend on the fly
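A minimal sketch of the difference, assuming `penguins` comes from the palmerpenguins package (the label "All penguins" and the color "steelblue" are arbitrary): inside aes(), the quoted constant is treated as a one-level variable and gets a legend; outside aes(), it is a fixed setting with no legend.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Constant inside aes(): mapped like data, so a color legend is drawn
p_legend <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(aes(color = "All penguins"))

# Constant outside aes(): a fixed visual setting, so no legend
p_no_legend <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(color = "steelblue")
```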
layer()
A layer contains:
Data with aesthetic mapping
A statistical transformation, or stat
A geometric object, or geom
A position adjustment
Note
All geom_*() or stat_*() calls are customized shortcuts for the layer() function.
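A minimal sketch of the shortcut relationship, assuming `penguins` comes from the palmerpenguins package: `geom_point()` and an explicit `layer()` call naming the geom, stat, and position produce layers with the same components (the object names are illustrative).

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p_base <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm))

# Shortcut form
p_geom <- p_base + geom_point()

# Long form: geom, stat, and position named explicitly;
# data and mapping are inherited from ggplot()
p_layer <- p_base +
  layer(
    geom = "point",
    stat = "identity",
    position = "identity",
    params = list(na.rm = FALSE)
  )
```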
Defining every component of a layer or a whole graphic can be tiresome
ggplot2 has a hierarchy of defaults, so you can make a graph in 2 lines of code!
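A minimal sketch of what the defaults fill in, assuming `penguins` comes from the palmerpenguins package: the two-line plot and the long version below describe the same graphic, with the stat, position, scales, coordinate system, and facet written out in the second.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Two lines: ggplot2 supplies the remaining components
p_short <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point()

# The same plot with several of those defaults spelled out
p_long <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(stat = "identity", position = "identity") +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian() +
  facet_null()
```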
stat_*() vs. geom_*()
“Every geom has a default statistic, and every statistic has a default geom.” - Wickham (2010)
stat_*() transforms the data
geom_*() controls the type of plot rendered
Tip
When in doubt, check the documentation
stat_count() and geom_bar() are equivalent
stat_density() and geom_density() are not equivalent
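One way to check these claims is to inspect the geom and stat each layer ends up with. A minimal sketch, assuming `penguins` comes from the palmerpenguins package: `stat_count()` and `geom_bar()` both pair the count stat with the bar geom, while `stat_density()` draws with the area geom (filled) and `geom_density()` uses its own density geom, which by default draws only an outline.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p_species <- ggplot(penguins, aes(x = species))

# Equivalent: stat "count" + geom "bar" in both cases
p_stat_count <- p_species + stat_count()
p_geom_bar   <- p_species + geom_bar()

p_mass <- ggplot(penguins, aes(x = body_mass_g))

# Not equivalent: same density computation, different default geoms
p_stat_density <- p_mass + stat_density()  # geom "area"
p_geom_density <- p_mass + geom_density()  # geom "density"
```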
In general, use geom_*() unless you are trying to:
Track all geom and stat options
Exercise
For each of the following problems, suggest a useful geom:
For example, boxplots and error bars can’t be stacked.
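Position adjustments (the last layer component listed earlier) decide how overlapping geoms share space. A minimal sketch with bars, assuming `penguins` comes from the palmerpenguins package (mapping `island` and `species` is an illustrative choice):

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p_base <- ggplot(penguins, aes(x = island, fill = species))

p_stack <- p_base + geom_bar(position = "stack")  # default: bars stacked on top of each other
p_dodge <- p_base + geom_bar(position = "dodge")  # bars placed side by side
p_fill  <- p_base + geom_bar(position = "fill")   # stacked, then rescaled to proportions
```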
Exercise
What properties must a geom possess to be stackable?
What properties must a geom possess to be dodgeable?
Exercise
What are the two layers in this plot? What data went into each?
Each scale is a function that translates data space (in data units) into aesthetic space (e.g., pixels)
A guide (axis or legend) is the inverse function: it converts visual properties back to data
Every aesthetic in a plot is associated with exactly one scale.
Scale function names are made of 3 pieces separated by “_”:
scale
the name of the primary aesthetic (color, shape, x)
the name of the scale (discrete, continuous, brewer)
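A minimal sketch of the naming scheme, assuming `penguins` comes from the palmerpenguins package: `scale_x_continuous()` maps body mass to horizontal position (its guide is the x axis), and `scale_color_brewer()` maps species to colors from a ColorBrewer palette (its guide is the legend). The axis label and the "Dark2" palette are arbitrary choices.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  scale_x_continuous(name = "Body mass (g)") +  # scale _ x _ continuous
  scale_color_brewer(palette = "Dark2")         # scale _ color _ brewer
```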
Coordinate systems have 2 primary roles:
Combine the x and y position aesthetics to produce a 2-dimensional position on the plot
In coordination with faceting (optional), draw axes and panel backgrounds
Linear:
coord_cartesian(): common default
coord_flip(): x and y axes flipped
coord_fixed(): fixed aspect ratio
Non-linear:
coord_map()/coord_quickmap()/coord_sf(): map projections; x and y become longitude and latitude
coord_polar(): polar coordinates; x and y become angle and radius
coord_trans(): apply transformations
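A minimal sketch of swapping coordinate systems while keeping the same layers, assuming `penguins` comes from the palmerpenguins package (the object names and the ratio are illustrative):

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar()

p_flip  <- p_bar + coord_flip()              # bars run horizontally
p_polar <- p_bar + coord_polar(theta = "x")  # species becomes the angle, count the radius

p_fixed <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point() +
  coord_fixed(ratio = 1)  # 1 data unit on x has the same length as 1 data unit on y
```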
Creates small multiples to show different subsets:
facet_null(): default
facet_wrap(): “wraps” a 1d ribbon of panels into 2d
facet_grid(): 2d grid of panels defined by row and column
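A minimal sketch of the two faceting functions, assuming `penguins` comes from the palmerpenguins package. Because `sex` contains missing values, facet_grid() will also show an NA row of panels.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p_base <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point()

# One panel per species, wrapped into rows and columns as space allows
p_wrap <- p_base + facet_wrap(~ species)

# Rows defined by sex, columns by species
p_grid <- p_base + facet_grid(sex ~ species)
```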
Exercise
Recreate the figure below. How would you get the gray points to show up on all facets?
Controls non-data elements of plots (e.g., to match a style guide).
Theme elements specify the non-data elements you can control: plot.title, legend.position
Each element has an element function to describe its visual properties: element_text(), element_blank()
The theme() function overrides settings of the default theme: theme(legend.title = element_blank())
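A minimal sketch combining a complete theme with targeted overrides, assuming `penguins` comes from the palmerpenguins package (the title text and the chosen settings are illustrative):

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  labs(title = "Penguin size") +
  theme_minimal() +                            # start from a complete theme
  theme(
    plot.title = element_text(face = "bold"),  # restyle a text element
    legend.title = element_blank(),            # remove an element entirely
    legend.position = "bottom"                 # a setting rather than an element
  )
```

Because theme() is added after theme_minimal(), its settings take precedence over those of the complete theme.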
Penguin artwork by @allison_horst
Hadley Wickham’s “A layered grammar of graphics” (2010)
Hadley Wickham’s “ggplot2: Elegant Graphics for Data Analysis, 3rd edition”, now available online
“R for Data Science”, by Hadley Wickham, Mine Çetinkaya-Rundel, & Garrett Grolemund, especially chapters 2, 10, and 12