all_workshops <- readRDS(file = "data/all_workshops.RDS")
workshop_breakdown <- readRDS(file = "data/workshop_breakdown.RDS")
Introduction
This week we will learn about ggplot2, a tidyverse package for visualizing data. It is a powerful and flexible tool that allows you to create fully customizable, publication-quality graphics. The gg in ggplot2 stands for grammar of graphics, the underlying philosophy of the package, which focuses on building graphics in layers. Start with the data, then map the data to the axes and to aesthetic qualities like size, shape, and color, and choose geometries like dots, lines, and polygons. Further refine the appearance of your plot by adjusting scales, legends, labels, and coordinate systems, and by adding annotations.
The Data
The data we’re working with this week is familiar. We’ll visualize the workshops data frames we created in Week 1.
Because we saved that data as .RDS files, we can open them up and load them to our environment.
We’ll complete the following tasks:
- Create a basic bar chart comparing workshop attendance among the various schools.
- Create a grouped bar chart comparing workshop attendance among the schools and university roles.
- Add titles and labels to our graph
- Adjust the color palette
- Choose a theme
Let’s open a new script and load the libraries we’ll be working with in this lesson:
library(tidyverse)
ggplot2 Basics
All ggplot2 graphs start with the same basic template:
<DATA> %>%
ggplot(aes(<MAPPINGS>)) +
<GEOM_FUNCTION>() +
<Additional GEOMS, SCALES, THEMES, etc. . . >
All graphs start with the ggplot function and the data. We’ll use the pipe to pipe the data to the function.
all_workshops %>%
  ggplot()
We see that even this initializes the plot area of RStudio.
Building a basic bar chart
Next, we define a mapping (using the aesthetic, or aes(), function) by selecting the variables to be plotted and specifying how to present them in the graph, e.g. as x/y positions or as characteristics such as size, shape, or color. Here we will say that the x axis should contain the affiliation variable. Note how the x axis populates with tick marks labeled with the affiliation values. We do not need to specify a y variable, since the bar chart we are building will plot the count by default.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation))
Next we need to add ‘geoms’: graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms for common graph types. To add a geom to the plot, use the + operator.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation)) +
  geom_bar()
By default, bar graphs display the count of the x variable, but it is also possible to specify a y variable that already contains the count, as in our summarized dataset workshop_breakdown.
In this case, we want to specify affiliation as the x variable and n as the y variable.
workshop_breakdown %>%
  ggplot(mapping = aes(x = affiliation, y = n)) +
  geom_bar(stat = "identity")
Note the one other adjustment: the stat argument changes from its default of "count" to "identity". This tells geom_bar() to base the y axis on the specified variable instead of counting rows.
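The relationship between the two approaches can be sketched with a small made-up data frame. Everything here is an assumption for illustration: toy_workshops and its values are hypothetical, and count() is only a guess at how a summary table like workshop_breakdown might be built. The sketch also shows geom_col(), which ggplot2 provides as a shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per attendee (not the real workshop data)
toy_workshops <- data.frame(
  affiliation = c("Nursing", "Nursing", "Medicine", "Pharmacy", "Medicine", "Nursing")
)

# Summarize to one row per affiliation; count() stores the tally in a column named n
toy_breakdown <- toy_workshops %>%
  count(affiliation)

# Raw data: geom_bar() counts the rows for us (stat = "count" is the default)
p_raw <- toy_workshops %>%
  ggplot(aes(x = affiliation)) +
  geom_bar()

# Summarized data: bar heights are taken directly from the n column
p_summary <- toy_breakdown %>%
  ggplot(aes(x = affiliation, y = n)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for geom_bar(stat = "identity")
p_col <- toy_breakdown %>%
  ggplot(aes(x = affiliation, y = n)) +
  geom_col()
```

All three plots should draw bars of the same height; which one to use depends only on whether your data is already summarized.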
Setting vs mapping aesthetics
When working with ggplot2, it’s important to understand the difference between setting aesthetic properties and mapping them. All geoms have certain visual attributes that can be modified. Polygons, like bars, have the properties fill and color. You can change the inside color of a bar with fill, and the border with color. We can modify the defaults with the fill and color arguments in the geom_bar() layer. (I’ve also increased the linewidth to make it easier to see the border color.)
all_workshops %>%
  ggplot(mapping = aes(x = affiliation)) +
  geom_bar(fill = "blue", color = "purple", linewidth = 1.5)
How did we know the color names “blue” and “purple” would work in the code above? R has 657 (!!) built-in color names. You can see them by calling the function colors(). You can also specify colors using RGB values and hexadecimal codes.
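As a quick sketch of these options, the snippet below looks up built-in color names, builds a hexadecimal code with rgb(), and uses both kinds of value in geom_bar(). The toy data frame and the object names are made up for illustration.

```r
library(ggplot2)

# How many built-in color names does R know?
n_colors <- length(colors())

# Find every built-in name that contains "purple"
purples <- grep("purple", colors(), value = TRUE)

# rgb() converts red/green/blue values into a hexadecimal string
umb_red <- rgb(200, 16, 46, maxColorValue = 255)

# Hypothetical toy data, used only to have something to plot
toy <- data.frame(affiliation = c("Nursing", "Medicine", "Nursing"))

# Hex codes work anywhere a color name does
p_hex <- ggplot(toy, aes(x = affiliation)) +
  geom_bar(fill = "#FFCD00", color = umb_red, linewidth = 1.5)
```

Here umb_red holds the string "#C8102E", the same kind of value you could type by hand.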
Now we have manually set a value for the fill and color. To create our initial graph, we used the mapping argument and the aes() function to map the x axis to the affiliation variable. Watch what happens if we map the fill property to the affiliation variable as well.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = affiliation)) +
  geom_bar()
Building a grouped bar chart
Mapping a variable to an aesthetic is especially useful when we have a third variable we want to express on our graph. For example, what if we want to compare attendance by both affiliation and status? To do this we can create a grouped bar chart by mapping fill to the status variable.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = status)) +
  geom_bar()
When you map an aesthetic with aes() in the ggplot() function, it is inherited by all subsequent layers. When you map in a geom_*() function, it is applied only to that layer.
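A minimal sketch of the two placements, using a made-up data frame (toy_workshops and its values are hypothetical). With a single layer the two plots look the same; the difference only matters once you add further layers.

```r
library(ggplot2)

# Hypothetical toy data standing in for the workshop data
toy_workshops <- data.frame(
  affiliation = c("Nursing", "Nursing", "Medicine", "Medicine", "Pharmacy"),
  status = c("Student", "Faculty", "Student", "Student", "Staff")
)

# fill is mapped in ggplot(): every layer added afterwards inherits it
p_global <- ggplot(toy_workshops, aes(x = affiliation, fill = status)) +
  geom_bar()

# fill is mapped inside geom_bar(): only this layer uses it
p_local <- ggplot(toy_workshops, aes(x = affiliation)) +
  geom_bar(aes(fill = status))
```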
By default, mapping fill to a second variable in geom_bar() creates a stacked bar chart. To make it grouped, we add a position = "dodge" argument to geom_bar():
all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge")
To make our graph look less crowded, we can also modify the width of the bars. Note that this is done outside of the aes() function.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge", width = 0.75)
Adding titles and axis labels
A good plot communicates clearly. So far we’ve talked about achieving this through the aesthetic aspects of your plot, but it’s important to make sure your graph has a clear title and axis labels as well. There are a few ways to do this in ggplot2, but one of the simplest is to use the labs() function.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  labs(title = "Who is attending library workshops?",
       subtitle = "Attendance by School and Role, 2018-2023",
       x = "School Affiliation",
       y = "Number of Attendees",
       fill = "University Role")
Working with color palettes
There are many options for changing the color palette of your plot. You can set your palette manually:
myPalette <- c("#C8102E", "#FFCD00", "#2C2A29") # Official UMB colors

all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  labs(title = "Who is attending library workshops?",
       subtitle = "Attendance by School and Role, 2018-2023",
       x = "School Affiliation",
       y = "Number of Attendees",
       fill = "University Role") +
  scale_fill_manual(values = myPalette)
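An unnamed palette is matched to the fill categories in order. If you want a specific color tied to a specific category, scale_fill_manual() also accepts a named vector. The sketch below assumes hypothetical role names (Faculty, Staff, Student) and a made-up data frame; replace them with the actual values of your status variable.

```r
library(ggplot2)

# Hypothetical toy data standing in for the workshop data
toy_workshops <- data.frame(
  affiliation = c("Nursing", "Nursing", "Medicine", "Medicine", "Pharmacy"),
  status = c("Student", "Faculty", "Student", "Staff", "Staff")
)

# Names must match the values of the fill variable (assumed role names)
role_colors <- c(Faculty = "#C8102E", Staff = "#FFCD00", Student = "#2C2A29")

# Each role gets the color listed under its name, regardless of order
p_named <- ggplot(toy_workshops, aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  scale_fill_manual(values = role_colors)
```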
Generally, it may be preferable to work with one of the built-in ggplot2 or R palettes, or to install one of several packages that provide additional palettes.
Let’s try applying a viridis palette. viridis was designed to be especially robust for many forms of color-blindness. It is also meant to print well in grey scale. As an additional advantage, a lightweight form of the package is included with ggplot2, so there is no need to install additional packages.
all_workshops %>%
  ggplot(mapping = aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  labs(title = "Who is attending library workshops?",
       subtitle = "Attendance by School and Role, 2018-2023",
       x = "School Affiliation",
       y = "Number of Attendees",
       fill = "University Role") +
  scale_fill_viridis_d()
Learn more from the viridis documentation.
Changing the theme
The theme of a ggplot2 graph controls the overall look and all non-data elements of the plot. There are several built-in themes which can be applied as another layer. Start typing theme_ in RStudio to see a list of themes. You can also use the theme() function to modify aspects of an existing theme. Here we apply theme_classic(), which removes the grid lines and grey background of the default theme. The code below also wraps affiliation in fct_infreq(), a forcats function loaded with the tidyverse, which orders the bars from the most to the least frequent affiliation.
all_workshops %>%
  ggplot(mapping = aes(x = fct_infreq(affiliation), fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  labs(title = "Who is attending library workshops?",
       subtitle = "Attendance by School and Role, 2018-2023",
       x = "School Affiliation",
       y = "Number of Attendees",
       fill = "University Role") +
  scale_fill_viridis_d() +
  theme_classic()
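To illustrate the theme() function mentioned above, here is a minimal sketch that starts from theme_classic() and then overrides two individual elements. The toy data frame and the particular tweaks (legend at the bottom, bold title) are arbitrary choices for illustration, not part of the lesson's final plot.

```r
library(ggplot2)

# Hypothetical toy data standing in for the workshop data
toy_workshops <- data.frame(
  affiliation = c("Nursing", "Nursing", "Medicine", "Medicine", "Pharmacy"),
  status = c("Student", "Faculty", "Student", "Staff", "Staff")
)

p_themed <- ggplot(toy_workshops, aes(x = affiliation, fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  labs(title = "Who is attending library workshops?") +
  theme_classic() +
  # theme() comes after the complete theme so its settings are not overwritten
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )
```

Order matters here: a complete theme such as theme_classic() replaces earlier theme settings, so individual tweaks with theme() should be added after it.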
Wrapping Up
Once your plot looks the way you want, you may want to export it to an image file for use in another document (although in Week 6 we’ll learn how to build reports with plots directly in RStudio).
There are two ways to export your plot:
- Use the Export widget in the Plots pane of RStudio
- Use the ggsave() function
Let’s first save the grouped bar chart as an object:
workshop_attendance_bar <- all_workshops %>%
  ggplot(mapping = aes(x = fct_infreq(affiliation), fill = status)) +
  geom_bar(position = "dodge", width = 0.75) +
  labs(title = "Who is attending library workshops?",
       subtitle = "Attendance by School and Role, 2018-2023",
       x = "School Affiliation",
       y = "Number of Attendees",
       fill = "University Role") +
  scale_fill_viridis_d() +
  theme_classic()
Now use ggsave() to save a .jpg version of your plot to your figs/ directory. The only required argument is the file path where you want to save the image. By default it will save the last plot generated, but you can also supply the object name of the plot you want to save.
ggsave(filename = "figs/workshop_attendance_bar.jpg", workshop_attendance_bar)
Saving 7 x 5 in image
Note that you can also use this function to adjust the size and resolution of your graph.
ggsave(filename = "figs/workshop_attendance_bar.jpg", workshop_attendance_bar, width = 7, height = 5, dpi = 300)