R Community of Practice Week 3

Data Viz with ggplot2

Introduction

This week we will learn about ggplot2 - a tidyverse package for visualizing data. It is a powerful and flexible tool that allows you to create fully customizable, publication quality graphics. The gg in ggplot2 stands for grammar of graphics. The grammar of graphics is the underlying philosophy of the package. It focuses on creating graphics in layers. Start with the data – map the data the axes and to aesthetic qualities like size, shape, and color and geometries like dots, lines, and polygons. Further refine the appearance of your plot by adjusting scales and legends, labels, coordinate systems, and adding annotations.

The Data

The data we’re working with this week is familiar. We’ll visualize the workshops data frames we created in Week 1.

Because we saved that data as .RDS files, we can open them up and load them to our environment.

all_workshops <- readRDS(file = "data/all_workshops.RDS")
workshop_breakdown <- readRDS(file = "data/workshop_breakdown.RDS")

We’ll complete the following tasks:

  1. Create a basic bar chart comparing workshop attendance among the various schools.
  2. Create a grouped bar chart comparing workshop attendance among the schools and university roles.
  3. Add titles and labels to our graph
  4. Adjust the color palette
  5. Choose a theme

Let’s open a new script and load the libraries we’ll be working with in this lesson:

library(tidyverse)

ggplot2 Basics

All ggplot2 graphs start with the same basic template:

<DATA> %>%
    ggplot(aes(<MAPPINGS>)) +
    <GEOM_FUNCTION>() +
    <Additional GEOMS, SCALES, THEMES, etc. . . >

All graphs start with the ggplot function and the data. We’ll use the pipe to pipe the data to the function.

all_workshops %>% 
  ggplot()

We see that even this initializes the plot area of RStudio.

Building a basic bar chart

Next, we define a mapping (using the aesthetic, or aes(), function), by selecting the variables to be plotted and specifying how to present them in the graph, e.g. as x/y positions or characteristics such as size, shape, color, etc. Here we will say that the x axis should contain the affiliation variable. Note how the x-axis populates with some numbers and tick marks. We do not need to specify a y variable, since it will look at the count by default.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation))

Next we need to add ‘geoms’ – graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms for common graph types. To add a geom to the plot use the + operator.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation)) +
  geom_bar()

By default bar graphs will display the count of the x variable, but it is also possible to specify a y variable that contains the count, as we do in our summarized dataset workshop_breakdown.

In this case, we would want to be able to specify affiliation as the x variable and n as the y variable.

workshop_breakdown %>% 
  ggplot(mapping = aes(x=affiliation, y=n)) +
  geom_bar(stat="identity")

You need to make one other adjustment, and change the stat argument from it’s default of “count” to “identity” This tells it to base the y axis on the specified variable.

Setting vs mapping aesthetics

When working with ggplot2, it’s important to understand the difference between setting aesthetic properties and mapping them. All geoms have certain visual attributes that can be modified. Polygons like bars, have the properties fill and color. You can change the inside color of a bar with fill, and the border with color. We can modify the defaults with the fill and color arguments in the geom_bar() layer. (I’ve also increased the linewidth to make it easier to see the border color)

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation)) +
  geom_bar(fill="blue", color="purple", linewidth=1.5)

Note

How did we know the color names “blue” and “purple” would work in the code above? R has 657 (!!) built in color names. You can see them by calling the function colors(). You can also specify colors using rgb and hexadecimal codes.

Now we have manually set a value for the fill and color. To create our initial graph, we used the mapping argument and the aes() function to map the x axis to the affiliation variable. Watch what happens if we map the fill property to the affiliation variable as well.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=affiliation)) +
  geom_bar()

Building a grouped bar chart

Mapping a variable to an aesthetic is especially useful when we have third variable we want to express on our graph. For example, what if we want to compare attendance by both affiliation and status? To do this we can create a grouped bar chart by mapping fill to the status variable.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=status)) +
  geom_bar()

Note

When you map an aesthetic with aes() in the ggplot() function it is inherited by all subsequent layers. When you map in a geom_*() function it is applied only to that layer.

By default this creates a stacked bar chart. To make it grouped, we add a position="dodged" argument to geom_bar()

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=status)) + 
  geom_bar(position = "dodge")

To make our graph look less crowded, we can also modify the width of the bars. Note this is done outside of the aes() function.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=status)) +
  geom_bar(position = "dodge", width = 0.75)

Adding titles and axis labels

A good plot communicates clearly. So far we’ve talked about achieving this through the aesthetic aspects of your plot, but it’s important to make sure your graph has a clear title and axis labels as well. There are a few ways to do this in ggplot2, but one of the simplest is to use the labs() function.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=status)) +
  geom_bar(position = "dodge", width = 0.75) + 
  labs(title = "Who is attending library workshops?", 
       subtitle = "Attendance by School and Role, 2018-2023", 
       x="School Affiliation", 
       y= "Number of Attendees",
       fill="University Role")

Working with color palettes

There are many options for changing the color palette of your plot. You can set your palette manually:

myPalette <- c("#C8102E", "#FFCD00", "#2C2A29") #Official UMB colors

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=status)) +
  geom_bar(position = "dodge", width =0.75) + 
  labs(title = "Who is attending library workshops?", 
       subtitle = "Attendance by School and Role, 2018-2023", 
       x="School Affiliation", y= "Number of Attendees", 
       fill="University Role") + 
  scale_fill_manual(values = myPalette)

Generally, it may be preferable to work with one of the built in ggplot2 or R palettes, or to install one of several packages with additional palettes such as:

Let’s try applying a viridis palette. viridis was designed to be especially robust for many forms of color-blindness. It is also meant to print well in grey scale. As an additional advantage, a lightweight form of the package is included with ggplot2, so there is no need to install additional packages.

all_workshops %>% 
  ggplot(mapping=aes(x=affiliation, fill=status)) +
  geom_bar(position = "dodge", width = 0.75) + 
  labs(title = "Who is attending library workshops?", 
       subtitle = "Attendance by School and Role, 2018-2023", 
       x="School Affiliation", 
       y= "Number of Attendees",
       fill="University Role") +
  scale_fill_viridis_d()

Learn more from the viridis documentation

Changing the theme

The theme of a ggplot2 graph controls the overall look and all non-data elements of the plot. There are several built-in themes which can be applied as another layer. Start typing theme_ in RStudio to see a list of themes. You can also use the theme() function to modify aspects of an existing theme. Here we apply theme_classic() which removes the grid lines and grey background of the default theme.

all_workshops %>% 
  ggplot(mapping=aes(x=fct_infreq(affiliation), fill=status)) +
  geom_bar(position = "dodge", width = 0.75) + 
  labs(title = "Who is attending library workshops?", 
       subtitle = "Attendance by School and Role, 2018-2023", 
       x="School Affiliation", 
       y= "Number of Attendees",
       fill="University Role") +
  scale_fill_viridis_d() +
  theme_classic()

Wrapping Up

Once your plot looks the way you want, you may want to export it to an image file for use in another document (although in Week 6 we’ll learn how to build reports with plots directly in RStudio).

There are two ways to export your plot:

  1. Use the Export widget in the Plots pane of RStudio
  2. Use the ggsave() function

Let’s first save the grouped bar chart as an object:

workshop_attendance_bar <- all_workshops %>% 
  ggplot(mapping=aes(x=fct_infreq(affiliation), fill=status)) +
  geom_bar(position = "dodge", width = 0.75) + 
  labs(title = "Who is attending library workshops?", 
       subtitle = "Attendance by School and Role, 2018-2023", 
       x="School Affiliation", 
       y= "Number of Attendees",
       fill="University Role") +
  scale_fill_viridis_d() +
  theme_classic()

Now use ggsave() to save a .jpg version of your plot to your figs/ directory. The only required argument is the file path where you want to save the image. By default it will save the last plot generated, but you can also supply the object name of the plot you want to save.

ggsave(filename = "figs/workshop_attendance_bar.jpg", workshop_attendance_bar)
Saving 7 x 5 in image

Note that you can also use this function to adjust the size and resolution of you graph.

ggsave(filename = "figs/workshop_attendance_bar.jpg", workshop_attendance_bar, width = 7, height = 5, dpi = 300)

Helpful Resources

Background and Overview

Tutorials for Many Kinds of Plots

Working with Color and Other Aesthetics