R Community of Practice Week 2

Working with Dates

Introduction

Dates are a particular kind of numeric data type in R. Dates can be tricky to work for a couple of reasons. Often when you import data into R, it will interpret dates as strings, or character data type. This will make things difficult if you want to do calculations with dates, or parse out a part of a date. Second, time units are often variable, months have different numbers of days, there are leap years, and time zones can be another factor to consider. In this lesson, we’ll be introducing the lubridate package, which is part of the tidyverse, and makes working with dates much more manageable.

The Data

Download the data for this lesson here.

Our scenario this week is that we a spreadsheet of data tracking research consults from 2018 to 2023. We want to accomplish the following tasks:

  1. Change the date variable to date data type
  2. Summarize the number of consults by year
  3. Summarize consults by month for FY2022

Let’s open a new script and load the libraries we’ll be working with in this lesson:

library(tidyverse)

And now we can import our data set

consults <- read_csv("data/consults.csv")

Step 1: Changing Strings to Dates

How do we know what kind of data our date variable is? We use the class() function to check data types

class(consults$date)
[1] "character"

Let’s change that to date data type. lubridate has a number of functions for parsing dates in different formats. The name of the function depends on the format of the input date.

For example, if we had a string with date in the month, day, year format, we would use the mdy() function.

myDate <- "June 22, 2023"

myRealDate <- mdy(myDate)

class(myDate)
## [1] "character"
class(myRealDate)
## [1] "Date"

Or, as another example:

myDate <- "22062023"
myRealDate <- dmy(myDate)

class(myDate)
## [1] "character"
class(myRealDate)
## [1] "Date"

and finally

myDate <- "2023-06-22"
myRealDate <- ymd(myDate)

class(myDate)
## [1] "character"
class(myRealDate)
## [1] "Date"

Now that we have a better idea of how these functions work, we can use them to overwrite our original date column with one that is properly recognized as a date. We use the mutate() function from dplyr to create a new variable.

consults <- consults %>% 
  mutate(date = mdy(date))

And let’s check if that worked

class(consults$date)
[1] "Date"

Step 2: Summarize data by year

Another set of functions in lubridate help with extracting pieces of dates.

For example:

year(myRealDate)
## [1] 2023
month(myRealDate)
## [1] 6
day(myRealDate)
## [1] 22
wday(myRealDate)
## [1] 5

Let’s summarize the total number of consults for each year in our data set.

consults_yearly <- consults %>% 
  count(year=year(date), name="total_consults")

consults_yearly
# A tibble: 6 × 2
   year total_consults
  <dbl>          <int>
1  2018            631
2  2019            639
3  2020            463
4  2021            415
5  2022            381
6  2023            141

Step 3: Working with intervals

Now, what if we wanted to take a look at the data for just the last fiscal year? How can we filter our data set to include just the dates we want to look at.

lubridate can work with three types of time spans:

  1. durations- measure physical time in seconds
  2. periods – take into account human variations in time due to varying lengths of months, leap years, etc.
  3. intervals – timespan defined with a start and end time

First we’ll create an interval object.

fy2022 <- interval(mdy("07/01/2021"), mdy("06/30/2022"))

Now we can keep just those consults within our desired time frame.

consults_fy2022 <-
  consults %>% 
  filter(date %within% fy2022)

And we can summarize these consults by month.

consults_fy22_monthly <- consults_fy2022 %>% 
 count(year=year(date), month=month(date), name = "total_consults")

consults_fy22_monthly
# A tibble: 12 × 3
    year month total_consults
   <dbl> <dbl>          <int>
 1  2021     7             31
 2  2021     8             13
 3  2021     9             39
 4  2021    10             65
 5  2021    11             16
 6  2021    12              8
 7  2022     1             16
 8  2022     2             62
 9  2022     3             42
10  2022     4             18
11  2022     5             18
12  2022     6             32

Wrapping Up

Once again, let’s save these our objects of summarized data to our data_output folder.

saveRDS(consults_yearly, "data_output/consults_yearly.RDS")

saveRDS(consults_fy22_monthly, "data_output/consults_fy22_monthly.RDS")

Be sure to check out the lubridate documentation documentation for more!