library(tidyverse)
Introduction
Dates are a particular kind of numeric data type in R. Dates can be tricky to work for a couple of reasons. Often when you import data into R, it will interpret dates as strings, or character data type. This will make things difficult if you want to do calculations with dates, or parse out a part of a date. Second, time units are often variable, months have different numbers of days, there are leap years, and time zones can be another factor to consider. In this lesson, we’ll be introducing the lubridate
package, which is part of the tidyverse, and makes working with dates much more manageable.
The Data
Download the data for this lesson here.
Our scenario this week is that we a spreadsheet of data tracking research consults from 2018 to 2023. We want to accomplish the following tasks:
- Change the date variable to date data type
- Summarize the number of consults by year
- Summarize consults by month for FY2022
Let’s open a new script and load the libraries we’ll be working with in this lesson:
And now we can import our data set
<- read_csv("data/consults.csv") consults
Step 1: Changing Strings to Dates
How do we know what kind of data our date variable is? We use the class()
function to check data types
class(consults$date)
[1] "character"
Let’s change that to date data type. lubridate
has a number of functions for parsing dates in different formats. The name of the function depends on the format of the input date.
For example, if we had a string with date in the month, day, year format, we would use the mdy()
function.
<- "June 22, 2023"
myDate
<- mdy(myDate)
myRealDate
class(myDate)
## [1] "character"
class(myRealDate)
## [1] "Date"
Or, as another example:
<- "22062023"
myDate <- dmy(myDate)
myRealDate
class(myDate)
## [1] "character"
class(myRealDate)
## [1] "Date"
and finally
<- "2023-06-22"
myDate <- ymd(myDate)
myRealDate
class(myDate)
## [1] "character"
class(myRealDate)
## [1] "Date"
Now that we have a better idea of how these functions work, we can use them to overwrite our original date column with one that is properly recognized as a date. We use the mutate()
function from dplyr
to create a new variable.
<- consults %>%
consults mutate(date = mdy(date))
And let’s check if that worked
class(consults$date)
[1] "Date"
Step 2: Summarize data by year
Another set of functions in lubridate
help with extracting pieces of dates.
For example:
year(myRealDate)
## [1] 2023
month(myRealDate)
## [1] 6
day(myRealDate)
## [1] 22
wday(myRealDate)
## [1] 5
Let’s summarize the total number of consults for each year in our data set.
<- consults %>%
consults_yearly count(year=year(date), name="total_consults")
consults_yearly
# A tibble: 6 × 2
year total_consults
<dbl> <int>
1 2018 631
2 2019 639
3 2020 463
4 2021 415
5 2022 381
6 2023 141
Step 3: Working with intervals
Now, what if we wanted to take a look at the data for just the last fiscal year? How can we filter our data set to include just the dates we want to look at.
lubridate
can work with three types of time spans:
- durations- measure physical time in seconds
- periods – take into account human variations in time due to varying lengths of months, leap years, etc.
- intervals – timespan defined with a start and end time
First we’ll create an interval object.
<- interval(mdy("07/01/2021"), mdy("06/30/2022")) fy2022
Now we can keep just those consults within our desired time frame.
<-
consults_fy2022 %>%
consults filter(date %within% fy2022)
And we can summarize these consults by month.
<- consults_fy2022 %>%
consults_fy22_monthly count(year=year(date), month=month(date), name = "total_consults")
consults_fy22_monthly
# A tibble: 12 × 3
year month total_consults
<dbl> <dbl> <int>
1 2021 7 31
2 2021 8 13
3 2021 9 39
4 2021 10 65
5 2021 11 16
6 2021 12 8
7 2022 1 16
8 2022 2 62
9 2022 3 42
10 2022 4 18
11 2022 5 18
12 2022 6 32
Wrapping Up
Once again, let’s save these our objects of summarized data to our data_output
folder.
saveRDS(consults_yearly, "data_output/consults_yearly.RDS")
saveRDS(consults_fy22_monthly, "data_output/consults_fy22_monthly.RDS")
Be sure to check out the lubridate
documentation documentation for more!