Dates and times

library(learnr)
library(tutorial.helpers)
library(tidyverse)
library(nycflights13)
library(gt)

knitr::opts_chunk$set(echo = FALSE)
options(tutorial.exercise.timelimit = 60, 
        tutorial.storage = "local") 

make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

flights_dt <- flights |> 
  filter(!is.na(dep_time), !is.na(arr_time)) |> 
  mutate(
    dep_time = make_datetime_100(year, month, day, dep_time),
    arr_time = make_datetime_100(year, month, day, arr_time),
    sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
    sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
  ) |> 
  select(origin, dest, ends_with("delay"), ends_with("time"))

date2015 <- "
  date
  01/02/15
"

x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
x2 <- ymd_hms("2024-06-01 18:00:00", tz = "Europe/Copenhagen")
x3 <- ymd_hms("2024-06-02 04:00:00", tz = "Pacific/Auckland")
x4 <- c(x1, x2, x3)


y2023 <- ymd("2023-01-01") %--% ymd("2024-01-01")
y2024 <- ymd("2024-01-01") %--% ymd("2025-01-01")
flights_dt2 <- flights_dt |> 
  mutate(
    overnight = arr_time < dep_time,
    arr_time = arr_time + days(overnight),
    sched_arr_time = sched_arr_time + days(overnight)
  )


datetime <- ymd_hms("2026-07-08 12:34:56")
h_age <- today() - ymd("1979-10-14")
one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")
flights_dt <- flights_dt |> 
  mutate(
    overnight = arr_time < dep_time,
    arr_time = arr_time + days(overnight),
    sched_arr_time = sched_arr_time + days(overnight)
  )


Introduction

This tutorial covers Chapter 17: Dates and times from R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. You will learn how to use the lubridate package on the flights data from the nycflights13 package.

Creating date/times through import

This chapter focuses on dates and date-times, because R doesn’t have a native class for storing times on their own. If you need one, you can use the hms package; the name stands for hours, minutes, seconds.

Exercise 1

Let's load the tidyverse library.


library(...)
library(tidyverse)

Exercise 2

Run today() in the code chunk below.


...()
today()

To get the current date you can use today(). If you want the current date-time, you can use now().

Exercise 3

Run now() in the code chunk below. With this we can get the current date-time.


...()
now()

A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>. Base R calls these POSIXct, but that name doesn’t exactly trip off the tongue.

Exercise 4

Enter date2015 and hit "Run Code".


date2015
date2015

date2015 is a character vector with a single element which includes both the variable name, date and, after a newline character, a value: 01/02/15.

Exercise 5

Run read_csv() on date2015.


read_csv(...)
read_csv(date2015)

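read_csv() turns the string into a tibble with a single column, date. To make sure that column is parsed as a date or date-time rather than left as text, we can use the col_types argument together with col_date() or col_datetime(). Their format strings describe the year, month, day, time, and other components of the value being read.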

If your CSV contains an ISO8601 date or date-time, you don’t need to do anything; readr will automatically recognize it.

If you haven’t heard of ISO8601 before, it’s an international standard for writing dates where the components of a date are organized from biggest to smallest separated by -. To learn more about ISO8601, visit here

Since date2015 does not contain an ISO8601 date, we must specify the date format ourselves.

For other date-time formats, you’ll need to use col_types plus col_date() or col_datetime() along with a date-time format. The date-time format used by readr is a standard used across many programming languages, describing a date component with a % followed by a single character. For example, %Y-%m-%d specifies a date that’s a year, -, month (as number) -, day. The table below lists all the options.

data <- data.frame(
  Type = c("Year", "", "Month", "", "", "Day", "", "Time", rep("", 7), "Other", ""),
  Code = c("%Y", "%y", "%m", "%b", "%B", "%d", "%e", "%H", "%I", "%p", "%M", "%S", "%OS", "%Z", "%z", "%.", "%*"),
  Meaning = c("4 digit year", "2 digit year", "Number", "Abbreviated name", "Full name",
              "One or two digits", "Two digits", "24-hour hour", "12-hour hour", "AM/PM",
              "Minutes", "Seconds", "Seconds with decimal component", "Time zone name",
              "Offset from UTC", "Skip one non-digit", "Skip any number of non-digits"),
  Example = c("2021", "21", "2", "Feb", "February", "2", "02", "13", "1", "pm", "35", "45",
              "45.35", "America/Chicago", "+0800", ":", "")
)

gt(data) |>
  tab_header(title = "") |>
  cols_align(align = "left") |>
  tab_options(column_labels.font.weight = "bold") |>
  tab_style(
    style = cell_borders(
      sides = "bottom",
      color = "light gray",
      weight = px(1)
    ),
    locations = cells_body(
      rows = c(2, 5, 7, 15, 17)
    )
  )

Exercise 6

We will use col_types to format the date in the date2015. Set col_types to cols(date = col_date("%m/%d/%y")) in read_csv().


read_csv(date2015, ... = cols(date = col_date("%m/%d/%y")))
read_csv(date2015, col_types = cols(date = col_date("%m/%d/%y")))

The result is a tibble whose date column prints in year-month-day form: 2015-01-02. readr has read "01" as the month, "02" as the day, and "15" as the year.

Exercise 7

Let's try a different format. Set col_types to cols(date = col_date("%d/%m/%y")) in read_csv().


read_csv(date2015, col_types = ...)
read_csv(date2015, col_types = cols(date = col_date("%d/%m/%y")))

Now the date differs from the previous output: 2015-02-01. It is still displayed in the same format, but the day and month have been switched.

Exercise 8

Set col_types to cols(date = col_date("%y/%m/%d")) in read_csv().


read_csv(date2015, col_types = ...)
read_csv(date2015, col_types = cols(date = col_date("%y/%m/%d")))

Note that no matter how you specify the date format, it’s always displayed the same way once you get it into R. However, the date is once again different: 2001-02-15. This time readr reads "01" as the year, "02" as the month, and "15" as the day.
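The effect of the three format strings can be checked on a single string with readr's parse_date(), which uses the same format specification as col_date(). This is a minimal sketch, assuming readr is installed; the variable name raw is only illustrative.

```r
library(readr)

# The same ambiguous string, parsed with three different format strings.
raw <- "01/02/15"

parse_date(raw, format = "%m/%d/%y")  # month/day/year
parse_date(raw, format = "%d/%m/%y")  # day/month/year
parse_date(raw, format = "%y/%m/%d")  # year/month/day
```

Each call returns a Date that prints in year-month-day form, whichever way the input was interpreted.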

Oftentimes when working with non-English dates and %b or %B you will need to use locale(). Check out date_names_langs() for more information.

Creating date/times with strings

The date-time specification language is powerful, but requires careful analysis of the date format. An alternative approach is to use lubridate’s helpers, which attempt to automatically determine the format once you specify the order of the components.

Exercise 1

Run mdy() on "January 31st, 2017".


...("January 31st, 2017")
mdy("January 31st, 2017")

Exercise 2

Run dmy() on "31-Jan-2017"



dmy("31-Jan-2017")

Exercise 3

Run ymd() on "2017-01-31".


ymd(...)
ymd("2017-01-31")

To use these helpers, identify the order in which year, month, and day appear in your dates, then arrange “y”, “m”, and “d” in the same order. That gives you the name of the lubridate function that will parse your date.

Exercise 4

We can also add "hms" to the function name to create a date-time. Run ymd_hms() on "2017-01-31 20:11:59".


ymd_hms(...)
ymd_hms("2017-01-31 20:11:59")

ymd() and friends create dates. To create a date-time, add an underscore and one or more of “h”, “m”, and “s” to the name of the parsing function.

Exercise 5

You can also force the creation of a date-time from a date by supplying a timezone. Copy and paste your code from Exercise 3 and set tz to "UTC".


ymd("2017-01-31", tz = ...)

With this we create a date-time whose time zone is set to UTC. UTC, which for practical purposes matches GMT (Greenwich Mean Time), doesn't use daylight saving time, making it a bit easier to compute with.
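The two ways of getting a date-time from a string can be compared side by side. The sketch below assumes lubridate is installed; the object names with_time, date_only, and date_time are made up for illustration.

```r
library(lubridate)

# Adding "_hm" to the helper name parses a time as well as a date.
with_time <- mdy_hm("01/31/2017 08:01")
with_time

# Without tz the result is a Date; with tz it is a date-time (POSIXct).
date_only <- ymd("2017-01-31")
date_time <- ymd("2017-01-31", tz = "UTC")

class(date_only)
class(date_time)
```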

Creating dates and times with individual components

In this section we build date-times from separate components, using the flights data.

Instead of a single string, sometimes you’ll have the individual components of the date-time spread across multiple columns. For example, look at what we have in the flights data:

flights |> 
  select(year, month, day, hour, minute)

We have all the components needed to make a date-time, but we still need to assemble them ourselves.

Exercise 1

Continue this pipe with mutate(). Create a new variable, departure, set it equal to make_datetime() and set the arguments inside to year, month, day, hour, minute.

flights |> 
  select(year, month, day, hour, minute)
... |>
  mutate(departure = ...)
... |>
  mutate(departure = make_datetime(..., ..., ..., ..., ...))
flights |> 
  select(year, month, day, hour, minute) |> 
  mutate(departure = make_datetime(year, month, day, hour, minute))

To create a date/time from this sort of input, we use make_date() for dates, or make_datetime() for date-times.

Exercise 2

Now we will create datetimes for arrival time, departure time, scheduled arrival time and scheduled departure time.

Good news, we did this for you! Run the code.

make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

flights_dt <- flights |> 
  filter(!is.na(dep_time), !is.na(arr_time)) |> 
  mutate(
    dep_time = make_datetime_100(year, month, day, dep_time),
    arr_time = make_datetime_100(year, month, day, arr_time),
    sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
    sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
  ) |> 
  select(origin, dest, ends_with("delay"), ends_with("time"))

make_datetime_100 is a function that takes in the year, month, day and time variables to create a datetime. It is not always necessary to make a function like this, but, it does make life easier.
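The arithmetic inside make_datetime_100() can be traced for a single value. In the flights data, clock times are stored as integers such as 517 for 5:17, so integer division and the remainder by 100 split them into hours and minutes. This is a small sketch assuming lubridate is installed; hour_part and minute_part are illustrative names.

```r
library(lubridate)

# A clock time stored as an integer: 517 means 5:17.
time <- 517

hour_part <- time %/% 100   # integer division keeps the hours
minute_part <- time %% 100  # the remainder keeps the minutes

make_datetime(2013, 1, 1, hour_part, minute_part)
```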

We are going to focus on four columns, dep_time, sched_dep_time, arr_time, and sched_arr_time, because they are all date-times (<dttm>).

Exercise 3

Pipe flights_dt to ggplot. Inside the aes argument, set x to dep_time.


flights_dt |>
  ggplot(aes(... = ...)) + ...
flights_dt |>
  ggplot(aes(... = ...)) 
flights_dt |> 
  ggplot(aes(x = dep_time))

The plot should be blank since we have not added any layers.

Exercise 4

Add geom_freqpoly and set the binwidth to 86400.


... +
  geom_freqpoly(binwidth = ...)
flights_dt |> 
  ggplot(aes(x = dep_time)) + 
  geom_freqpoly(binwidth = 86400)

With this data, we can visualize the distribution of departure times across the year. Note that we use 86400 for the binwidth because there are 86400 seconds in a day.

Exercise 5

Pipe flights_dt to filter. Set dep_time to be less than ymd(20130102).


flights_dt |>
  filter(dep_time < ymd(...))
flights_dt |> 
  filter(dep_time < ymd(20130102))

We filter dep_time in this way so that we only get data for the first of January. If we wanted a date in the middle of the year, we would need two conditions inside filter(), one with >= for the start of that day and one with < for the start of the next day, separated by a comma or combined with &.
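Selecting a single day in the middle of the year might look like the sketch below. It uses a tiny made-up tibble, departures, in place of flights_dt so that it runs on its own, and it assumes dplyr and lubridate are installed. The bounds are built with tz = "UTC" so that date-times are compared with date-times.

```r
library(dplyr)
library(lubridate)

# A small stand-in for flights_dt with only a dep_time column.
departures <- tibble(
  dep_time = ymd_hms(c(
    "2013-07-03 23:55:00",
    "2013-07-04 00:10:00",
    "2013-07-04 18:30:00",
    "2013-07-05 06:00:00"
  ))
)

# Keep departures on July 4: at or after midnight, and before the next midnight.
july_4 <- departures |>
  filter(
    dep_time >= ymd("2013-07-04", tz = "UTC"),
    dep_time < ymd("2013-07-05", tz = "UTC")
  )

july_4
```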

Exercise 6

Continue the pipe to ggplot, map dep_time to the x-axis.


... |> 
  ggplot(aes(x = ...))
flights_dt |> 
  filter(dep_time < ymd(20130102)) |> 
  ggplot(aes(x = dep_time))

Exercise 7

Add geom_freqpoly, set the binwidth argument to 600


... +
  geom_freqpoly(binwidth = ...)
flights_dt |> 
  filter(dep_time < ymd(20130102)) |> 
  ggplot(aes(x = dep_time)) + 
  geom_freqpoly(binwidth = 600)

Now we can visualize the distribution of departure times within a single day.

Note that when you use date-times in a numeric context (like in a histogram), 1 means 1 second, so a binwidth of 86400 means one day. For dates, 1 means 1 day.
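The difference in units shows up as soon as a plain number is added to each type. A minimal sketch, assuming lubridate is installed; dt and d are illustrative names.

```r
library(lubridate)

dt <- ymd_hms("2013-01-01 00:00:00")  # a date-time
d <- ymd("2013-01-01")                # a date

dt + 86400  # 86400 seconds later: the start of the next day
d + 1       # 1 day later
```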

Creating date/times from other types

You may want to switch between a date-time and a date. That’s the job of as_datetime() and as_date()

Exercise 1

Run as_datetime() on today()


as_datetime(...)
as_datetime(today())

If we were to just run today(), we would get the date. But, if we use as_datetime(), we get a datetime.

Exercise 2

Run as_date() on now()


as_date(...)
as_date(now())

When we normally run now(), we get the time and date, but once we run as_date(), we just get the date.

Exercise 3

Run as_datetime(60 * 60 * 10)


as_datetime(...)
as_datetime(60 * 60 * 10)

Exercise 4

Run as_date(365 * 10 + 2)


as_date(...)
as_date(365 * 10 + 2)

Sometimes you’ll get date/times as numeric offsets from the “Unix Epoch”, 1970-01-01. If the offset is in seconds, use as_datetime(); if it’s in days, use as_date().

In other words, we can use a total number of seconds or days since 1970-01-01 to create a date-time or a date.
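The relationship runs in both directions: an offset of zero gives the epoch itself, and converting a date or date-time back to a number recovers the offset. A short sketch, assuming lubridate is installed:

```r
library(lubridate)

as_datetime(0)  # the epoch as a date-time
as_date(0)      # the epoch as a date

# Converting back to numbers recovers the offsets used in the exercises.
as.numeric(ymd_hms("1970-01-01 10:00:00"))  # seconds since the epoch
as.numeric(ymd("1980-01-01"))               # days since the epoch
```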

Date-time components with getting components

Now that you know how to get date-time data into R’s date-time data structures, let’s explore what you can do with them. The next few sections focus on the accessor functions that let you get and set individual components. After that, we will look at how arithmetic works with date-times.

Exercise 1

Create a new date-time using ymd_hms() with the argument "2026-07-08 12:34:56". Assign it to a new object named datetime.


datetime <- ymd_hms("...")
datetime <- ymd_hms("2026-07-08 12:34:56")

We will apply a variety of accessor functions to this object.

Exercise 2

Run year() on datetime


year(...)
year(datetime)

data <- data.frame(
  Type = c("Year", ""),
  Code = c("%Y", "%y"),
  Meaning = c("4 digit year", "2 digit year"),
  Example = c("2021", "21")
)

gt(data) |>
  tab_header(title = "") |>
  cols_align(align = "center") |>
  tab_options(column_labels.font.weight = "bold")

year() returns the %Y component. Looking at the table above, we can see that this is the 4 digit year. Even if we were to replace "2026" with "26", year(datetime) would still return 2026. Don't believe us? Try it for yourself in the console!

Exercise 3

Run month() on datetime


month(...)
month(datetime)

data <- data.frame(
  Type = c("Month", "", ""),
  Code = c("%m", "%b", "%B"),
  Meaning = c("Number", "Abbreviated name", "Full Name"),
  Example = c("2", "Feb", "February")
)

gt(data) |>
  tab_header(title = "") |>
  cols_align(align = "center") |>
  tab_options(column_labels.font.weight = "bold")

month() returns the %m component. Looking at the table, we can see that this is the number of the month. If we were to replace "07" with the %b form (Jul) or the %B form (July), we would still get 7 as our output.

Exercise 4

Run mday(), yday() and wday() on datetime


mday(...)
yday(...)
wday(...)
mday(datetime)
yday(datetime)
wday(datetime)

data <- data.frame(
  Type = c("Day", ""),
  Code = c("%d", "%e"),
  Meaning = c("One or two digits", "Two digits"),
  Example = c("2", "02")
)

gt(data) |>
  tab_header(title = "") |>
  cols_align(align = "center") |>
  tab_options(column_labels.font.weight = "bold")

mday() returns the day of the month (the %d component), yday() returns the day of the year, and wday() returns the day of the week. All three return plain numbers, so there is no leading zero.

Exercise 5

Lastly, run hour(), minute(), second() and tz() on datetime.


hour(...)
minute(...)
second(...)
tz(...)
hour(datetime)
minute(datetime)
second(datetime)
tz(datetime)

data <- data.frame(
  Type = c("Time", rep("", 7)),
  Code = c("%H", "%I", "%p", "%M", "%S", "%OS", "%Z", "%z"),
  Meaning = c("24-hour hour", "12-hour hour", "AM/PM", "Minutes", "Seconds", 
              "Seconds with decimal component", "Time zone name", "Offset from UTC"),
  Example = c("13", "1", "pm", "35", "45", "45.35", "America/Chicago", "+0800")
)

gt(data) |>
  tab_header(title = "") |>
  cols_align(align = "center") |>
  tab_options(column_labels.font.weight = "bold")

hour() returns the %H, or the 24-hour hour. minute() returns the standard minute, or %M. However, second() returns the %OS, not the %S. If you add a decimal to the end of the timestamp and run second(), the decimal is returned as well. tz() returns the time zone. By default, it will return "UTC". To use a different time zone, supply the tz argument when you create the date-time with ymd_hms().

You can pull out individual parts of the date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), second() and tz(). These are effectively the opposites of make_datetime().
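One way to see that the accessors undo make_datetime() is to take a date-time apart and put it back together. This is a minimal sketch, assuming lubridate is installed; rebuilt is an illustrative name.

```r
library(lubridate)

datetime <- ymd_hms("2026-07-08 12:34:56")

# Pull the components out, then feed them straight back in.
rebuilt <- make_datetime(
  year(datetime), month(datetime), mday(datetime),
  hour(datetime), minute(datetime), second(datetime)
)

rebuilt == datetime
```

Both objects are in UTC, the default for ymd_hms() and make_datetime(), so the comparison is between the same instant.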

Exercise 6

Now, run month() with the argument datetime, set label = TRUE


month(..., label = ...)
month(datetime, label = TRUE)

Exercise 7

Run wday() with the argument datetime, set label = TRUE and abbr to FALSE


wday(..., label = ..., abbr = ...)
wday(datetime, label = TRUE, abbr = FALSE)

For month() and wday() you can set label = TRUE to return the abbreviated name of the month (%b) or day of the week. You can set abbr = FALSE to return the full name (%B).

Exercise 8

Start a pipe with flights_dt to mutate(). Make a new variable wday, set it equal to wday() with the arguments dep_time and label = TRUE.


flights_dt |>
  mutate(wday = wday(..., ...))
flights_dt |> 
  mutate(wday = wday(dep_time, label = TRUE))

By setting label = TRUE we get the abbreviated name of the day of the week. If we wanted the full name, we would add the argument abbr = FALSE.

Exercise 9

Continue the pipe to ggplot(). Map wday to the x-axis.


... |>
  ggplot(aes(x = ...))
flights_dt |> 
  mutate(wday = wday(dep_time, label = TRUE)) |> 
  ggplot(aes(x = wday))

Exercise 10

Finish off your pipe with geom_bar()


...+
  geom_bar()
flights_dt |> 
  mutate(wday = wday(dep_time, label = TRUE)) |> 
  ggplot(aes(x = wday)) +
  geom_bar()

We can use wday() to see that more flights depart during the week than on the weekend.

Exercise 11

Start a new pipe with flights_dt to mutate(), create a new variable minute, and set it equal to minute(dep_time)


flights_dt |> 
  mutate(... = minute(...))
flights_dt |> 
  mutate(minute = minute(dep_time))

We create a new variable so that it is easier to group, which we will do next.

Exercise 12

Continue the pipe to group_by(), with the argument minute. Next, pipe the result to summarize(), setting avg_delay equal to mean(dep_delay, na.rm = TRUE) and n equal to n().


... |> 
  group_by(...) |> 
  summarize(
    avg_delay = ...,
    n = ...
  )
flights_dt |> 
  mutate(minute = minute(dep_time)) |> 
  group_by(minute) |> 
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE),
    n = n()
  )

We get the average delay by using mean(). Grouping by minute first means avg_delay is computed separately for each minute of the hour. This gives the same result as using the .by argument in summarize().
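The equivalence between group_by() and .by can be checked on a tiny made-up data set. The sketch below assumes dplyr 1.1.0 or later (the version that introduced .by); toy, grouped, and by_arg are illustrative names.

```r
library(dplyr)

# A tiny made-up data set standing in for flights_dt.
toy <- tibble(
  minute = c(0, 0, 30, 30),
  dep_delay = c(10, 20, 0, NA)
)

# Grouping first, then summarizing.
grouped <- toy |>
  group_by(minute) |>
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE), n = n())

# Summarizing with a per-operation grouping.
by_arg <- toy |>
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE), n = n(), .by = minute)

grouped
by_arg
```

One difference worth knowing: group_by() sorts the groups, while .by keeps them in order of first appearance. The two agree here because the toy data is already sorted by minute.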

Exercise 13

Continue the pipe to ggplot(). Map minute to the x-axis and avg_delay to the y-axis. Add the layer geom_line()


... |>
  ggplot(aes(..., ...)) +
  geom_line()
flights_dt |> 
  mutate(minute = minute(dep_time)) |> 
  group_by(minute) |> 
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE),
    n = n()
  ) |> 
  ggplot(aes(x = minute, y = avg_delay)) +
  geom_line()

We can also look at the average departure delay by minute within the hour. There’s an interesting pattern: flights leaving in minutes 20-30 and 50-60 have much lower delays than the rest of the hour!

Exercise 14

Copy your code from the last plot and change the argument inside of minute to be sched_dep_time instead of dep_time


flights_dt |> 
  mutate(minute = minute(...)) |> 
  ...
flights_dt |> 
  mutate(minute = minute(sched_dep_time)) |> 
  group_by(minute) |> 
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE),
    n = n()
  ) |> 
  ggplot(aes(x = minute, y = avg_delay)) +
  geom_line()

Interestingly, if we look at the scheduled departure time we don’t see such a strong pattern.

Date and time components with rounding

An alternative approach to plotting individual components is to round the date to a nearby unit of time, with floor_date(), round_date(), and ceiling_date(). Each function takes a vector of dates to adjust and then the name of the unit to round down (floor), round up (ceiling), or round to.

Exercise 1

Pipe flights_dt to count(). Set week to floor_date(dep_time, "week").


... |> 
  count(... = floor_date(..., "..."))
flights_dt |> 
  count(week = floor_date(dep_time, "week"))

count() lets you quickly count the unique values of one or more variables

The first argument of floor_date() is the vector of dates or date-times to round, here dep_time. The second argument is the unit of time to round down to, here "week".
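The three rounding functions can be compared on a single date-time. This is a minimal sketch, assuming lubridate is installed; x is an arbitrary example value.

```r
library(lubridate)

x <- ymd_hms("2013-06-15 14:37:45")

floor_date(x, "hour")    # rounds down to 14:00:00
round_date(x, "hour")    # rounds to the nearest hour, 15:00:00
ceiling_date(x, "hour")  # rounds up to 15:00:00
floor_date(x, "day")     # rounds down to midnight on 2013-06-15
```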

Exercise 2

Continue the pipe with ggplot(). Map week to the x-axis and n to the y-axis. Add geom_line() and geom_point()


... |> 
  ggplot(aes(x = ..., y = ...)) +
    ...() +
    ...()
flights_dt |> 
  count(week = floor_date(dep_time, "week")) |> 
  ggplot(aes(x = week, y = n)) +
  geom_line() + 
  geom_point()

Every flight that happens in a certain week is added up and plotted as one point using floor_date(). Now, we can see the distribution of flights each week in the year.

Exercise 3

Start a new pipe with flights_dt to mutate(). Set dep_hour to hms::as_hms(dep_time - floor_date(dep_time, "day")).


flights_dt |> 
  mutate(...)
flights_dt |> 
  mutate(dep_hour = hms::as_hms(dep_time - floor_date(dep_time, "day")))

Computing the difference between a pair of date-times yields a difftime. A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so we wrap the result in hms::as_hms() to get a time of day instead.
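For a single departure the two representations look like this. The sketch assumes the lubridate and hms packages are installed; dep and since_midnight are illustrative names.

```r
library(lubridate)

dep <- ymd_hms("2013-01-01 05:17:00")

# Time elapsed since the start of the day.
since_midnight <- dep - floor_date(dep, "day")

since_midnight               # a difftime; R picks the unit for printing
hms::as_hms(since_midnight)  # the same span shown as a time of day
```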

Exercise 4

Continue the pipe to ggplot(). Map dep_hour to the x-axis. Add the geom_freqpoly() layer and set the binwidth to 1800


... |> 
  ggplot(aes(x = dep_hour)) +
  geom_freqpoly(binwidth = 60 * 30)
flights_dt |> 
  mutate(dep_hour = hms::as_hms(dep_time - floor_date(dep_time, "day"))) |> 
  ggplot(aes(x = dep_hour)) +
  geom_freqpoly(binwidth = 60 * 30)

Here we use rounding to show the distribution of flights across the course of a day by computing the difference between dep_time and the earliest instant of that day, which we find by using floor_date(). ceiling_date() would instead round up to the start of the following day.

Modifying date/time components

You can also use each accessor function to modify the components of a date/time. This doesn’t come up much in data analysis, but can be useful when cleaning data that has clearly incorrect dates.

Exercise 1

Let's modify this date-time: "2026-07-08 12:34:56". First, parse it with ymd_hms() and assign the result to an object named datetime.


datetime <- ymd_hms("...")
datetime <- ymd_hms("2026-07-08 12:34:56")

Exercise 2

Let's modify the date by using year() on datetime and setting it to 2030. Type datetime after to see whether it has been changed.


year(...) <- 2030
datetime
year(datetime) <- 2030
datetime

We see that everything else about the date is the same except the year, which is now 2030 instead of 2026.

Exercise 3

Now, we're going to modify the date by using month(). Use month() on datetime and set it to 1. Type datetime after to see whether it has been changed.


month(...) <- 01
...
month(datetime) <- 01
datetime

Once again, everything is the same about the date except for the month.

Exercise 4

Run hour(datetime) <- hour(datetime) + 1. Print datetime to see the changes.


hour(...) <- hour(...) + 1
...
hour(datetime) <- hour(datetime) + 1
datetime

We can see that the hour has gone up by one. We first retrieve the hour of datetime, add one to it, and then assign that value back to the hour of datetime.

ymd_hms() and its relatives automatically assign Coordinated Universal Time (UTC) to the parsed date-time unless you supply a tz argument.

Exercise 5

Run update() with the arguments datetime, year = 2030, month = 2, mday = 2, hour = 2


update(..., year = ..., month = ..., mday = ..., hour = ...)
update(datetime, year = 2030, month = 2, mday = 2, hour = 2)

Alternatively, rather than modifying an existing variable, you can create a new date-time with update(). This also allows you to set multiple values in one step as shown above. Check out the help page for update() for more information.

Exercise 6

Run update() with the arguments ymd("2023-02-01") and mday = 30


update(ymd("..."), mday = ...)
update(ymd("2023-02-01"), mday = 30)

If values are too big they will roll over. February 2023 has only 28 days, so asking for day 30 gives 2023-03-02.
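Rolling over applies to the other components too. The sketch below, assuming lubridate is installed, repeats the day example and adds an oversized hour.

```r
library(lubridate)

update(ymd("2023-02-01"), mday = 30)   # February 30 rolls over into March
update(ymd("2023-02-01"), hour = 400)  # 400 hours is 16 days and 16 hours
```

Setting an hour on a date turns the result into a date-time, which is why the second call prints a time as well.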

Time Spans with Duration

Next you’ll learn about how arithmetic with dates works, including subtraction, addition, and division. Along the way, you’ll learn about three important classes that represent time spans:

- Durations, which represent an exact number of seconds.
- Periods, which represent human units like weeks and months.
- Intervals, which represent a starting and ending point.

How do you pick between duration, periods, and intervals? As always, pick the simplest data structure that solves your problem. If you only care about physical time, use a duration; if you need to add human times, use a period; if you need to figure out how long a span is in human units, use an interval.

Exercise 1

Subtract ymd("1979-10-14") from today(). Assign the result to h_age, then print h_age.


h_age <- ...() - ymd("...")
...
h_age <- today() - ymd("1979-10-14")
h_age

You should get a time difference measured in days. The exact number depends on the day you run the code.

In R, when you subtract two dates, you get a difftime object.

A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the duration.

Exercise 2

Run as.duration on h_age


as.duration(...)
as.duration(h_age)

The resulting output should be in seconds. as.duration() gets rid of the ambiguity of different types of time spans and always uses seconds.

Exercise 3

Durations come with a bunch of convenient constructors:

Run the code below.

dseconds(15)
dminutes(10)
dhours(c(12, 24))
ddays(0:5)
dweeks(3)
dyears(1)
dseconds(15)
dminutes(10)
dhours(c(12, 24))
ddays(0:5)
dweeks(3)
dyears(1)

Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week. Larger time units are more problematic. A year uses the “average” number of days in a year, i.e. 365.25. There’s no way to convert a month to a duration, because there’s just too much variation.
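The conversion factors described above can be confirmed by turning each duration into a plain number of seconds. A minimal sketch, assuming lubridate is installed:

```r
library(lubridate)

as.numeric(dminutes(1))  # 60
as.numeric(dhours(1))    # 60 * 60
as.numeric(ddays(1))     # 60 * 60 * 24
as.numeric(dweeks(1))    # 60 * 60 * 24 * 7
as.numeric(dyears(1))    # 60 * 60 * 24 * 365.25
```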

You can also use c() and : to pass a vector of values to a constructor, which creates several durations at once.

Exercise 4

Multiply 2 with dyears(1). Add dweeks(12)


2 * dyears(...) + dweeks(...)
2 * dyears(1) + dweeks(12)

You can add and multiply durations as well. This is as simple as it looks: everything is converted to seconds, the addition or multiplication happens, and the final value is printed in seconds with an approximate equivalent in larger units (here, years).

Exercise 5

Add today() to ddays(1). On the next line subtract dyears(1) from today()


...() + ddays(...)
...() - dyears(...)
today() + ddays(1)
today() - dyears(1)

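You can add and subtract durations to and from dates. Notice that you get tomorrow's date and a point in time roughly one year ago. Because dyears(1) is 365.25 days, the second result is a date-time rather than exactly today's date last year.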

Exercise 6

Create a new object named one_am. Set one_am to ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")


one_am <- ...
one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

However, because durations represent an exact number of seconds, sometimes you might get an unexpected result

Exercise 7

Print one_am, then on the next line add ddays(1) to one_am.


one_am
one_am + ...
one_am
one_am + ddays(1)

Why is one day after 1am March 8, 2am March 9? If you look carefully at the date you might also notice that the time zones have changed. March 8 only has 23 hours because it’s when DST starts, so if we add a full day’s worth of seconds we end up with a different time.

Durations can be added, subtracted, multiplied, and divided, and they can be combined with dates and date-times. The constructors are dseconds() (seconds), dminutes() (minutes), dhours() (hours), ddays() (days), dweeks() (weeks), and dyears() (years). Check out the lubridate help page for Duration for the other related functions.

Time Spans with Periods

To solve the problem from Exercise 7 of the last section, lubridate provides periods. Periods are time spans but don’t have a fixed length in seconds. Instead they work with “human” times, like days and months. That allows them to work in a more intuitive way.

Exercise 1

Run the code below

hours(c(12, 24))
days(7)
months(1:6)
hours(c(12, 24))
days(7)
months(1:6)

Like durations, periods can be created with a number of friendly constructor functions.

Instead of returning an output in seconds, a period is expressed in hours, days, or months, depending on the constructor used.

Exercise 2

Add months(6) and days(1). Multiply the resulting value by 10.


10 * (months(...) + days(...))
10 * (months(6) + days(1))

You can add and multiply periods similar to how we do with durations.

Exercise 3

Add dyears(1) to ymd("2024-01-01"). Do the same thing on the next line but change dyears() to just years()


ymd("...") + dyears(...)
ymd("...") + years(...)
ymd("2024-01-01") + dyears(1)
ymd("2024-01-01") + years(1)

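We can of course add periods to dates. 2024 is a leap year with 366 days, so adding dyears(1), a fixed 365.25 days, lands on 2024-12-31 at 06:00 UTC, while adding years(1) gives 2025-01-01.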

Exercise 4

Add ddays(1) and days(1) to one_am on separate lines.


one_am + ddays(...)
one_am + days(...)
one_am + ddays(1)
one_am + days(1)

We can safely say that compared to durations, periods are more likely to do what you expect.
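The difference comes down to how much time actually elapses. The sketch below assumes lubridate is installed and that the system's time zone database includes America/New_York; exact and clock are illustrative names.

```r
library(lubridate)

one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

exact <- one_am + ddays(1)  # 86400 seconds later
clock <- one_am + days(1)   # the same clock time on the next day

hour(exact)
hour(clock)

# Elapsed time for the period version: one hour short of a full day.
difftime(clock, one_am, units = "hours")
```

The period keeps the clock time at 1am by letting the elapsed time shrink to 23 hours, while the duration keeps the elapsed time fixed and lets the clock time shift.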

Exercise 5

Let’s use periods to fix an oddity related to our flight dates. Some planes appear to have arrived at their destination before they departed from New York City.

flights_dt |> 
  filter(arr_time < dep_time) 

Pipe flights_dt to mutate(). Create a new variable, overnight, and set it equal to arr_time < dep_time.

flights_dt2 <- flights_dt
... |>
  mutate(... = ...)
flights_dt2 <- flights_dt |> 
  mutate(
    overnight = arr_time < dep_time)

These are overnight flights. We used the same date information for both the departure and the arrival times, but these flights arrived on the following day. We can fix this by adding days(1) to the arrival time of each overnight flight.

Exercise 6

Modify the arr_time and sched_arr_time variables by adding days(overnight) inside of mutate()


... |>
  mutate(...,
         arr_time = arr_time + ...,
         sched_arr_time = sched_arr_time + ...)
flights_dt2 <- flights_dt |> 
  mutate(
    overnight = arr_time < dep_time,
    arr_time = arr_time + days(overnight),
    sched_arr_time = sched_arr_time + days(overnight)
  )

Exercise 7

Pipe flights_dt2 to filter(), with the argument arr_time < dep_time


flights_dt2 |> 
  filter(... < ...)
flights_dt2 |> 
  filter(arr_time < dep_time)

We see that there are no flights that departed after they arrived, since that is not possible! All of our flights finally obey the laws of physics.

Time Spans with Intervals

Exercise 1

Run dyears(1) / ddays(365)


dyears(...) / ddays(...)
dyears(1) / ddays(365)

You may be wondering why the answer is not 1. dyears(1) is defined as 365.25 days, and 365.25 divided by 365 is a little over one.

Exercise 2

Run years(1) / days(1)


years(...) / days(...)
years(1) / days(1)

You might wonder what this should return. If the year was 2015 it should return 365, but if it was 2016, it should return 366! There’s not quite enough information for lubridate to give a single clear answer. What it does instead is give an estimate: 365.25.

Exercise 3

Connect ymd("2023-01-01") and ymd("2024-01-01") with a %--%. Set it equal to a new object called y2023. Print y2023


y2023 <- ymd("...") %--% ymd("...")
y2023 <- ymd("2023-01-01") %--% ymd("2024-01-01")
y2023

If you want a more accurate measurement, you’ll have to use an interval. An interval is a pair of starting and ending date times, or you can think of it as a duration with a starting point.

You can create an interval by writing start %--% end.

Exercise 4

Do the same thing, but for the year 2024. Name this new object y2024


y2024 <- ymd("...") %--% ymd("...")
y2024 <- ymd("2024-01-01") %--% ymd("2025-01-01")
y2024

Exercise 5

Divide y2023 and y2024 by days(1)


y2023 / days(...)
y2024 / days(...)
y2023 / days(1)
y2024 / days(1)

Here we see that now we either get 365 or 366. Instead of getting an estimate, we get the exact number of days for that year.

As a rule, choose among durations, periods, and intervals based on the situation, since each gives accurate results for the kind of question it is designed to answer.

Exercise 6

Subtract your birthday, created with ymd(), from today().


today() - ymd("...")
today() - ymd("2000-01-30")
# Replace the date with your own birthday

The output may seem familiar to you, since this is your age in days!
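To express an age in years rather than days, one option is to build an interval and divide it by a period. The sketch below uses a fixed, made-up pair of dates (birthday and as_of are hypothetical names) so that the result does not depend on when it is run.

```r
library(lubridate)

# Hypothetical dates, chosen so the result is easy to check by eye.
birthday <- ymd("2000-01-30")
as_of <- ymd("2025-01-30")

# The raw difference is a difftime measured in days.
as_of - birthday

# An interval divided by a period gives the age in calendar years.
age_years <- (birthday %--% as_of) / years(1)
age_years
```

Swapping as_of for today() gives a current age.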

Time Zones

Time zones are an enormously complicated topic because of their interaction with geopolitical entities. Fortunately we don’t need to dig into all the details as they’re not all important for data analysis, but there are a few challenges we’ll need to tackle head on.

The first challenge is that everyday names of time zones tend to be ambiguous. For example, if you’re American you’re probably familiar with EST, or Eastern Standard Time. However, both Australia and Canada also have EST! To avoid confusion, R uses the international standard IANA time zones. These use a consistent naming scheme {area}/{location}, typically in the form {continent}/{city} or {ocean}/{city}. Examples include “America/New_York”, “Europe/Paris”, and “Pacific/Auckland”.

You might wonder why the time zone uses a city, when typically you think of time zones as associated with a country or region within a country. This is because the IANA database has to record decades’ worth of time zone rules. Over the course of decades, countries change names (or break apart) fairly frequently, but city names tend to stay the same. Another problem is that the name needs to reflect not only the current behavior, but also the complete history. For example, there are time zones for both “America/New_York” and “America/Detroit”. These cities both currently use Eastern Standard Time, but in 1969-1972 Michigan (the state in which Detroit is located) did not follow DST, so it needs a different name. It’s worth reading the raw time zone database (available at https://www.iana.org/time-zones) just to read some of these stories!
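The New York and Detroit history can be checked directly, assuming the time zone database installed on your system includes these historical rules. The sketch below compares the same wall-clock time in both zones for a summer day in 1970 and in 2024; the object names are introduced here for illustration.

```r
library(lubridate)

# The same clock reading in both zones, during the years Michigan skipped DST.
ny_1970 <- ymd_hms("1970-07-01 12:00:00", tz = "America/New_York")
det_1970 <- ymd_hms("1970-07-01 12:00:00", tz = "America/Detroit")

# And again in a recent year, when both zones follow the same rules.
ny_2024 <- ymd_hms("2024-07-01 12:00:00", tz = "America/New_York")
det_2024 <- ymd_hms("2024-07-01 12:00:00", tz = "America/Detroit")

# A non-zero gap means the two zones had different UTC offsets that day.
gap_1970 <- difftime(det_1970, ny_1970, units = "hours")
gap_2024 <- difftime(det_2024, ny_2024, units = "hours")

gap_1970
gap_2024
```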

Exercise 1

Type Sys.timezone() and run in the code chunk below.


...()
Sys.timezone()

We can find out what R thinks our current time zone is with Sys.timezone(). If R doesn’t know, you will get an NA.

Exercise 2

Run OlsonNames()


OlsonNames()
OlsonNames()

OlsonNames() gives us the complete list of all time zone names.
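Because the full list is long, it can help to search it. As a small sketch using only base R, the code below checks that a name is valid and lists the zones in one area; the object name zones is introduced here for illustration.

```r
# Store the full list of time zone names once.
zones <- OlsonNames()

# How many names does this system know about?
length(zones)

# Check that a name is valid before using it as a tz argument.
"America/New_York" %in% zones

# List every zone whose name starts with a given area.
grep("^Australia/", zones, value = TRUE)
```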

Exercise 3

Create a new date-time. Use ymd_hms() with the argument "2024-06-01 12:00:00" and set tz to "America/New_York". Assign the result to a new object named x1. Print x1.


x1 <- ymd_hms("...", tz = "...")
...
x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
x1

Exercise 4

Now do the same thing again, but change the date-time to "2024-06-01 18:00:00" and the tz to "Europe/Copenhagen". Assign the result to a new object called x2. Print x2.


x2 <- ymd_hms("...", tz = "...")
...
x2 <- ymd_hms("2024-06-01 18:00:00", tz = "Europe/Copenhagen")
x2

Exercise 5

Do the same thing again, but change the date-time to "2024-06-02 04:00:00" and the tz to "Pacific/Auckland". Assign the result to a new object called x3. Print x3.


x3 <- ymd_hms("...", tz = "...")
...
x3 <- ymd_hms("2024-06-02 04:00:00", tz = "Pacific/Auckland")
x3

In R, the time zone is an attribute of the date-time that only controls printing. For example, these three objects represent the same instant in time.
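One way to see this, sketched below, is to look at the stored value directly. Each date-time is held as a number of seconds since 1970-01-01 UTC, and lubridate's tz() reads the time zone attribute that sits alongside it.

```r
library(lubridate)

x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
x2 <- ymd_hms("2024-06-01 18:00:00", tz = "Europe/Copenhagen")
x3 <- ymd_hms("2024-06-02 04:00:00", tz = "Pacific/Auckland")

# The time zone attribute differs for each object...
tz(x1)
tz(x2)
tz(x3)

# ...but the underlying number of seconds since 1970-01-01 UTC is the same.
as.numeric(x1)
as.numeric(x2)
as.numeric(x3)
```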

Exercise 6

Subtract x2 and x3 from x1 on separate lines.


x1 - ...
x1 - ...
x1 - x2
x1 - x3

We can verify that they’re the same time using subtraction, since the difference is 0.

Exercise 7

Create a new object named x4. Set it equal to x1, x2, and x3 combined into a vector with c(). Print x4.


x4 <- c(..., ..., ...)
...
x4 <- c(x1, x2, x3)
x4

Unless otherwise specified, lubridate always uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and is roughly equivalent to GMT (Greenwich Mean Time). It does not have DST, which makes it a convenient representation for computation. Operations that combine date-times, like c(), will often drop the time zone. In that case, the date-times will display in the time zone of the first element.
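A brief sketch of both points. Parsing without a tz argument gives UTC, and combining with c() changes only how the values are displayed, not the instants they represent. The name no_tz is introduced here for illustration, and the time zone reported for x4 may depend on your R and lubridate versions.

```r
library(lubridate)

# With no tz argument, lubridate parses the string as UTC.
no_tz <- ymd_hms("2024-06-01 12:00:00")
tz(no_tz)

x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
x2 <- ymd_hms("2024-06-01 18:00:00", tz = "Europe/Copenhagen")
x3 <- ymd_hms("2024-06-02 04:00:00", tz = "Pacific/Auckland")
x4 <- c(x1, x2, x3)

# The combined vector has a single time zone attribute for display.
tz(x4)

# Every element is still the same instant as x1.
x4 == x1
```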

You can change the time zone in two ways:

Exercise 8

Use the with_tz() function with the arguments x4 and tzone = "Australia/Lord_Howe". Set this equal to a new object x4a, print it and subtract x4 from it.


x4a <- with_tz(x4, tzone = "Australia/Lord_Howe")
x4a
x4a - x4
x4a <- with_tz(x4, tzone = "Australia/Lord_Howe")
x4a
x4a - x4

Here we use with_tz() to keep the instant in time the same, as shown by the time difference of 0 seconds, but change how it’s displayed. We use this when the instant is correct, but we want a more natural display.

Exercise 9

Use the force_tz() function with the arguments x4 and tzone = "Australia/Lord_Howe". Set this equal to a new object x4b, print it and subtract x4 from it.


x4b <- force_tz(x4, tzone = "Australia/Lord_Howe")
x4b
x4b - x4
x4b <- force_tz(x4, tzone = "Australia/Lord_Howe")
x4b
x4b - x4

Here, we change the underlying instant in time, as shown by the time difference of 14.5 hours, using force_tz(). We use this when we have an instant that has been labelled with the incorrect time zone, and need to fix it.
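The contrast between the two functions may be easier to see on a single value. The sketch below applies both to one date-time; the object names noon_ny, shown, and forced are introduced here for illustration.

```r
library(lubridate)

noon_ny <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")

# with_tz(): same instant, new clock reading.
shown <- with_tz(noon_ny, tzone = "Europe/Copenhagen")

# force_tz(): same clock reading, new instant.
forced <- force_tz(noon_ny, tzone = "Europe/Copenhagen")

shown
forced

# Differences from the original, in hours.
difftime(shown, noon_ny, units = "hours")
difftime(forced, noon_ny, units = "hours")
```

The first difference is zero because only the display changed. The second is six hours because noon in Copenhagen happens six hours before noon in New York on that date.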

Summary

This tutorial covered Chapter 17: Dates and times from R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. You learned how to use the lubridate package on the flights data from the nycflights13 package.





r4ds.tutorials documentation built on April 3, 2025, 5:50 p.m.