library(clock)
library(magrittr)
The goal of this vignette is to introduce you to clock's high-level API, which works directly on R's built-in date-time types, Date and POSIXct. For an overview of all of the functionality in the high-level API, check out the pkgdown reference section, High Level API. One thing you should immediately notice is that every function specific to R's date and date-time types is prefixed with date_*(). There are also additional functions for arithmetic (add_*()) and getting (get_*()) or setting (set_*()) components that are also used by other types in clock.
As you'll quickly see in this vignette, one of the main goals of clock is to guard you, the user, from unexpected issues caused by frustrating date manipulation concepts like invalid dates and daylight saving time. It does this by letting you know as soon as one of these issues happens, giving you the power to handle it explicitly with one of a number of different resolution strategies.
To create a vector of dates, you can use
date_build(). This allows you to specify the components individually.
date_build(2019, 2, 1:5)
If you happen to specify an invalid date, you'll get an error message:
date_build(2019, 1:12, 31)
One way to resolve this is by specifying an invalid date resolution strategy using the
invalid argument. There are multiple options, but in this case we'll ask for the invalid dates to be set to the previous valid moment in time.
date_build(2019, 1:12, 31, invalid = "previous")
To learn more about invalid dates, check out the documentation for
If we were actually after the "last day of the month", an easier way to specify this would have been:
date_build(2019, 1:12, "last")
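As a small sketch of how the invalid strategies differ, the example below builds the same nonexistent date, February 29th, 2019, under three of the options. The variable names are only illustrative.

```r
library(clock)

# 2019 is not a leap year, so February 29th does not exist
previous <- date_build(2019, 2, 29, invalid = "previous")
following <- date_build(2019, 2, 29, invalid = "next")
as_na <- date_build(2019, 2, 29, invalid = "NA")

previous   # the last valid day before the invalid date
following  # the first valid day after the invalid date
as_na      # a missing value
```

With "previous" the result lands on the last day of February, with "next" it lands on March 1st, and with "NA" the invalid date is replaced by a missing value.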
You can also create date-times using
date_time_build(), which generates a POSIXct. Note that you must supply a time zone!
date_time_build(2019, 1:5, 1, 2, 30, zone = "America/New_York")
If you "build" a time that doesn't exist, you'll get an error. For example, on March 8th, 2020, there was a daylight saving time gap of 1 hour in the America/New_York time zone that took us from
01:59:59 directly to
03:00:00, skipping the 2 o'clock hour entirely. Let's "accidentally" create a time in that gap:
date_time_build(2019:2021, 3, 8, 2, 30, zone = "America/New_York")
To resolve this issue, we can specify a nonexistent time resolution strategy through the
nonexistent argument. There are a number of options, including rolling forward or backward to the next or previous valid moments in time:
zone <- "America/New_York"

date_time_build(2019:2021, 3, 8, 2, 30, zone = zone, nonexistent = "roll-forward")
date_time_build(2019:2021, 3, 8, 2, 30, zone = zone, nonexistent = "roll-backward")
To parse dates, use
date_parse(). Parsing dates requires a format string, a combination of commands that specify where date components are in your string. By default, it assumes that you're working with dates in the form
You can change the format string using the format argument:
date_parse("January 5, 2020", format = "%B %d, %Y")
Various different locales are supported for parsing month and weekday names in different languages. To parse a French month:
date_parse(
  "juillet 10, 2021",
  format = "%B %d, %Y",
  locale = clock_locale("fr")
)
You can learn about more locale options in the documentation for
If you have heterogeneous dates, you can supply multiple format strings:
x <- c("2020/1/5", "10-03-05", "2020/2/2")
formats <- c("%Y/%m/%d", "%y-%m-%d")

date_parse(x, format = formats)
You have three options when parsing date-times:
With strings like "2020-01-01 01:02:03", where there is no time zone offset or full (not abbreviated!) time zone name, use date_time_parse().
With strings like "2020-01-01 01:02:03-05:00[America/New_York]", where there is both a time zone offset and time zone name present in the string, use date_time_parse_complete().
With strings like "2020-01-01 01:02:03 EST", where there is a time zone abbreviation in the string, use date_time_parse_abbrev().
date_time_parse() requires a
zone argument, and will ignore any other zone information in the string (i.e. if you tried to specify
%Z). The default format string is
date_time_parse("2020-01-01 01:02:03", "America/New_York")
If you happen to parse an invalid or ambiguous date-time, you'll get an error. For example, on November 1st, 2020, there were two 1 o'clock hours in the America/New_York time zone due to a daylight saving time fallback. You can see that if we parse a time right before the fallback, and then shift it forward by 1 second, and then 1 hour and 1 second, respectively:
before <- date_time_parse("2020-11-01 00:59:59", "America/New_York")

# First 1 o'clock
before + 1

# Second 1 o'clock
before + 1 + 3600
The following string doesn't include any information about which of these two 1 o'clocks it belongs to, so it is considered ambiguous. Ambiguous times will error when parsing:
date_time_parse("2020-11-01 01:30:00", "America/New_York")
To fix that, you can specify an ambiguous time resolution strategy with the ambiguous argument:
zone <- "America/New_York"

date_time_parse("2020-11-01 01:30:00", zone, ambiguous = "earliest")
date_time_parse("2020-11-01 01:30:00", zone, ambiguous = "latest")
date_time_parse_complete() doesn't have a zone argument, and doesn't require ambiguous or nonexistent arguments, since it assumes that the string you are providing is completely unambiguous. The only way this is possible is by having both a time zone offset, specified by %z, and a full time zone name, specified by %Z, in the string.
The following is an example of an "extended" ISO 8601 format used by Java 8's time library to specify complete date-time strings. This is something that
date_time_parse_complete() can parse. The default format string follows this extended format, and is
x <- "2020-01-01 01:02:03-05:00[America/New_York]"

date_time_parse_complete(x)
date_time_parse_abbrev() is useful when your date-time strings contain a time zone abbreviation rather than a time zone offset or full time zone name.
x <- "2020-01-01 01:02:03 EST"

date_time_parse_abbrev(x, "America/New_York")
The string is first parsed as a naive time without considering the abbreviation, and is then converted to a zoned-time using the supplied
zone. If an ambiguous time is parsed, the abbreviation is used to resolve the ambiguity.
x <- c(
  "1970-10-25 01:30:00 EDT",
  "1970-10-25 01:30:00 EST"
)

date_time_parse_abbrev(x, "America/New_York")
You might be wondering why you need to supply
zone at all. Isn't the abbreviation enough? Unfortunately, multiple countries use the same time zone abbreviations, even though they have different time zones. This means that, in many cases, the abbreviation alone is ambiguous. For example, both India and Israel use
IST for their standard times.
x <- "1970-01-01 02:30:30 IST"

# IST = India Standard Time
date_time_parse_abbrev(x, "Asia/Kolkata")

# IST = Israel Standard Time
date_time_parse_abbrev(x, "Asia/Jerusalem")
When performing time-series related data analysis, you often need to summarize your series at a coarser precision. There are many different ways to do this, and the differences between them are subtle but meaningful. clock offers three different sets of functions for summarization: grouping with date_group(), rounding with date_floor() and date_ceiling(), and shifting with date_shift().
Grouping allows you to summarize a component of a date or date-time within other components. An example of this is grouping by day of the month, which summarizes the day component within the current year-month.
x <- seq(date_build(2019, 1, 20), date_build(2019, 2, 5), by = 1)
x

# Grouping by 5 days of the current month
date_group(x, "day", n = 5)
The thing to note about grouping by day of the month is that at the end of each month, the groups restart. So this created groups for January of [1, 5], [6, 10], [11, 15], [16, 20], [21, 25], [26, 30], and [31].
You can also group by month or year:
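A minimal sketch of month and year grouping, assuming the same daily sequence used for the day-of-month example. Each date is mapped to the first day of its group.

```r
library(clock)

x <- seq(date_build(2019, 1, 20), date_build(2019, 2, 5), by = 1)

# Every date collapses to the first day of its month
by_month <- date_group(x, "month")
unique(by_month)

# Every date collapses to the first day of its year
by_year <- date_group(x, "year")
unique(by_year)
```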
This also works with date-times, adding the ability to group by hour of the day, minute of the hour, and second of the minute.
x <- seq(
  date_time_build(2019, 1, 1, 1, 55, zone = "UTC"),
  date_time_build(2019, 1, 1, 2, 15, zone = "UTC"),
  by = 120
)
x

date_group(x, "minute", n = 5)
While grouping is useful for summarizing within a component, rounding is useful for summarizing across components. It is great for summarizing by, say, a rolling set of 60 days.
Rounding operates on the underlying count that makes up your date or date-time. To see what I mean by this, try unclassing a date:
unclass(date_build(2020, 1, 1))
This is a count of days since the origin that R uses, 1970-01-01, which is considered day 0. If you were to floor by 60 days, this would bundle
[1970-01-01, 1970-03-02), [1970-03-02, 1970-05-01), and so on. Equivalently, it bundles counts of
[0, 60), [60, 120), etc.
x <- seq(date_build(1970, 01, 01), date_build(1970, 05, 10), by = 20)

date_floor(x, "day", n = 60)
date_ceiling(x, "day", n = 60)
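To make the "underlying count" idea concrete, the sketch below reproduces the 60-day floor by hand using integer arithmetic on the unclassed day counts. This only illustrates the bucketing with the default origin; it is not a description of how clock implements rounding internally.

```r
library(clock)

x <- seq(date_build(1970, 1, 1), date_build(1970, 5, 10), by = 20)

# Days since 1970-01-01
counts <- unclass(x)

# Floor each count down to the nearest multiple of 60
manual <- floor(counts / 60) * 60

# The same buckets, as computed by clock
floored <- unclass(date_floor(x, "day", n = 60))

all(manual == floored)
```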
If you prefer a different origin, you can supply a Date origin to date_floor(), which determines what "day 0" is considered to be. This can be useful for grouping by multiple weeks if you want to control what is considered the start of the week. Since 1970-01-01 is a Thursday, flooring by 2 weeks would normally generate all Thursdays:
as_weekday(date_floor(x, "week", n = 2))
To change this you can supply an
origin on the weekday that you'd like to be considered the first day of the week.
sunday <- date_build(1970, 01, 04)

date_floor(x, "week", n = 2, origin = sunday)
as_weekday(date_floor(x, "week", n = 2, origin = sunday))
If you only need to floor by 1 week, it is often easier to use
date_shift(), as seen in the next section.
date_shift() allows you to target a weekday, and then shift a vector of dates forward or backward to the next instance of that target. It requires using one of the new types in clock, weekday, which is supplied as the target.
For example, to shift to the next Tuesday:
x <- date_build(2020, 1, 1:2)

# Wednesday / Thursday
as_weekday(x)

# `clock_weekdays` is a helper that returns the code corresponding to
# the requested day of the week
clock_weekdays$tuesday

tuesday <- weekday(clock_weekdays$tuesday)
tuesday

date_shift(x, target = tuesday)
Shifting to the previous day of the week is a nice way to floor by 1 week. It allows you to control the start of the week in a way that is slightly easier than using
date_floor(origin = ).
x <- seq(date_build(1970, 01, 01), date_build(1970, 01, "last"), by = 3)

date_shift(x, tuesday, which = "previous")
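As a rough check of the claim that shifting to the previous weekday behaves like flooring by 1 week, the sketch below compares the two approaches. It assumes an origin of 1970-01-06, which is a Tuesday; any other Tuesday should work the same way.

```r
library(clock)

x <- seq(date_build(1970, 1, 1), date_build(1970, 1, "last"), by = 3)
tuesday <- weekday(clock_weekdays$tuesday)

# Shift each date back to the most recent Tuesday
shifted <- date_shift(x, tuesday, which = "previous")

# Floor by 1 week, with weeks starting on a Tuesday
floored <- date_floor(x, "week", origin = date_build(1970, 1, 6))

all(shifted == floored)
```

The date_shift() version avoids having to look up a date that falls on the desired weekday.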
You can do arithmetic with dates and date-times using the family of
add_*() functions. With dates, you can add years, months, and days. With date-times, you can additionally add hours, minutes, and seconds.
x <- date_build(2020, 1, 1)

add_years(x, 1:5)
One of the neat parts about clock is that it requires you to be explicit about how you want to handle invalid dates when doing arithmetic. What is 1 month after January 31st? If you try and create this date, you'll get an error.
x <- date_build(2020, 1, 31)

add_months(x, 1)
clock gives you the power to handle this through the invalid argument:
# The previous valid moment in time
add_months(x, 1, invalid = "previous")

# The next valid moment in time
add_months(x, 1, invalid = "next")

# Overflow the days. There were 29 days in February, 2020, but we
# specified 31. So this overflows 2 days past day 29.
add_months(x, 1, invalid = "overflow")

# If you don't consider it to be a valid date
add_months(x, 1, invalid = "NA")
As a teaser, the low level library has a calendar type named year-month-day that powers this operation. It actually gives you more flexibility, allowing
"2020-02-31" to exist in the wild:
ymd <- as_year_month_day(x) + duration_months(1)
ymd
You can use
invalid_resolve(invalid =) to resolve this like you did in
add_months(), or you can let it hang around if you expect other operations to make it "valid" again.
# Adding 1 more month makes it valid again
ymd + duration_months(1)
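For completeness, here is a short sketch of the other path: detecting the invalid year-month-day and resolving it explicitly before converting back to a Date. The "previous" strategy is just one choice.

```r
library(clock)

x <- date_build(2020, 1, 31)
ymd <- as_year_month_day(x) + duration_months(1)

# TRUE, since 2020-02-31 is not a real date
invalid_detect(ymd)

# Resolve to the previous valid day, then convert back to a Date
resolved <- invalid_resolve(ymd, invalid = "previous")
as.Date(resolved)
```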
When working with date-times, you can additionally add hours, minutes, and seconds.
x <- date_time_build(2020, 1, 1, 2, 30, zone = "America/New_York")

x %>%
  add_days(1) %>%
  add_hours(2:5)
When adding units of time to a POSIXct, you have to be very careful with daylight saving time issues. clock tries to help you out by letting you know when you run into an issue:
x <- date_time_build(1970, 04, 25, 02, 30, 00, zone = "America/New_York")
x

# Daylight saving time gap on the 26th between 01:59:59 -> 03:00:00
x %>% add_days(1)
You can solve this using the
nonexistent argument to control how these times should be handled.
# Roll forward to the next valid moment in time
x %>% add_days(1, nonexistent = "roll-forward")

# Roll backward to the previous valid moment in time
x %>% add_days(1, nonexistent = "roll-backward")

# Shift forward by adding the size of the DST gap
# (this often keeps the time of day,
# but doesn't guarantee that relative ordering in `x` is maintained
# so I don't recommend it)
x %>% add_days(1, nonexistent = "shift-forward")

# Replace nonexistent times with an NA
x %>% add_days(1, nonexistent = "NA")
clock provides a family of getters and setters for working with dates and date-times. You can get and set the year, month, or day of a date.
x <- date_build(2019, 5, 6)

get_year(x)
get_month(x)
get_day(x)

x %>%
  set_day(22) %>%
  set_month(10)
As you might expect by now, setting the date to an invalid date requires you to explicitly handle this:
x %>%
  set_day(31) %>%
  set_month(4)

x %>%
  set_day(31) %>%
  set_month(4, invalid = "previous")
You can additionally set the hour, minute, and second of a POSIXct.
x <- date_time_build(2020, 1, 2, 3, zone = "America/New_York")
x

x %>%
  set_minute(5) %>%
  set_second(10)
As with other manipulations of POSIXct, you'll have to be aware of daylight saving time when setting components. You may need to supply the nonexistent or ambiguous arguments of the set_*() functions to handle these issues.
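As an illustration, the sketch below sets the hour of a date-time so that it lands in the March 8th, 2020 daylight saving time gap discussed earlier. The starting time of 01:30 is chosen only for the example.

```r
library(clock)

x <- date_time_build(2020, 3, 8, 1, 30, zone = "America/New_York")

# 02:30 does not exist on this day, so this errors by default
attempt <- tryCatch(set_hour(x, 2), error = function(e) "error")
attempt

# Roll forward to the first valid moment after the gap
rolled_forward <- set_hour(x, 2, nonexistent = "roll-forward")
rolled_forward

# Roll backward to the last valid moment before the gap
rolled_backward <- set_hour(x, 2, nonexistent = "roll-backward")
rolled_backward
```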