date-time-parse: Parsing: date-time

Description Usage Arguments Details Value Examples

Description

There are three parsers for parsing strings into POSIXct date-times, date_time_parse(), date_time_parse_complete(), and date_time_parse_abbrev().

date_time_parse()

date_time_parse() is useful for strings like "2019-01-01 00:00:00", where the UTC offset and full time zone name are not present in the string. The string is first parsed as a naive-time without any time zone assumptions, and is then converted to a POSIXct with the supplied zone.

Because converting from naive-time to POSIXct may result in nonexistent or ambiguous times due to daylight saving time, these must be resolved explicitly with the nonexistent and ambiguous arguments.

date_time_parse() completely ignores the %z and %Z commands. The only time zone specific information that is used is the zone.

The default format used is "%Y-%m-%d %H:%M:%S".

date_time_parse_complete()

date_time_parse_complete() is a parser for complete date-time strings, like "2019-01-01 00:00:00-05:00[America/New_York]". A complete date-time string has both the time zone offset and full time zone name in the string, which is the only way for the string itself to contain all of the information required to construct a zoned-time. Because of this, date_time_parse_complete() requires both the %z and %Z commands to be supplied in the format string.

The default format used is "%Y-%m-%d %H:%M:%S%Ez[%Z]".

date_time_parse_abbrev()

date_time_parse_abbrev() is a parser for date-time strings containing only a time zone abbreviation, like "2019-01-01 00:00:00 EST". The time zone abbreviation is not enough to identify the full time zone name that the date-time belongs to, so the full time zone name must be supplied as the zone argument. However, the time zone abbreviation can help with resolving ambiguity around daylight saving time fallbacks.

For date_time_parse_abbrev(), %Z must be supplied and is interpreted as the time zone abbreviation rather than the full time zone name.

If used, the %z command must parse correctly, but its value will be completely ignored.

The default format used is "%Y-%m-%d %H:%M:%S %Z".

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
date_time_parse(
  x,
  zone,
  ...,
  format = NULL,
  locale = clock_locale(),
  nonexistent = NULL,
  ambiguous = NULL
)

date_time_parse_complete(x, ..., format = NULL, locale = clock_locale())

date_time_parse_abbrev(x, zone, ..., format = NULL, locale = clock_locale())

Arguments

x

[character]

A character vector to parse.

zone

[character(1)]

A full time zone name.

...

These dots are for future extensions and must be empty.

format

[character / NULL]

A format string. A combination of the following commands, or NULL, in which case a default format string is used.

A vector of multiple format strings can be supplied. They will be tried in the order they are provided.

Year

  • %C: The century as a decimal number. The modified command %NC where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %y: The last two decimal digits of the year. If the century is not otherwise specified (e.g. with %C), values in the range [69 - 99] are presumed to refer to the years [1969 - 1999], and values in the range [00 - 68] are presumed to refer to the years [2000 - 2068]. The modified command %Ny, where N is a positive decimal integer, specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %Y: The year as a decimal number. The modified command %NY where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 4. Leading zeroes are permitted but not required.

Month

  • %b, %B, %h: The locale's full or abbreviated case-insensitive month name.

  • %m: The month as a decimal number. January is 1. The modified command %Nm where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

Day

  • %d, %e: The day of the month as a decimal number. The modified command %Nd where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

Day of the week

  • %a, %A: The locale's full or abbreviated case-insensitive weekday name.

  • %w: The weekday as a decimal number (0-6), where Sunday is 0. The modified command %Nw where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 1. Leading zeroes are permitted but not required.

ISO 8601 week-based year

  • %g: The last two decimal digits of the ISO week-based year. The modified command %Ng where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %G: The ISO week-based year as a decimal number. The modified command %NG where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 4. Leading zeroes are permitted but not required.

  • %V: The ISO week-based week number as a decimal number. The modified command %NV where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %u: The ISO weekday as a decimal number (1-7), where Monday is 1. The modified command %Nu where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 1. Leading zeroes are permitted but not required.

Week of the year

  • %U: The week number of the year as a decimal number. The first Sunday of the year is the first day of week 01. Days of the same year prior to that are in week 00. The modified command %NU where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %W: The week number of the year as a decimal number. The first Monday of the year is the first day of week 01. Days of the same year prior to that are in week 00. The modified command %NW where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

Day of the year

  • %j: The day of the year as a decimal number. January 1 is 1. The modified command %Nj where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 3. Leading zeroes are permitted but not required.

Date

  • %D, %x: Equivalent to %m/%d/%y.

  • %F: Equivalent to %Y-%m-%d. If modified with a width (like %NF), the width is applied to only %Y.

Time of day

  • %H: The hour (24-hour clock) as a decimal number. The modified command %NH where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %I: The hour (12-hour clock) as a decimal number. The modified command %NI where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %M: The minutes as a decimal number. The modified command %NM where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is 2. Leading zeroes are permitted but not required.

  • %S: The seconds as a decimal number. The modified command %NS where N is a positive decimal integer specifies the maximum number of characters to read. If not specified, the default is determined by the precision that you are parsing at. If encountered, the locale determines the decimal point character. Leading zeroes are permitted but not required.

  • %p: The locale's equivalent of the AM/PM designations associated with a 12-hour clock. The command %I must precede %p in the format string.

  • %R: Equivalent to %H:%M.

  • %T, %X: Equivalent to %H:%M:%S.

  • %r: Equivalent to %I:%M:%S %p.

Time zone

  • %z: The offset from UTC in the format [+|-]hh[mm]. For example -0430 refers to 4 hours 30 minutes behind UTC. And 04 refers to 4 hours ahead of UTC. The modified command %Ez parses a : between the hours and minutes and leading zeroes on the hour field are optional: [+|-]h[h][:mm]. For example -04:30 refers to 4 hours 30 minutes behind UTC. And 4 refers to 4 hours ahead of UTC.

  • %Z: The full time zone name or the time zone abbreviation, depending on the function being used. A single word is parsed. This word can only contain characters that are alphanumeric, or one of '_', '/', '-' or '+'.

Miscellaneous

  • %c: A date and time representation. Equivalent to %a %b %d %H:%M:%S %Y.

  • %%: A % character.

  • %n: Matches one white space character. %n, %t, and a space can be combined to match a wide range of white-space patterns. For example "%n " matches one or more white space characters, and "%n%t%t" matches one to three white space characters.

  • %t: Matches zero or one white space characters.

locale

[clock_locale]

A locale object created from clock_locale().

nonexistent

[character / NULL]

One of the following nonexistent time resolution strategies, allowed to be either length 1, or the same length as the input:

  • "roll-forward": The next valid instant in time.

  • "roll-backward": The previous valid instant in time.

  • "shift-forward": Shift the nonexistent time forward by the size of the daylight saving time gap.

  • "shift-backward: Shift the nonexistent time backward by the size of the daylight saving time gap.

  • "NA": Replace nonexistent times with NA.

  • "error": Error on nonexistent times.

Using either "roll-forward" or "roll-backward" is generally recommended over shifting, as these two strategies maintain the relative ordering between elements of the input.

If NULL, defaults to "error".

If getOption("clock.strict") is TRUE, nonexistent must be supplied and cannot be NULL. This is a convenient way to make production code robust to nonexistent times.

ambiguous

[character / zoned_time / POSIXct / list(2) / NULL]

One of the following ambiguous time resolution strategies, allowed to be either length 1, or the same length as the input:

  • "earliest": Of the two possible times, choose the earliest one.

  • "latest": Of the two possible times, choose the latest one.

  • "NA": Replace ambiguous times with NA.

  • "error": Error on ambiguous times.

Alternatively, ambiguous is allowed to be a zoned_time (or POSIXct) that is either length 1, or the same length as the input. If an ambiguous time is encountered, the zoned_time is consulted. If the zoned_time corresponds to a naive_time that is also ambiguous and uses the same daylight saving time transition point as the original ambiguous time, then the offset of the zoned_time is used to resolve the ambiguity. If the ambiguity cannot be resolved by consulting the zoned_time, then this method falls back to NULL.

Finally, ambiguous is allowed to be a list of size 2, where the first element of the list is a zoned_time (as described above), and the second element of the list is an ambiguous time resolution strategy to use when the ambiguous time cannot be resolved by consulting the zoned_time. Specifying a zoned_time on its own is identical to list(<zoned_time>, NULL).

If NULL, defaults to "error".

If getOption("clock.strict") is TRUE, ambiguous must be supplied and cannot be NULL. Additionally, ambiguous cannot be specified as a zoned_time on its own, as this implies NULL for ambiguous times that the zoned_time cannot resolve. Instead, it must be specified as a list alongside an ambiguous time resolution strategy as described above. This is a convenient way to make production code robust to ambiguous times.

Details

If date_time_parse_complete() is given input that is length zero, all NAs, or completely fails to parse, then no time zone will be able to be determined. In that case, the result will use "UTC".

If manually parsing sub-second components, be aware that they will be automatically rounded to the nearest second when converting them to POSIXct. See the examples for a way to control this.

Value

A POSIXct.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
# Parse with a known `zone`, even though that information isn't in the string
date_time_parse("2020-01-01 05:06:07", "America/New_York")

# Same time as above, except this is a completely unambiguous parse that
# doesn't require a `zone` argument, because the zone name and offset are
# both present in the string
date_time_parse_complete("2020-01-01 05:06:07-05:00[America/New_York]")

# Only day components
date_time_parse("2020-01-01", "America/New_York", format = "%Y-%m-%d")

# `date_time_parse()` may have issues with ambiguous times due to daylight
# saving time fallbacks. For example, there were two 1'oclock hours here:
x <- date_time_parse("1970-10-25 00:59:59", "America/New_York")

# First (earliest) 1'oclock hour
add_seconds(x, 1)
# Second (latest) 1'oclock hour
add_seconds(x, 3601)

# If you try to parse this ambiguous time directly, you'll get an error:
ambiguous_time <- "1970-10-25 01:00:00"
try(date_time_parse(ambiguous_time, "America/New_York"))

# Resolve it by specifying whether you'd like to use the
# `earliest` or `latest` of the two possible times
date_time_parse(ambiguous_time, "America/New_York", ambiguous = "earliest")
date_time_parse(ambiguous_time, "America/New_York", ambiguous = "latest")

# `date_time_parse_complete()` doesn't have these issues, as it requires
# that the offset and zone name are both in the string, which resolves
# the ambiguity
complete_times <- c(
  "1970-10-25 01:00:00-04:00[America/New_York]",
  "1970-10-25 01:00:00-05:00[America/New_York]"
)
date_time_parse_complete(complete_times)

# `date_time_parse_abbrev()` also doesn't have these issues, since it
# uses the time zone abbreviation name to resolve the ambiguity
abbrev_times <- c(
  "1970-10-25 01:00:00 EDT",
  "1970-10-25 01:00:00 EST"
)
date_time_parse_abbrev(abbrev_times, "America/New_York")

# ---------------------------------------------------------------------------
# Rounding of sub-second components

# Generally, if you have a string with sub-second components, they will
# be ignored when parsing into a date-time
x <- c("2019-01-01 00:00:01.1", "2019-01-01 00:00:01.7")

date_time_parse(x, "America/New_York")

# If you manually try and parse those sub-second components with `%4S` to
# read the 2 seconds, 1 decimal point, and 1 fractional component, the
# fractional component will be rounded to the nearest second
date_time_parse(x, "America/New_York", format = "%Y-%m-%d %H:%M:%4S")

# If you don't like this, parse the full string as a naive-time,
# then round manually and convert to a POSIXct
nt <- naive_time_parse(x, format = "%Y-%m-%d %H:%M:%S", precision = "millisecond")
nt

nt <- time_point_floor(nt, "second")
nt

as.POSIXct(nt, "America/New_York")

clock documentation built on April 23, 2021, 1:06 a.m.