knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(sdtm.oak)
An SDTM DTC variable may include data that is represented in ISO 8601 format as a complete date/time, a partial date/time, or an incomplete date/time. {sdtm.oak} provides the create_iso8601() function, which flexibly maps date and time values in various formats to a single ISO 8601 date-time format.
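As a quick illustration of these cases, the sketch below converts a complete date-time, a date that only has a year and a month, and a time without any date. It relies only on create_iso8601() as used throughout this vignette; the exact strings shown (truncation after the last known component, hyphens standing in for missing date components) are what we expect the package to return, so compare them with your own output.

```r
library(sdtm.oak)

# Complete date/time: every component from year to seconds is present.
complete <- create_iso8601("2000 01 05 22:35:05", .format = "y m d H:M:S")

# Partial date: only year and month are available, so the result is
# truncated after the month.
partial <- create_iso8601("2000 01", .format = "y m")

# Time without a date: the missing date components are replaced by
# hyphens before the "T" separator.
time_only <- create_iso8601("22:35:05", .format = "H:M:S")

complete
partial
time_only
```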
To convert values to the ISO 8601 format, you need to pass two key arguments:
- one or more vectors of dates, times, or date-times of character type;
- the .format parameter, which tells create_iso8601() which date/time components to expect.
create_iso8601("2000 01 05", .format = "y m d")
create_iso8601("22:35:05", .format = "H:M:S")
By default, the .format parameter understands a few reserved characters:
- "y" for year
- "m" for month
- "d" for day
- "H" for hours
- "M" for minutes
- "S" for seconds
Besides character vectors of dates and times, you may also pass a single vector of date-times, provided you adjust the format:
create_iso8601("2000-01-05 22:35:05", .format = "y-m-d H:M:S")
If you have dates and times in separate vectors then you will need to pass a format for each vector:
create_iso8601("2000-01-05", "22:35:05", .format = c("y-m-d", "H:M:S"))
In addition, like most R functions that take vectors as input, create_iso8601() is vectorized:
date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- c("00:12:21", "22:35:05", "03:00:15", "07:09:00")
create_iso8601(date, time, .format = c("y-m-d", "H:M:S"))
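In practice, vectorization means you can convert whole data frame columns at once. The sketch below uses a small, made-up raw dataset; the column names subject, date, time and DTC are hypothetical and only serve the illustration.

```r
library(sdtm.oak)

# Hypothetical raw data: one row per record, with the collection date and
# time stored in separate character columns.
raw <- data.frame(
  subject = c("01-001", "01-002", "01-003"),
  date = c("2000-01-05", "2001-12-25", "1980-06-18"),
  time = c("00:12:21", "22:35:05", "03:00:15"),
  stringsAsFactors = FALSE
)

# create_iso8601() is vectorized, so whole columns can be passed in and the
# result has one element per row. as.character() stores a plain character
# column, dropping any attributes the result may carry.
raw$DTC <- as.character(
  create_iso8601(raw$date, raw$time, .format = c("y-m-d", "H:M:S"))
)

raw
```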
However, all inputs must have the same number of elements; otherwise, you get an error:
date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- "00:12:21"
try(create_iso8601(date, time, .format = c("y-m-d", "H:M:S")))
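If you really do want to pair one value with every element of another input (for example, a single default time), one option is to expand it yourself before calling create_iso8601(). The sketch below uses base R's rep_len() for this; it is a workaround we suggest, not a feature of the package.

```r
library(sdtm.oak)

date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- "00:12:21"

# create_iso8601() does not recycle shorter inputs, so expand the single
# time explicitly to the length of the date vector first.
time_rep <- rep_len(time, length(date))

res <- create_iso8601(date, time_rep, .format = c("y-m-d", "H:M:S"))
res
```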
You can also combine individual date and time components that come in as separate inputs. Here is a contrived example with year, month and day (together), hour, and minute:
year <- c("99", "84", "00", "80", "79", "1944", "1953")
month_and_day <- c("jan 1", "apr 04", "mar 06", "jun 18", "sep 07", "sep 13", "sep 14")
hour <- c("12", "13", "05", "23", "16", "16", "19")
min <- c("0", "60", "59", "42", "44", "10", "13")
create_iso8601(year, month_and_day, hour, min, .format = c("y", "m d", "H", "M"))
The .format argument must always be named; otherwise, the format string is treated as one of the inputs and .format is considered missing, which results in an error:
try(create_iso8601("2000-01-05", "y-m-d"))
The .format parameter can easily accommodate variations in the format of the inputs:
create_iso8601("2000-01-05", .format = "y-m-d")
create_iso8601("2000 01 05", .format = "y m d")
create_iso8601("2000/01/05", .format = "y/m/d")
Individual components may come in a different order, so adjust the format accordingly:
create_iso8601("2000 01 05", .format = "y m d")
create_iso8601("05 01 2000", .format = "d m y")
create_iso8601("01 05, 2000", .format = "m d, y")
All other characters in the format are matched literally; for example, the number of spaces matters:
date <- c("2000 01 05", "2000  01 05", "2000 01  05", "2000   01   05")
create_iso8601(date, .format = "y m d")
create_iso8601(date, .format = "y  m d")
create_iso8601(date, .format = "y m  d")
create_iso8601(date, .format = "y   m   d")
The format can, however, include regular expressions, e.g. to match any run of whitespace:
create_iso8601(date, .format = "y\\s+m\\s+d")
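Since the non-reserved parts of the format behave as regular expressions (as the \\s+ example shows), a character class can, in principle, cover several separators with a single format. The sketch below assumes that a bracket expression such as "[-/ ]" is passed through unchanged; it contains none of the reserved characters, so nothing in it should be mistaken for a date component.

```r
library(sdtm.oak)

date <- c("2000/01/01", "2000-01-02", "2000 01 03")

# "[-/ ]" is an ordinary regular-expression character class that matches
# exactly one of "-", "/" or " " between the date components.
res <- create_iso8601(date, .format = "y[-/ ]m[-/ ]d")
res
```

Listing alternative formats, covered next, is the more explicit option when the variants differ by more than their separators.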
By default, a run of repeated reserved characters is treated as a single one, so these formats are equivalent:
date <- c("2000-01-05", "2001-12-25", "1980-06-18", "1979-09-07")
time <- c("00:12:21", "22:35:05", "03:00:15", "07:09:00")
create_iso8601(date, time, .format = c("y-m-d", "H:M:S"))
create_iso8601(date, time, .format = c("yyyy-mm-dd", "HH:MM:SS"))
create_iso8601(date, time, .format = c("yyyyyyyy-m-dddddd", "H:MMMMM:SSSS"))
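One consequence worth keeping in mind: because a run such as "yyyy" collapses to a single year component, it does not force a four-digit year. The small check below compares both spellings on a two-digit year, which is then expanded using the default two-digit year cutoff.

```r
library(sdtm.oak)

# "y" and "yyyy" denote the same year component, so both accept "99".
a <- create_iso8601("99", .format = "y")
b <- create_iso8601("99", .format = "yyyy")
a
b
```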
When an input vector contains values with varying formats, a single format may not be adequate to encompass all variations. In such situations, it's advisable to list multiple alternative formats. This approach ensures that each format is tried sequentially until one matches the data in the vector.
date <- c("2000/01/01", "2000-01-02", "2000 01 03", "2000/01/04")
create_iso8601(date, .format = "y-m-d")
create_iso8601(date, .format = "y m d")
create_iso8601(date, .format = "y/m/d")
create_iso8601(date, .format = list(c("y-m-d", "y m d", "y/m/d")))
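Alternative formats are also handy when only some records carry a time. In this sketch, the date-time format is listed first so that it is tried before the date-only fallback; the input values are made up for illustration.

```r
library(sdtm.oak)

# Some records carry a time, others only a date.
dttm <- c("2000-01-05 10:30", "2000-01-06", "2000-01-07 08:15")

# The more specific format comes first; values without a time fall back
# to the date-only format.
res <- create_iso8601(dttm, .format = list(c("y-m-d H:M", "y-m-d")))
res
```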
Consider the order in which you supply the formats, as it can be significant. If multiple formats could potentially match, the sequence determines which format is applied first.
create_iso8601("07 04 2000", .format = list(c("d m y", "m d y")))
create_iso8601("07 04 2000", .format = list(c("m d y", "d m y")))
Note that if you are passing alternative formats, then the .format argument must be a list whose length matches the number of inputs.
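For example, with separate date and time inputs, .format becomes a list of length two, with one set of alternatives per input. The values below are illustrative only.

```r
library(sdtm.oak)

date <- c("2000-01-05", "2000/01/06")
time <- c("10:30:15", "11:45")

# Two inputs, so .format is a list of length two: the first element holds
# the alternatives for `date`, the second those for `time`.
res <- create_iso8601(
  date, time,
  .format = list(c("y-m-d", "y/m/d"), c("H:M:S", "H:M"))
)
res
```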
By default, date or time components are parsed as follows:
# Years: two-digit or four-digit numbers.
years <- c("0", "1", "00", "01", "15", "30", "50", "68", "69", "80", "99")
create_iso8601(years, .format = "y")
# Adjust the point where two-digit years are mapped to the 2000s or the 1900s.
create_iso8601(years, .format = "y", .cutoff_2000 = 20L)
# Both numeric months (two-digit only) and abbreviated month names work out of the box.
months <- c("0", "00", "1", "01", "Jan", "jan")
create_iso8601(months, .format = "m")
# Month days: one- or two-digit numbers; anything else results in NA.
create_iso8601(c("1", "01", "001", "10", "20", "31"), .format = "d")
# Hours
create_iso8601(c("1", "01", "001", "10", "20", "31"), .format = "H")
# Minutes
create_iso8601(c("1", "01", "001", "10", "20", "60"), .format = "M")
# Seconds
create_iso8601(c("1", "01", "23.04", "001", "10", "20", "60"), .format = "S")
If date or time components contain special values, such as codes for missing values, you can declare them as allowed alternatives with the .na argument so that parsing tolerates them:
create_iso8601("U DEC 2019 14:00", .format = "d m y H:M")
create_iso8601("U DEC 2019 14:00", .format = "d m y H:M", .na = "U")
create_iso8601("U UNK 2019 14:00", .format = "d m y H:M")
create_iso8601("U UNK 2019 14:00", .format = "d m y H:M", .na = c("U", "UNK"))
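To find out which raw values still need a rule (such as an .na entry), you can look for the elements that came back as NA. The sketch below is one way to do this in base R; suppressWarnings() is only there to silence any parsing-problem warning while you inspect the failures.

```r
library(sdtm.oak)

x <- c("05 DEC 2019", "U DEC 2019", "UN UNK 2019")

# Values that match none of the formats come back as NA.
res <- suppressWarnings(create_iso8601(x, .format = "d m y"))

# Raw values that could not be parsed with the current format.
x[is.na(res)]
```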
In this case, you could achieve the same result using regular expressions:
create_iso8601("U UNK 2019 14:00", .format = "(d|U) (m|UNK) y H:M")
There might be cases when the reserved characters ("y", "m", "d", "H", "M", "S") get in the way of specifying an adequate format. For example, you might be tempted to use the format "HHMM" to parse a time such as "14H00M". You might assume that the first "H" codes for the hour and the second "H" is a literal "H", but in fact "HH" is taken to mean parsing hours, and "MM" parsing minutes. You can use the function fmt_cmp() to specify alternative regexps for the format components, replacing the default characters.
In the next example, we reassign new format strings to the hour and minute components, thus freeing the "H" and "M" patterns from being interpreted as hours and minutes, so that they are taken literally:
create_iso8601("14H00M", .format = "HHMM")
create_iso8601("14H00M", .format = "xHwM", .fmt_c = fmt_cmp(hour = "x", min = "w"))
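The same idea extends to seconds. This sketch assumes that fmt_cmp() also accepts a sec argument alongside hour and min; the replacement letters "x", "w" and "z" are arbitrary choices that do not clash with the default year, month, or day patterns.

```r
library(sdtm.oak)

# Reassign hour, minute and second to letters that do not occur as literals
# in the input, so "H", "M" and "S" in the format are matched literally.
res <- create_iso8601(
  "14H00M30S",
  .format = "xHwMzS",
  .fmt_c = fmt_cmp(hour = "x", min = "w", sec = "z")
)
res
```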
Note that you need to make sure that the format component regexps are mutually exclusive, i.e. that they don't have overlapping matches; otherwise, create_iso8601() fails with an error. In the next example, both months and minutes could be represented by an "m" in the format, resulting in an ambiguous format specification.
fmt_cmp(hour = "h", min = "m")
try(create_iso8601("14H00M", .format = "hHmM", .fmt_c = fmt_cmp(hour = "h", min = "m")))
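A way out is to choose a minute pattern that overlaps with none of the other components. In the sketch below, "n" is an arbitrary choice that differs from the default month pattern, so the specification should no longer be ambiguous.

```r
library(sdtm.oak)

# "n" does not overlap with the default month pattern "m", unlike the
# hour = "h", min = "m" attempt, so "H" and "M" stay literal.
res <- create_iso8601(
  "14H00M",
  .format = "hHnM",
  .fmt_c = fmt_cmp(hour = "h", min = "n")
)
res
```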