Parse POSIXct objects from input data

Share:

Description

These function use the Boost Date_Time library to parse datetimes (and dates) from strings, integers, factors or even numeric values (which are cast to strings internally). They return a vector of POSIXct objects (or Date objects in the case of anydate). POSIXct objects represent dates and time as (possibly fractional) seconds since the ‘epoch’ of January 1, 1970. A timezone can be set, if none is supplied ‘UTC’ is set.

Usage

1
2
3
4
5
6
7
anytime(x, tz = getTZ(), asUTC = FALSE)

anydate(x, tz = getTZ(), asUTC = FALSE)

utctime(x, tz = getTZ())

utcdate(x, tz = getTZ())

Arguments

x

A vector of type character, integer or numeric with date(time) expressions to be parsed and converted.

tz

A string with the timezone, defaults to the result of the (internal) getTZ function if unset. The getTZ function returns the timezone values stored in local package environment, and set at package load time. Also note that this argument applies to the output: the returned object will have this timezone set. The timezone is not used for the parsing which will always be to localtime, or to UTC is the asUTC variable is set (as it is in the related functions link{utctime} amd utcdate). So one can think of the argument as ‘shift parsed time object to this timezone’. This is similar to what format() in base R does, but our return value is still a POSIXt object instead of a character value.

asUTC

A logical value indicating if parsing should be to UTC; default is false implying localtime.

Details

A number of fixed formats are tried in succession. These include the standard ISO format ‘YYYY-MM-DD HH:MM:SS’ as well as different local variants including several forms popular in the United States. Two-digits years and clearly ambigous formats such as ‘03/04/05’ are ignored. In the case of parsing failure a NA value is returned.

Fractional seconds are supported as well. As R itself only supports microseconds, the Boost compile-time option for nano-second resolution has not been enabled.

Value

A vector of POSIXct elements, or, in the case of anydate, a vector of Date objects.

Notes

By default, the (internal) conversion to (fractional) seconds since the epoch is relative to the locatime of this system, and therefore not completely independent of the settings of the local system. This is to strike a balance between ease of use and functionality. A more-full featured conversion could be possibly be added with support for arbitrary reference times, but this is (at least) currently outside the scope of this package. See the RcppCCTZ package which offers some timezone-shifting and differencing functionality. As of version 0.0.5 one can also parse relative to UTC avoiding the localtime issue,

Times and timezones can be tricky. This package offers a heuristic approach, it is likely that some input formats may not be parsed, or worse, be parsed incorrectly. This is not quite a Bobby Tables situation but care must always be taken with user-supplied input.

The Boost Date_Time library cannot parse single digit months or days. So while ‘2016/09/02’ works (as expected), ‘2016/9/2’ will not. Other non-standard formats may also fail.

The is a known issue (discussed at length in issue ticket 5) where Australian times are off by an hour. This seems to affect only Windows, not Linux.

When given a vector, R will coerce it to the type of the first element. Should that be NA, surprising things can happen: c(NA, Sys.Date()) forces both values to numeric and the date will not be parsed correctly (as its integer value becomes numeric before our code sees it). On the other hand, c(Sys.Date(), NA) works as expected parsing as type Date with one missing value. See issue ticket 11) for more.

Operating System Impact

On Windows systems, accessing the isdst flag on dates or times before January 1, 1970, can lead to a crash. Therefore, the lookup of this value has been disabled for those dates and times, which could therefore be off by an hour (the common value that needs to be corrected). It should not affect dates, but may affect datetime objects.

Author(s)

Dirk Eddelbuettel

References

This StackOverflow answer provided the initial idea: http://stackoverflow.com/a/3787188/143305.

See Also

anytime-package

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
## See the source code for a full list of formats, and the
## or the reference in help('anytime-package') for details
times <- c("2004-03-21 12:45:33.123456",
          "2004/03/21 12:45:33.123456",
          "20040321 124533.123456",
          "03/21/2004 12:45:33.123456",
          "03-21-2004 12:45:33.123456",
          "2004-03-21",
          "20040321",
          "03/21/2004",
          "03-21-2004",
          "20010101")
anytime(times)
anydate(times)
utctime(times)
utcdate(times)

## show effect of tz argument
anytime("2001-02-03 04:05:06")
## adjust parsed time to given TZ argument
anytime("2001-02-03 04:05:06", tz="America/Los_Angeles")
## somewhat equvalent base R functionality
format(anytime("2001-02-03 04:05:06"), tz="America/Los_Angeles")