timeid: Generate Integer-Id From Time/Date Sequences

View source: R/indexing.R

timeidR Documentation

Generate Integer-Id From Time/Date Sequences

Description

timeid groups time vectors in a way that preserves the temporal structure. It generate an integer id where unit steps represent the greatest common divisor in the original sequence e.g c(4, 6, 10) -> c(1, 2, 4) or c(0.25, 0.75, 1) -> c(1, 3, 4).

Usage

timeid(x, factor = FALSE, ordered = factor, extra = FALSE)

Arguments

x

a numeric time object such as a Date, POSIXct or other integer or double vector representing time.

factor

logical. TRUE returns an (ordered) factor with levels corresponding to the full sequence (without irregular gaps) of time. This is useful for inclusion in the index but might be computationally expensive for long sequences, see Details. FALSE returns a simpler object of class 'qG'.

ordered

logical. TRUE adds a class 'ordered'.

extra

logical. TRUE attaches a set of 4 diagnostic items as attributes to the result:

  • "unique_ints": unique(unattrib(timeid(x))) - the unique integer time steps in first-appearance order. This can be useful to check the size of gaps in the sequence.

  • "sort_unique_x": sort(unique(x)).

  • "range_x": range(x).

  • "step_x": vgcd(sort(unique(diff(sort(unique(x)))))) - the greatest common divisor.

Note that returning these attributes does not incur additional computations.

Details

Let range_x and step_x be the like-named attributes returned when extra = TRUE, then, if factor = TRUE, a complete sequence of levels is generated as seq(range_x[1], range_x[2], by = step_x) |> copyMostAttrib(x) |> as.character(). If factor = FALSE, the number of timesteps recorded in the "N.groups" attribute is computed as (range_x[2]-range_x[1])/step_x + 1, which is equal to the number of factor levels. In both cases the underlying integer id is the same and preserves gaps in time. Large gaps (strong irregularity) can result in many unused factor levels, the generation of which can become expensive. Using factor = FALSE (the default) is thus more efficient.

Value

A factor or 'qG' object, optionally with additional attributes attached.

See Also

seqid, Indexing, Time Series and Panel Series, Collapse Overview

Examples

oldopts <- options(max.print = 30)

# A normal use case
timeid(wlddev$decade)
timeid(wlddev$decade, factor = TRUE)
timeid(wlddev$decade, extra = TRUE)

# Here a large number of levels is generated, which is expensive
timeid(wlddev$date, factor = TRUE)
tid <- timeid(wlddev$date, extra = TRUE) # Much faster
str(tid)

# The reason for step = 1 are leap years with 366 days every 4 years
diff(attr(tid, "unique"))

# So in this case simple factor generation gives a better result
qF(wlddev$date, ordered = TRUE, na.exclude = FALSE)

# The best way to deal with this data would be to convert it
# to zoo::yearmon and then use timeid:
timeid(zoo::as.yearmon(wlddev$date), factor = TRUE, extra = TRUE)

options(oldopts)
rm(oldopts, tid)

SebKrantz/collapse documentation built on Dec. 16, 2024, 7:26 p.m.