README.md

Introduction

An Unevenly-spaced Time Series (uts) is a sequence of observation time and value pairs (t_n, X_n) with strictly increasing observation times. Unlike an equally spaced time series, the spacing between observation times need not be constant.
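
To make the definition concrete, here is a minimal base R sketch that checks whether a set of times and values satisfies it. The helper `is_valid_uts()` is hypothetical and not part of this package; it only illustrates the "strictly increasing observation times" requirement.

```r
# Hypothetical helper (not part of the uts package): checks that there is
# one value per observation time and that times are strictly increasing.
is_valid_uts <- function(times, values) {
  length(times) == length(values) &&
    !anyNA(times) &&
    all(diff(as.numeric(times)) > 0)
}

# Three observations with non-constant spacing (UTC assumed for simplicity)
times <- as.POSIXct(c("2007-11-08 07:00:00",
                      "2007-11-08 08:01:00",
                      "2007-11-08 13:15:00"), tz = "UTC")
values <- c(48.375, 48.500, 48.375)

is_valid_uts(times, values)       # increasing times satisfy the definition
is_valid_uts(rev(times), values)  # decreasing times do not
diff(times)                       # the gaps between observations differ
```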

As of early 2018, to the best of my knowledge, no R time series package allows 100% of the application logic to be written in terms of this definition. Either directly or indirectly, existing implementations fall back on equally spaced data for some of their functionality. For example, the window width of a rolling time series operator, such as a moving average, is usually specified as a number of observations (e.g. 7 observations) instead of a temporal duration (e.g. 6 hours).

Even when a time series is equally spaced, it is often preferable to define operations using a temporal duration (e.g. a moving average over the past year) instead of a number of observations (e.g. a moving average over the last 12 observation values). Should the frequency of the time series change, the code would not require any changes in the former case, while it would in the latter case. Moreover, an identical analysis can then be carried out on multiple time series of different frequencies, without having to keep track of the individual observation frequencies.
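
As an illustration of a window defined by temporal duration, the following base R sketch computes an equally weighted moving average over a trailing time window. The function `rolling_mean_t()` and its half-open window convention `(t - width, t]` are assumptions made for this example; they are not the package's API, and the package's rolling operators may be defined differently.

```r
# Hypothetical helper (not part of the uts package).
# For each observation time t, average all values observed in (t - width, t].
rolling_mean_t <- function(times, values, width_secs) {
  t_num <- as.numeric(times)
  vapply(seq_along(t_num), function(i) {
    in_window <- t_num > t_num[i] - width_secs & t_num <= t_num[i]
    mean(values[in_window])
  }, numeric(1))
}

# Six unevenly spaced observations (UTC assumed for simplicity)
times <- as.POSIXct(c("2007-11-08 07:00:00", "2007-11-08 08:01:00",
                      "2007-11-08 13:15:00", "2007-11-09 07:30:00",
                      "2007-11-09 08:51:00", "2007-11-09 15:15:00"),
                    tz = "UTC")
values <- c(48.375, 48.500, 48.375, 47.000, 47.500, 47.350)

# Window width given as a duration (6 hours), not as an observation count
ma_6h <- rolling_mean_t(times, values, width_secs = 6 * 3600)
ma_6h
```

Because the window is expressed in seconds, the same call works unchanged if observations are added, removed, or arrive at a different frequency; the number of observations per window simply varies.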

I therefore decided to design a new time series package, partially based on my research on this topic.

Advantages

Prerequisites

On Windows, install Rtools, which is required to compile packages from source.

On Linux, the exact prerequisites depend on the distribution used. For example, on Debian and its derivatives:

sudo apt install r-base-dev libxml2-dev

Installation

This package is not yet available on CRAN, but can be installed from GitHub, either using the R package devtools:

install.packages("devtools")
devtools::install_github("andreas50/uts", build_vignettes=TRUE)

or using the R package remotes:

install.packages("remotes")
remotes::install_github("andreas50/uts")

Sample Code

# Load the package
library(uts)

# Get sample unevenly-spaced time series with six observations
x <- ex_uts()
x
#> 2007-11-08 07:00:00 2007-11-08 08:01:00 2007-11-08 13:15:00 2007-11-09 07:30:00 2007-11-09 08:51:00 
#>              48.375              48.500              48.375              47.000              47.500 
#> 2007-11-09 15:15:00 
#>              47.350
# Plot the time series
plot(x, type="o", cex.axis=0.8)

# Get first and last observation value(!)
first(x); last(x)
#> [1] 48.375
#> [1] 47.35

# Get first and last observation time(!)
start(x); end(x)
#> [1] "2007-11-08 07:00:00 EST"
#> [1] "2007-11-09 15:15:00 EST"

# Get time series length, first in terms of number of observations, second in terms of temporal length
length(x); length_t(x)
#> [1] 6
#> [1] "116100s (~1.34 days)"

# Insert new observation
x[as.POSIXct("2007-11-10 10:00:00")] <- 45

# Sample the time series at a specific time point, using one of several supported interpolation methods
sample_values(x, as.POSIXct("2007-11-10"), interpolation="linear")
#> [1] 46.25333

# Shift observation times by 3 hours (dhours() is a lubridate duration constructor)
lag_t(x, dhours(3))
#> 2007-11-08 10:00:00 2007-11-08 11:01:00 2007-11-08 16:15:00 2007-11-09 10:30:00 2007-11-09 11:51:00 
#>              48.375              48.500              48.375              47.000              47.500 
#> 2007-11-09 18:15:00 2007-11-10 13:00:00 
#>              47.350              45.000

# Get maximum observation value and the corresponding time
max(x); which.max(x)
#> [1] 48.5
#> [1] "2007-11-08 08:01:00 EST"

# Time series arithmetic
x*2 + 5
#> 2007-11-08 07:00:00 2007-11-08 08:01:00 2007-11-08 13:15:00 2007-11-09 07:30:00 2007-11-09 08:51:00 
#>              101.75              102.00              101.75               99.00              100.00 
#> 2007-11-09 15:15:00 2007-11-10 10:00:00 
#>               99.70               95.00

# Convert time series to data.frame
as.data.frame(x)
#>                  time  value
#> 1 2007-11-08 07:00:00 48.375
#> 2 2007-11-08 08:01:00 48.500
#> 3 2007-11-08 13:15:00 48.375
#> 4 2007-11-09 07:30:00 47.000
#> 5 2007-11-09 08:51:00 47.500
#> 6 2007-11-09 15:15:00 47.350
#> 7 2007-11-10 10:00:00 45.000

# Get "tail" of time series, first in terms of number of observations, second in terms of temporal length
tail(x, 3)
#> 2007-11-09 08:51:00 2007-11-09 15:15:00 2007-11-10 10:00:00 
#>               47.50               47.35               45.00
tail_t(x, ddays(1))
#> 2007-11-09 15:15:00 2007-11-10 10:00:00 
#>               47.35               45.00
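
The linearly interpolated value returned by `sample_values()` above can be cross-checked without the package. This sketch uses base R's `approx()` on the two observations that bracket midnight on 2007-11-10; the times are taken as UTC here, which is an assumption that does not affect the result because only time differences matter.

```r
# The two observations surrounding the sampling time
times <- as.POSIXct(c("2007-11-09 15:15:00", "2007-11-10 10:00:00"), tz = "UTC")
values <- c(47.35, 45)

# Sampling time: midnight on 2007-11-10
t_new <- as.POSIXct("2007-11-10 00:00:00", tz = "UTC")

# approx() interpolates linearly by default; it works on numeric time stamps
interp <- approx(x = as.numeric(times), y = values, xout = as.numeric(t_new))$y
round(interp, 5)
```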

Comparison with other packages for irregular time series



andreas50/uts documentation built on April 8, 2021, 10:03 a.m.