thin_ts: Thin a time series by selecting every nth observation

View source: R/thin_ts.R

thin_ts R Documentation

Description

This function thins a time series by selecting every nth observation. It provides flexibility to (a) distinguish among independent time series (e.g. time series for different individuals) via a user-supplied factor, which can be determined by flag_ts, and (b) return multiple thinned datasets in which thinning starts at different positions. The latter is useful when modelling thinned time series because it is important to check that model inferences are not sensitive to the particular subset of data chosen.
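
The sensitivity check described above can be sketched in base R without the package. This is an illustration only: the simulated data, the linear model and the helper thin_from() are assumptions introduced here and are not part of Tools4ETS.

```r
# Hypothetical illustration in base R (not the thin_ts() implementation)
set.seed(1)

# Simulate an autocorrelated series around a known linear trend (slope = 0.5)
n <- 200
x <- seq_len(n)
y <- 0.5 * x + as.numeric(stats::arima.sim(list(ar = 0.7), n = n))
sim <- data.frame(x = x, y = y)

# Hypothetical helper: keep every nth row, starting at row 'first'
thin_from <- function(d, first, nth) {
  d[seq(first, nrow(d), by = nth), , drop = FALSE]
}

# Fit the same model to subsets that differ only in their starting position
starts <- 1:4
slopes <- sapply(starts, function(s) {
  coef(stats::lm(y ~ x, data = thin_from(sim, first = s, nth = 4)))[["x"]]
})

# Similar estimates across starting positions suggest that inferences are
# not sensitive to the particular subset retained
round(slopes, 3)
```

If the estimates differed markedly between starting positions, that would be a sign that conclusions depend on which observations happened to be retained.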

Usage

thin_ts(dat, ind = NULL, flag1, first = 1, nth)

Arguments

dat

A dataframe to be thinned.

ind

A character input which defines the name of the column in dat which uniquely distinguishes each independent time series (i.e. flag3 reported by flag_ts).

flag1

A character input which defines the column name in dat which contains a logical vector with TRUE marking the first observation in each independent segment of time series (i.e. flag1 reported by flag_ts).

first

A number or numeric vector which defines the position(s) at which thinning is initiated for each independent time series. If a single number is supplied, the function returns a thinned dataframe. If a vector of numbers is supplied, the function returns a list of the same length, in which each element is a dataframe thinned to the same degree but with thinning initiated at a different position. The order of elements in the resultant list matches the order of elements in first.

nth

A number which defines the degree of thinning (i.e. the selection of every nth row).

Value

The function returns a list or dataframe, depending on the input to first (see above).
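
The selection rule and return type described above can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: thin_sketch() and the toy data are hypothetical, and the sketch ignores the flag1 argument, so it may differ from thin_ts() in detail.

```r
# Hypothetical sketch in base R; thin_sketch() is not part of Tools4ETS
thin_sketch <- function(dat, ind = NULL, first = 1, nth) {
  # Treat the whole dataframe as one series if no grouping column is given
  groups <- if (is.null(ind)) rep(1L, nrow(dat)) else dat[[ind]]
  # Position of each row within its own series (1, 2, 3, ...)
  pos <- stats::ave(seq_len(nrow(dat)), groups, FUN = seq_along)
  thin_one <- function(start) {
    # Keep rows at positions start, start + nth, start + 2 * nth, ...
    dat[pos >= start & (pos - start) %% nth == 0, , drop = FALSE]
  }
  out <- lapply(first, thin_one)
  # A single 'first' value gives a dataframe; several values give a list
  if (length(first) == 1) out[[1]] else out
}

# Two independent series of five and four observations
toy <- data.frame(id = rep(c("a", "b"), times = c(5, 4)), obs = 1:9)

# Thinning restarts at the first observation of each series
thin_sketch(toy, ind = "id", first = 1, nth = 2)$obs
# 1 3 5 6 8

# A vector of starting positions returns a list in the same order
res <- thin_sketch(toy, ind = "id", first = c(1, 2), nth = 2)
res[[2]]$obs
# 2 4 7 9
```

Because positions are counted within each series, the same relative observations are retained in every series regardless of series length.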

Author(s)

Edward Lavender

Examples

#### Simulate a dataframe to be thinned
# Define time stamps
t <- c(seq.POSIXt(as.POSIXct("2016-01-01"), as.POSIXct("2016-01-02"), by = "6 hours"),
       seq.POSIXt(as.POSIXct("2016-01-02 18:00:00"), as.POSIXct("2016-01-04"), by = "6 hours")
)
# Apply flag_ts() function to flag independent time series
dat <- cbind(t, flag_ts(t, duration_threshold = 6*60, flag = 1:3))
nrow(dat)

#### Example (1): Thin a single time series by selecting every nth position
# Thin the time series
dat_thin <- thin_ts(dat = dat,
                    nth = 2,
                    flag1 = "flag1"
)
# Examine the rows retained:
dat$row_retained <- dat$t %in% dat_thin$t
dat

#### Example (2): Thin multiple independent time series via the 'ind' argument
# Here, we now account for the fact that the data consists of multiple (two) independent time series
# ... as identified by flag_ts(), and the selection of positions is identical for both time series
# ... (i.e. the first observation in each time series)
dat$row_retained <- NULL
dat_thin <- thin_ts(dat = dat,
                    nth = 2,
                    flag1 = "flag1",
                    ind = "flag3")
dat$row_retained <- dat$t %in% dat_thin$t
dat

#### Example (3): Multiple thinned datasets can be produced by supplying multiple values to 'first'
# This is useful because it is important to check that the exact thinned sample does not affect
# ... results (e.g. if thinned data are used in modelling)
dat$row_retained <- NULL
dat_thin_ls <- thin_ts(dat = dat,
                       nth = 2,
                       flag1 = "flag1",
                       ind = "flag3",
                       first = c(1, 2))
# With multiple 'first' values, the function returns a list, with a thinned dataframe for each
# ... first value:
utils::str(dat_thin_ls)
# Examine the difference:
dat$row_retained1 <- dat$t %in% dat_thin_ls[[1]]$t
dat$row_retained2 <- dat$t %in% dat_thin_ls[[2]]$t
dat


edwardlavender/Tools4ETS documentation built on Nov. 29, 2022, 7:41 a.m.