match_ts_nearest_by_key: Match time series by key and time

View source: R/match_ts.R

match_ts_nearest_by_key    R Documentation

Match time series by key and time

Description

For two dataframes, d1 and d2, this function finds, for each row in d1, the position in d2 that is nearest in time among the observations with the same key (e.g., factor level). This is nearest neighbour interpolation that accounts for observations from different factor levels.

Usage

match_ts_nearest_by_key(d1, d2, key_col, time_col)

Arguments

d1

A dataframe which includes a column that defines factor levels and a column that defines time stamps. The names of these columns need to match those in d2.

d2

A dataframe which includes a column that defines factor levels and a column that defines time stamps. The names of these columns need to match those in d1.

key_col

A character string that defines the name of the column in d1 and d2 that distinguishes factor levels.

time_col

A character string that defines the name of the column in d1 and d2 that contains time stamps.

Details

If there are multiple matches, only the first is returned.
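
The matching logic can be illustrated with a minimal base R sketch. The helper nearest_by_key_sketch() below is hypothetical and is not the package's implementation. It assumes that "first" means the first candidate row in d2 when several are equally near in time, which is how which.min() behaves; the real function may resolve ties differently.

```r
# Hypothetical helper (not part of Tools4ETS): nearest-in-time matching within keys
nearest_by_key_sketch <- function(d1, d2, key_col, time_col) {
  vapply(seq_len(nrow(d1)), function(i) {
    # Candidate rows in d2 that share the key of row i in d1
    candidates <- which(d2[[key_col]] == d1[[key_col]][i])
    # Assumption: return NA when d2 has no rows for this key
    if (length(candidates) == 0L) return(NA_integer_)
    # Absolute time gaps (in seconds) between row i and each candidate
    gaps <- abs(as.numeric(d2[[time_col]][candidates]) -
                  as.numeric(d1[[time_col]][i]))
    # which.min() returns the first minimum, so ties go to the earliest row in d2
    candidates[which.min(gaps)]
  }, integer(1))
}

# Illustrative data: row 1 of d1 is equally near to rows 1 and 2 of d2 (a tie)
d1 <- data.frame(t = as.POSIXct(c("2016-01-01 12:00:00",
                                  "2016-01-01 16:00:00"), tz = "UTC"),
                 key = c(1, 2))
d2 <- data.frame(t = as.POSIXct(c("2016-01-01 11:00:00",
                                  "2016-01-01 13:00:00",
                                  "2016-01-01 16:30:00",
                                  "2016-01-01 12:00:00"), tz = "UTC"),
                 key = c(1, 1, 1, 2))
pos <- nearest_by_key_sketch(d1, d2, key_col = "key", time_col = "t")
pos
```

In this sketch, the second row of d1 (key 2, 16:00) is matched to row 4 of d2 even though row 3 (16:30) is closer in time, because row 3 belongs to a different key.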

Value

The function returns a vector of positions in d2, one for each row in d1. Each position identifies the observation in d2 which, for the appropriate factor level (e.g., individual), is nearest in time to the corresponding observation in d1.
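
As a rough illustration of how a returned vector of positions can be checked, the sketch below takes the data from Example (1) and hard-codes the positions stated there (1, 2, 4, 4) rather than calling the function. The tz = "UTC" argument and the names pos and gap_hours are added here for illustration only.

```r
# Data from Example (1); tz is fixed here so the sketch is reproducible
d1 <- data.frame(t = as.POSIXct(c("2016-01-01 12:00:00",
                                  "2016-01-01 15:00:00",
                                  "2016-01-01 17:00:00",
                                  "2016-01-01 16:00:00"), tz = "UTC"),
                 key = c(1, 1, 2, 2))
d2 <- data.frame(t = as.POSIXct(c("2016-01-01 13:00:00",
                                  "2016-01-01 14:00:00",
                                  "2016-01-01 12:00:00",
                                  "2016-01-01 15:00:00"), tz = "UTC"),
                 key = c(1, 1, 2, 2))
# Positions in d2 as stated in Example (1), hard-coded for illustration
pos <- c(1, 2, 4, 4)
# Each matched row in d2 should carry the same key as the row in d1
stopifnot(all(d2$key[pos] == d1$key))
# Time gap (in hours) between each row in d1 and its match in d2
gap_hours <- abs(as.numeric(difftime(d2$t[pos], d1$t, units = "hours")))
gap_hours
```

Inspecting the gaps in this way is a simple check on whether the nearest match for a given key is close enough in time to be useful.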

Author(s)

Edward Lavender

See Also

This is an extension of match_ts_nearest that accounts for different factor levels when these need to be included in the matching process. To use match_ts_nearest or match_ts_nearest_by_key to add observations from one dataframe to another, see pair_ts.

Examples

#### Example (1)
# Imagine we have observations from two keys (e.g., individuals) in two dataframes
# We want to add observations from the second dataframe into the first dataframe.
# Accounting for keys, the positions in d2 that are nearest in time to each row in d1 are
# ... 1, 2, 4, 4
d1 <- data.frame(t = as.POSIXct(c("2016-01-01 12:00:00",
                                  "2016-01-01 15:00:00",
                                  "2016-01-01 17:00:00",
                                  "2016-01-01 16:00:00")),
                 key = c(1, 1, 2, 2))
d2 <- data.frame(t = as.POSIXct(c("2016-01-01 13:00:00",
                                  "2016-01-01 14:00:00",
                                  "2016-01-01 12:00:00",
                                  "2016-01-01 15:00:00")),
                 key = c(1, 1, 2, 2))
match_ts_nearest_by_key(d1, d2, key_col = "key", time_col = "t")

#### Example (2)
# Define dataframes
d1 <- data.frame(t = as.POSIXct(c("2016-01-01 18:00:00",
                                  "2016-01-01 17:00:00",
                                  "2016-01-01 13:00:00",
                                  "2016-01-01 14:00:00",
                                  "2016-01-01 17:00:00",
                                  "2016-01-01 21:00:00")),
                 key = c(2, 2, 2, 1, 1, 3))
d2 <- data.frame(t = as.POSIXct(c("2016-01-01 21:00:00",
                                  "2016-01-01 14:00:00",
                                  "2016-01-01 18:00:00",
                                  "2016-01-01 17:00:00",
                                  "2016-01-01 22:00:00",
                                  "2016-01-01 20:00:00",
                                  "2016-01-01 13:00:00",
                                  "2016-01-01 17:00:00",
                                  "2016-01-01 16:00:00")),
                 key = c(2, 2, 2, 2, 2, 3, 3, 1, 1),
                 vals = stats::runif(9, 0, 1))
# Add the index (positions in d2) to the dataframe
d1$position_in_d2 <- match_ts_nearest_by_key(d1, d2, key_col = "key", time_col = "t")
# Show that the index adds the correct key
d1$key_in_d2 <- d2$key[d1$position_in_d2]
# Show that the index adds the correct time stamp for that key
d1$t_in_d2 <- d2$t[d1$position_in_d2]
# We can now safely add values from d2 to d1:
d1$val_in_d2 <- d2$vals[d1$position_in_d2]
# Examine d1 and d2:
d1; d2


edwardlavender/Tools4ETS documentation built on Nov. 29, 2022, 7:41 a.m.