Special resampling strategy for K-fold cross-validation on time series data with stratification by target variable.
cv_split_temporal(data, y, id, time, nfolds = 5L,
  probs = seq(0, 1, length.out = 11))
data      data.table with
y         Target variable name (character).
id        Identifier of each time series (character).
time      Time variable name (character).
nfolds    Number of folds (min 2, max 20).
probs     Numeric vector of probabilities for quantile binning with values in [0, 1] range.
Numeric target: quantile binning is used for stratification.
Character/categorical target: resampling is performed within categories.
For a target with a very skewed distribution, e.g. financial data where 99% of the values are 0, probs can be a vector like c(0, seq(0.99, 1, length.out = 10)).
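To see why a tail-focused probs vector helps with a skewed target, the following base R sketch bins a simulated target with both grids. It is an illustration only, not the package's internal code: the helper bin_by_quantile and the simulated data are assumptions made for this example.

```r
# Illustrative sketch only; bin_by_quantile is a hypothetical helper,
# not a function exported by the package.
set.seed(42)

# Skewed target: 99% zeros, 1% positive values.
target <- c(rep(0, 990), rexp(10))

probs_default <- seq(0, 1, length.out = 11)          # deciles
probs_skewed  <- c(0, seq(0.99, 1, length.out = 10)) # resolution in the top 1%

# Bin a numeric vector by its quantiles.
# unique() drops duplicated breaks, which cut() would otherwise reject.
bin_by_quantile <- function(x, probs) {
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

bins_default <- bin_by_quantile(target, probs_default)
bins_skewed  <- bin_by_quantile(target, probs_skewed)

# With deciles, every break below the maximum collapses to 0,
# so the positive values are not separated from the zeros.
nlevels(bins_default)
nlevels(bins_skewed)
```

Comparing the two nlevels() results shows how many distinct strata each grid leaves after duplicated breaks are dropped.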
When some observations from a time series fall into a validation fold, the train/validation indices for that series are reassigned so that only its last observation is in the validation fold. This ensures that training is performed on past data and predictions are made for future observations.
TODO: allow specifying an arbitrary number of observations for the validation set.
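The reassignment rule can be sketched with data.table as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: the column names naive_val and fold_val and the hand-written starting assignment are invented for the example.

```r
library(data.table)

# Four series with five daily observations each.
dt <- data.table(
  user = rep(1:4, each = 5),
  date = rep(as.Date("2020-01-01") + 0:4, times = 4)
)

# Hypothetical naive assignment for one fold: 1 = row drawn for validation.
# Users 1 and 3 have rows drawn; users 2 and 4 have none.
dt[, naive_val := as.integer(c(0, 1, 0, 0, 0,
                               0, 0, 0, 0, 0,
                               1, 0, 0, 1, 0,
                               0, 0, 0, 0, 0))]

# For each series with at least one drawn row, keep only its latest row
# in validation; every earlier row goes back to training.
dt[, fold_val := as.integer(any(naive_val == 1L) & date == max(date)),
   by = user]

dt[fold_val == 1L]
```

Each affected series then contributes its earlier rows to training and only its most recent row to validation, which matches the rule described above.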
data.table with nfolds columns. Each column is an indicator variable in which 1 marks observations in the validation dataset (stratified by target).
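A returned table of this shape could be consumed roughly as shown below. Because the real column names are not stated here, the sketch uses a hand-built stand-in table with invented names (fold_a, fold_b) and selects columns by position.

```r
library(data.table)

dt <- data.table(
  user   = rep(1:3, each = 2),
  target = c(0.1, 0.5, 0.2, 0.9, 0.3, 0.7)
)

# Stand-in for the returned value: one 0/1 indicator column per fold.
# The column names are invented for this example.
folds <- data.table(
  fold_a = c(0L, 1L, 0L, 0L, 0L, 0L),
  fold_b = c(0L, 0L, 0L, 1L, 0L, 1L)
)

for (j in seq_along(folds)) {
  is_val <- folds[[j]] == 1L   # rows marked 1 form the validation set
  train  <- dt[!is_val]
  valid  <- dt[is_val]
  cat(sprintf("fold %d: %d train rows, %d validation rows\n",
              j, nrow(train), nrow(valid)))
}
```

Indexing by position avoids relying on any particular naming scheme for the fold columns.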
library(data.table)

# 100 users with 5 daily observations each
dt <- data.table(
  user = rep(1:100, each = 5),
  date = as.POSIXct(rep(seq(1.8*10e8, 1.8*10e8 + 388800, by = 86400), 100),
                    origin = "1960-01-01"),
  target = rnorm(5e2)
)
cv_split_temporal(dt, "target", "user", "date")