nd_kf_xval: Non-dependent cross-validation


View source: R/methods.R

Description

Performs a cross-validation experiment in which folds can be allocated in different ways with respect to time and/or space. A buffer around the test set, in time and/or space, is removed from the training set.

Usage

nd_kf_xval(data, nfolds, FUN, form, fold.alloc.proc = "Trand_SPrand",
  alloc.pars = NULL, t.buffer = NULL, s.buffer = NULL,
  s.dists = NULL, t.dists = NULL, time = "time", site_id = "site",
  .keepTrain = TRUE, ...)

Arguments

data

full dataset

nfolds

number of folds into which the data set is split.
To set the number of time and space folds separately, set nfolds to NULL and pass t.nfolds and sp.nfolds as a list in alloc.pars (only available when fold.alloc.proc is set to Tblock_SPchecker, Tblock_SPcontig or Tblock_SPrand).

FUN

function with arguments

  • train training set

  • test testing set

  • time column name of time-stamps

  • site_id column name of location identifiers

  • form a formula for model learning

  • ... other arguments
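As an illustration of the argument list above, here is a minimal sketch of a function that could be passed as FUN. The name lm_workflow, the toy data and the choice of a linear model are assumptions made for this example, not part of the package. The output column names follow the description in the Value section, and they are also an assumption about what downstream code expects.

```r
# Hypothetical workflow function matching the documented argument list.
# `lm_workflow` is an illustrative name, not a function of the package.
lm_workflow <- function(train, test, time, site_id, form, ...) {
  # Fit a linear model on the training set only
  model <- lm(form, data = train, ...)
  # Name of the response variable, taken from the formula's left-hand side
  target <- all.vars(form)[1]
  # Return identifiers, true values and predictions for the test set
  data.frame(
    site_id = test[[site_id]],
    time    = test[[time]],
    trues   = test[[target]],
    preds   = unname(predict(model, newdata = test))
  )
}

# Toy data (made up): 3 sites observed at 4 time-stamps
toy <- expand.grid(time = 1:4, site = 1:3)
toy$x <- toy$time + toy$site
toy$y <- 2 * toy$x + 1

# Direct call on a manual split, to check the function in isolation
res <- lm_workflow(train = toy[toy$time <= 3, ], test = toy[toy$time == 4, ],
                   time = "time", site_id = "site", form = y ~ x)
```

Testing the function on a manual split like this, before handing it to the cross-validation routine, makes it easier to separate errors in the workflow from errors in the fold setup.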

form

a formula for model learning

fold.alloc.proc

name of fold allocation function. Should be one of

  • Trand_SPrand - each fold contains completely random observations (the default)

  • Tall_SPcontig - each fold includes all time and a contiguous block of space

  • Tall_SPrand - each fold includes all time and random locations in space

  • Tblock_SPrand - each fold includes a block of contiguous time for a randomly assigned part of space

  • Tblock_SPall - each fold includes a block of contiguous time for all locations
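To make the allocation schemes above more concrete, the following sketch shows one way a Tblock_SPall-style assignment could be produced by hand: contiguous blocks of time, each covering all locations. This is an illustration under assumed toy data and is not the package's own allocation code.

```r
# Toy data (made up): 2 sites observed at 6 time-stamps
toy <- expand.grid(time = 1:6, site = 1:2)
nfolds <- 3

# Split the ordered time-stamps into nfolds contiguous blocks
time_stamps <- sort(unique(toy$time))
block_of <- cut(seq_along(time_stamps), breaks = nfolds, labels = FALSE)

# Every observation inherits the block of its time-stamp, whatever its site
toy$fold <- block_of[match(toy$time, time_stamps)]
```

Each fold here holds two consecutive time-stamps for both sites, so no fold mixes observations from distant periods.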

alloc.pars

list of parameters to pass on to the fold allocation function named in fold.alloc.proc

t.buffer

numeric value with the distance of the temporal buffer between training and test sets. For each instance in the test set, instances that have a temporal distance of t.buffer or less at the same point in space are removed from the training set.
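The temporal buffer rule can be sketched in base R as follows. The example assumes numeric time-stamps and measures temporal distance as their absolute difference. The function itself takes distances from t.dists, so this is an approximation of the rule, not the package's implementation. The toy data and variable names are made up.

```r
# Toy data (made up): 2 sites, 6 time-stamps each
toy <- expand.grid(time = 1:6, site = c("a", "b"), stringsAsFactors = FALSE)
is_test <- toy$site == "a" & toy$time == 3
test  <- toy[is_test, ]
train <- toy[!is_test, ]
t.buffer <- 1

# Flag training rows at the same site as a test instance
# whose time gap to that instance is t.buffer or less
in_buffer <- vapply(seq_len(nrow(train)), function(i) {
  any(test$site == train$site[i] &
        abs(test$time - train$time[i]) <= t.buffer)
}, logical(1))

train_buffered <- train[!in_buffer, ]
```

With a single test instance at site "a" and time 3, the rows for site "a" at times 2 and 4 are dropped, while all rows for site "b" stay in the training set.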

s.buffer

numeric value with the maximum distance of the spatial buffer between training and test sets. For each instance in the test set, instances that have a spatial distance of s.buffer or less at the same point in time are removed from the training set.
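A comparable sketch for the spatial buffer rule, assuming a small hand-written distance matrix with the "SITE_<site_id>" naming expected for s.dists. The distances and toy data are invented, and the code illustrates the rule without reproducing the package's internals.

```r
# Hypothetical pairwise distances between three sites
site_names <- paste0("SITE_", 1:3)
s.dists <- matrix(c(0, 1, 5,
                    1, 0, 4,
                    5, 4, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(site_names, site_names))

# Toy data (made up): 3 sites, 2 time-stamps each
toy <- expand.grid(time = 1:2, site = 1:3)
is_test <- toy$site == 1 & toy$time == 1
test  <- toy[is_test, ]
train <- toy[!is_test, ]
s.buffer <- 2

# Flag training rows observed at the same time as a test instance
# and located s.buffer or less away from it
in_buffer <- vapply(seq_len(nrow(train)), function(i) {
  same_time <- test$time == train$time[i]
  d <- s.dists[paste0("SITE_", test$site), paste0("SITE_", train$site[i])]
  any(same_time & d <= s.buffer)
}, logical(1))

train_buffered <- train[!in_buffer, ]
```

Only the observation at site 2 and time 1 is removed: it shares the test time-stamp and lies within the buffer, whereas site 3 is too far away and the rows at time 2 are at a different point in time.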

s.dists

a matrix of the distances between the spatial IDs in data. The column names and row names should have the form "SITE_<site_id>"

t.dists

a matrix of the distances between the time-stamps in data. The column names and row names should have the form "TIME_<time>"
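One possible way to build matrices with the required row and column names is sketched below using base R's dist(). The site coordinates, the Euclidean metric and the numeric time-stamps are assumptions chosen for illustration. Any distance measure could be substituted as long as the naming convention is kept.

```r
# Hypothetical site coordinates and time-stamps
sites <- data.frame(site = c(1, 2, 3), x = c(0, 3, 0), y = c(0, 4, 1))
times <- c(1, 2, 4)

# Euclidean distances between sites, labelled "SITE_<site_id>"
s.dists <- as.matrix(dist(sites[, c("x", "y")]))
dimnames(s.dists) <- list(paste0("SITE_", sites$site),
                          paste0("SITE_", sites$site))

# Absolute differences between time-stamps, labelled "TIME_<time>"
t.dists <- as.matrix(dist(times))
dimnames(t.dists) <- list(paste0("TIME_", times),
                          paste0("TIME_", times))
```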

time

column name of time-stamp in data. Default is "time"

site_id

column name of location identifier in data. Default is "site"

.keepTrain

if TRUE (default), instead of the results of FUN being directly returned, a list is created with both the results and a data.frame with the time and site identifiers of the observations used in the training step.

...

other arguments to FUN

Value

If .keepTrain is TRUE, a list in which each slot corresponds to one repetition or fold. Each slot is itself a list with two slots: results, containing the results of FUN, and train, containing a data.frame with the time and site_id identifiers of the observations used in the training step. Otherwise, the results of FUN are returned directly, without the train slot. Usually, the result of FUN is a data.frame with location identifier site_id, time-stamp time, true values trues and the workflow's predictions preds.
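The sketch below shows how output of this shape might be post-processed. The object out is a hand-built mock that mimics the documented structure for two folds. Its values are invented, and the column names of the train data.frame are an assumption.

```r
# Mock output with the documented structure (values are made up)
out <- list(
  list(results = data.frame(site_id = c(1, 2), time = c(3, 3),
                            trues = c(1, 2), preds = c(1, 4)),
       train   = data.frame(time = c(1, 1), site = c(1, 2))),
  list(results = data.frame(site_id = 1, time = 4,
                            trues = 3, preds = 3),
       train   = data.frame(time = c(1, 2), site = c(1, 1)))
)

# Stack the per-fold results into a single data.frame
all_res <- do.call(rbind, lapply(out, function(fold) fold$results))

# Overall root mean squared error across folds
rmse <- sqrt(mean((all_res$trues - all_res$preds)^2))

# Number of training observations used in each fold
n_train <- vapply(out, function(fold) nrow(fold$train), integer(1))
```

Keeping the train slot makes it possible to verify afterwards which observations were excluded by the buffers in each fold.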


mrfoliveira/Evaluation-procedures-for-forecasting-with-spatio-temporal-data documentation built on April 11, 2021, 10:50 a.m.