kf_xval: Cross-validation


View source: R/methods.R

Description

Performs a cross-validation experiment in which folds can be allocated in different ways with respect to time and/or space.

Usage

kf_xval(data, nfolds, FUN, form, fold.alloc.proc = "Trand_SPrand",
  alloc.pars = NULL, time = "time", site_id = "site",
  .keepTrain = TRUE, ...)

Arguments

data

full dataset

nfolds

number of folds into which the data set is split.
To set the number of time folds and space folds separately, set nfolds to NULL and pass t.nfolds and sp.nfolds as a list to alloc.pars (only available when fold.alloc.proc is Tblock_SPchecker, Tblock_SPcontig or Tblock_SPrand).
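
As a minimal sketch of the separate time/space setting described above, the list below carries the two fold counts. The values 4 and 3 are arbitrary, and my_data and my_workflow in the commented call are hypothetical placeholders, not objects shipped with the package.

```r
# Separate fold counts for time and space, to be passed through alloc.pars
# (the values 4 and 3 are arbitrary illustrations)
alloc_pars <- list(t.nfolds = 4, sp.nfolds = 3)

# Hypothetical call; my_data and my_workflow are placeholders:
# res <- kf_xval(my_data, nfolds = NULL, FUN = my_workflow, form = y ~ .,
#                fold.alloc.proc = "Tblock_SPrand", alloc.pars = alloc_pars)
```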

FUN

function with arguments

  • train training set

  • test testing set

  • time column name of time-stamps

  • site_id column name of location identifiers

  • form a formula for model learning

  • ... other arguments
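
A function matching the argument list above might look like the following sketch. The name lm_workflow, the choice of a linear model and the toy data are all assumptions made for illustration; the output columns follow the layout this page describes as usual for FUN, and the exact column names the package expects are not confirmed here.

```r
# Hypothetical workflow: fits a linear model on the training set and
# predicts the test set. The name lm_workflow is illustrative only.
lm_workflow <- function(train, test, time, site_id, form, ...) {
  # Drop the identifier columns so they are not used as predictors
  # when the formula has the form `y ~ .`
  id_cols <- c(time, site_id)
  predictors <- train[, setdiff(names(train), id_cols), drop = FALSE]
  model <- lm(form, data = predictors, ...)

  # The response is the first variable named in the formula
  target <- all.vars(form)[1]
  data.frame(
    site_id = test[[site_id]],
    time = test[[time]],
    trues = test[[target]],
    preds = unname(predict(model, newdata = test))
  )
}

# Toy data: 4 sites observed at 10 time steps (made-up values)
set.seed(1)
toy <- expand.grid(time = 1:10, site = 1:4)
toy$x <- rnorm(nrow(toy))
toy$y <- 2 * toy$x + rnorm(nrow(toy), sd = 0.1)

# Call the workflow directly on a manual split to check its output shape
res <- lm_workflow(train = toy[toy$time <= 8, ], test = toy[toy$time > 8, ],
                   time = "time", site_id = "site", form = y ~ x)
head(res)
```

Calling the function by hand on a single split, as done here, is a cheap way to check the returned columns before handing it to kf_xval as FUN.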

form

a formula for model learning

fold.alloc.proc

name of fold allocation function. Should be one of

  • Trand_SPrand - each fold contains completely random observations (the default)

  • Tall_SPcontig - each fold includes all time and a contiguous block of space

  • Tall_SPrand - each fold includes all time and random locations in space

  • Tall_SPchecker - each fold includes all time and a systematically assigned (checkered) part of space

  • Tblock_SPall - each fold includes a block of contiguous time for all locations

  • Trand_SPall - each fold includes random time-snapshots of all locations

  • Tblock_SPchecker - each fold includes a block of contiguous time for a systematically assigned (checkered) part of space

  • Tblock_SPcontig - each fold includes a block of contiguous time for a block of spatially contiguous locations

  • Tblock_SPrand - each fold includes a block of contiguous time for a randomly assigned part of space
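
To make the difference between these strategies concrete, the base R sketch below assigns folds to a toy data set in three of the ways listed: fully random, random by location, and contiguous blocks of time. It is an illustration of the ideas only and is not the package's own allocation code; the toy data and all variable names are assumptions.

```r
set.seed(42)
nfolds <- 4

# Toy data: 8 sites observed at 12 time steps (made up for illustration)
toy <- expand.grid(time = 1:12, site = paste0("s", 1:8))

# Trand_SPrand-like: every observation gets a random fold
toy$f_rand <- sample(rep_len(1:nfolds, nrow(toy)))

# Tall_SPrand-like: every site gets a random fold, so each fold holds
# the full time series of its sites
sites <- as.character(unique(toy$site))
site_fold <- setNames(sample(rep_len(1:nfolds, length(sites))), sites)
toy$f_sp_rand <- unname(site_fold[as.character(toy$site)])

# Tblock_SPall-like: time is cut into contiguous blocks, and each block
# contains all sites
times <- sort(unique(toy$time))
time_fold <- setNames(cut(seq_along(times), breaks = nfolds, labels = FALSE),
                      times)
toy$f_t_block <- unname(time_fold[as.character(toy$time)])

# Rows are folds, columns are time steps: each time step sits in one block
table(toy$f_t_block, toy$time)
```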

alloc.pars

parameters to pass on to fold.alloc.proc

time

column name of time-stamp in data. Default is "time"

site_id

column name of location identifier in data. Default is "site"

.keepTrain

if TRUE (default), instead of the results of FUN being directly returned, a list is created with both the results and a data.frame with the time and site identifiers of the observations used in the training step.

...

other arguments to FUN

Value

If .keepTrain is TRUE, a list in which each slot corresponds to one repetition or fold. Each slot is itself a list with two slots: results, containing the results of FUN, and train, containing a data.frame with the time and site_id identifiers of the observations used in the training step. Usually, the result of FUN is a data.frame with the location identifier site_id, the time-stamp time, the true values trues and the workflow's predictions preds.
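
The sketch below shows one way such a return value could be post-processed. It builds a mock object with the documented results/train layout rather than calling kf_xval, so the helper mock_fold and all values are made up for illustration.

```r
# Mock output mimicking the documented structure for two folds
# (mock_fold is a hypothetical helper; all values are made up)
mock_fold <- function(sites, times) {
  grid <- expand.grid(site_id = sites, time = times)
  list(
    results = data.frame(grid, trues = 1, preds = 1.5),
    train = expand.grid(time = setdiff(1:4, times), site_id = sites)
  )
}
xval_out <- list(mock_fold(1:2, 1:2), mock_fold(1:2, 3:4))

# Stack the per-fold results, tagging each row with its fold index
all_res <- do.call(rbind, lapply(seq_along(xval_out), function(i) {
  cbind(fold = i, xval_out[[i]]$results)
}))

# Mean absolute error per fold
mae <- tapply(abs(all_res$trues - all_res$preds), all_res$fold, mean)
mae
```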


mrfoliveira/Evaluation-procedures-for-forecasting-with-spatio-temporal-data documentation built on April 11, 2021, 10:50 a.m.