kf_xval: Cross-validation


View source: R/eval_framework.R

Description

Performs a cross-validation experiment in which folds can be allocated in different ways, taking time and/or space into account.
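As a rough illustration of the cross-validation loop described above, the sketch below gives each row a fold id and applies a workflow function to every train/test split. The helper `xval_loop` is hypothetical and is not the package's internal code; here folds are simply assigned at random, whereas kf_xval lets the allocation depend on time and/or space.

```r
# Minimal k-fold cross-validation loop (illustrative sketch, not the package code).
# `fold` gives the fold id of each row of `data`; FUN is called once per fold.
xval_loop <- function(data, fold, FUN, ...) {
  lapply(sort(unique(fold)), function(k) {
    train <- data[fold != k, , drop = FALSE]  # all other folds are used for training
    test  <- data[fold == k, , drop = FALSE]  # fold k is held out for testing
    FUN(train = train, test = test, ...)
  })
}

# Toy usage: 12 rows split at random into 3 equally sized folds.
set.seed(1)
d <- data.frame(x = 1:12)
f <- sample(rep(1:3, length.out = nrow(d)))
res <- xval_loop(d, f, function(train, test) {
  c(n_train = nrow(train), n_test = nrow(test))
})
```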

Usage

kf_xval(
  data,
  nfolds,
  FUN,
  form,
  fold.alloc.proc = "Trand_SPrand",
  alloc.pars = NULL,
  time = "time",
  site_id = "site",
  .keepTrain = TRUE,
  .parallel = TRUE,
  .verbose = ifelse(.parallel, FALSE, TRUE),
  ...
)

Arguments

data

full data set, including the time-stamp and location columns named by time and site_id

nfolds

number of folds into which the data set is split.
To set the number of time and space folds separately, set nfolds to NULL and pass t.nfolds and sp.nfolds as a list in alloc.pars (only available when fold.alloc.proc is set to Tblock_SPchecker, Tblock_SPcontig or Tblock_SPrand).
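For example, separate time and space fold counts could be supplied as below. The values 4 and 3 and the workflow name my_workflow are hypothetical, and the kf_xval call is left commented out because it needs the package and a real data set.

```r
# Hypothetical settings: 4 folds in time and 3 folds in space, set separately.
alloc.pars <- list(t.nfolds = 4, sp.nfolds = 3)

# The call would then use nfolds = NULL (requires the package; not run here):
# kf_xval(data, nfolds = NULL, FUN = my_workflow, form = y ~ x,
#         fold.alloc.proc = "Tblock_SPcontig", alloc.pars = alloc.pars)
```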

FUN

workflow function applied to each fold, with arguments

  • train - training set

  • test - testing set

  • time - column name of time-stamps

  • site_id - column name of location identifiers

  • form - a formula for model learning

  • ... - other arguments
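A FUN matching the argument list above might look like the following sketch. The name lm_workflow, the choice of a linear model, and the output column names (taken literally from the Value section) are assumptions for illustration only; extra arguments in ... are accepted but ignored here.

```r
# Hypothetical workflow: fit a linear model on `train` and predict on `test`.
lm_workflow <- function(train, test, time, site_id, form, ...) {
  model  <- lm(form, data = train)
  target <- all.vars(form)[1]  # name of the response variable
  data.frame(
    site_id = test[[site_id]],
    time    = test[[time]],
    trues   = test[[target]],
    preds   = unname(predict(model, newdata = test))
  )
}

# Toy usage: 2 sites x 5 time-stamps, train on times 1-3, test on times 4-5.
set.seed(42)
d <- data.frame(site = rep(c("a", "b"), each = 5), time = rep(1:5, 2), x = 1:10)
d$y <- 2 * d$x + rnorm(10, sd = 0.1)
out <- lm_workflow(d[d$time <= 3, ], d[d$time > 3, ],
                   time = "time", site_id = "site", form = y ~ x)
```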

form

a formula for model learning

fold.alloc.proc

name of fold allocation function. Should be one of

  • Trand_SPrand - each fold contains completely random observations (the default)

  • Tall_SPcontig - each fold includes all time and a contiguous block of space

  • Tall_SPrand - each fold includes all time and random locations in space

  • Tall_SPchecker - each fold includes all time and a systematically assigned (checkered) part of space

  • Tblock_SPall - each fold includes a block of contiguous time for all locations

  • Trand_SPall - each fold includes random time-snapshots of all locations

  • Tblock_SPchecker - each fold includes a block of contiguous time for a systematically assigned (checkered) part of space

  • Tblock_SPcontig - each fold includes a block of contiguous time for a block of spatially contiguous locations

  • Tblock_SPrand - each fold includes a block of contiguous time for a randomly assigned part of space
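Three of the allocation schemes listed above can be approximated in a few lines of base R. These helpers are hypothetical sketches, not the package's allocation functions: each returns one fold id per row, and the time-block version simply splits the sorted unique time-stamps into equal-width contiguous ranges with cut().

```r
# Illustrative fold allocations (hypothetical helpers, not the package's functions).

# Trand_SPrand-like: every observation goes to a random fold.
alloc_rand <- function(data, nfolds) {
  sample(rep(seq_len(nfolds), length.out = nrow(data)))
}

# Tall_SPrand-like: each site, with all of its time-stamps, goes to one random fold.
alloc_sites <- function(data, nfolds, site_id = "site") {
  sites <- unique(data[[site_id]])
  site_fold <- sample(rep(seq_len(nfolds), length.out = length(sites)))
  site_fold[match(data[[site_id]], sites)]
}

# Tblock_SPall-like: contiguous blocks of time-stamps, each covering all sites.
alloc_time_blocks <- function(data, nfolds, time = "time") {
  times <- sort(unique(data[[time]]))
  block <- cut(seq_along(times), breaks = nfolds, labels = FALSE)
  block[match(data[[time]], times)]
}

# Toy usage: 4 sites observed at 6 time-stamps.
set.seed(7)
d <- expand.grid(site = paste0("s", 1:4), time = 1:6, stringsAsFactors = FALSE)
f_rand  <- alloc_rand(d, nfolds = 5)
f_sites <- alloc_sites(d, nfolds = 2)
f_time  <- alloc_time_blocks(d, nfolds = 3)
```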

alloc.pars

a list of parameters passed on to the fold allocation function named in fold.alloc.proc

time

name of the time-stamp column in data.

site_id

name of the location identifier column in data.

.keepTrain

if TRUE (default), the results of FUN are not returned directly; instead, a list is returned containing both the results and a data.frame with the time and site identifiers of the observations used in the training step.
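The bundling that .keepTrain = TRUE performs can be pictured with the small helper below. It is a hypothetical sketch: the helper keep_train and the list element names results and train are assumptions, not the names used by the package.

```r
# Hypothetical helper mirroring the .keepTrain = TRUE behaviour:
# pair FUN's results with the time/site identifiers of the training rows.
# The element names `results` and `train` are assumptions.
keep_train <- function(res, train, time = "time", site_id = "site") {
  list(results = res, train = train[, c(time, site_id), drop = FALSE])
}

# Toy usage with a two-row training set.
train <- data.frame(site = c("a", "b"), time = c(1, 1), y = c(3.2, 4.1))
kt <- keep_train(res = data.frame(preds = 3.5), train = train)
```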

.parallel

Boolean indicating whether the folds should be run in parallel

.verbose

Boolean indicating whether updates on progress should be printed
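One way the .parallel and .verbose switches could be honoured is sketched below. The parallel backend the package actually uses is not documented here, so parallel::mclapply is an assumption; it forks processes on Unix-alikes only, so the sketch falls back to lapply elsewhere. The helper run_folds and its fold_fun argument are hypothetical; the .verbose default is copied from the Usage section.

```r
# Hypothetical sketch of running folds sequentially or in parallel.
run_folds <- function(folds, fold_fun, .parallel = TRUE,
                      .verbose = ifelse(.parallel, FALSE, TRUE)) {
  worker <- function(k) {
    if (.verbose) message("Running fold ", k)  # progress update per fold
    fold_fun(k)
  }
  if (.parallel && .Platform$OS.type == "unix") {
    parallel::mclapply(folds, worker, mc.cores = 2L)  # forked workers
  } else {
    lapply(folds, worker)  # sequential fallback
  }
}

# Toy usage: square each fold id, sequentially and without messages.
res <- run_folds(1:3, function(k) k^2, .parallel = FALSE, .verbose = FALSE)
```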

...

other arguments to FUN

Value

The results of FUN (bundled with the training-set identifiers when .keepTrain = TRUE). Usually, a data.frame with columns for the location identifier site_id, the time-stamp time, the true values trues and the workflow's predictions preds.
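If each fold yields a data.frame in the format just described, the folds can be stacked and scored. The sketch below uses made-up numbers and root mean squared error purely as an example metric; neither comes from the package.

```r
# Hypothetical per-fold outputs in the format described above.
fold_results <- list(
  data.frame(site_id = "a", time = 1:2, trues = c(1.0, 2.0), preds = c(1.5, 2.0)),
  data.frame(site_id = "b", time = 1:2, trues = c(3.0, 4.0), preds = c(3.0, 3.5))
)

# Stack all folds and compute the overall root mean squared error.
all_res <- do.call(rbind, fold_results)
rmse <- sqrt(mean((all_res$trues - all_res$preds)^2))
```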


mrfoliveira/STResampling-JDSA2020 documentation built on June 28, 2021, 7:01 p.m.