partition_cv_strat: Partition the data for a stratified (non-spatial)...


View source: R/sperrorest_resampling.R

Description

partition_cv_strat creates a set of sample indices corresponding to cross-validation test and training sets.

Usage

partition_cv_strat(data, coords = c("x", "y"), nfold = 10,
  return_factor = FALSE, repetition = 1, seed1 = NULL, strat)

Arguments

data

data.frame containing at least the columns specified by coords

coords

vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations

nfold

number of partitions (folds) in nfold-fold cross-validation partitioning

return_factor

if FALSE (default), return a represampling object; if TRUE (used internally by other sperrorest functions), return a list containing factor vectors (see Value)

repetition

numeric vector: cross-validation repetitions to be generated. Note that this is not the number of repetitions, but the indices of these repetitions. E.g., use repetition = c(1:100) to obtain (the 'first') 100 repetitions, and repetition = c(101:200) to obtain a different set of 100 repetitions.

seed1

seed1+i is the random seed that will be used by set.seed in repetition i (i in repetition) to initialize the random number generator before sampling from the data set.

strat

character: column in data containing a factor variable over which the partitioning should be stratified; or factor vector of length nrow(data): variable over which to stratify
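The interplay of repetition and seed1 described above can be illustrated with a minimal base-R sketch. It assumes only what the argument descriptions state (the seed for repetition i is seed1 + i); the helper fold_ids is hypothetical and is not part of sperrorest, nor is it meant to reproduce the package's exact fold assignments.

```r
# Hypothetical helper (not a sperrorest function): assign n observations
# to nfold folds for each repetition index i, seeding the RNG with
# seed1 + i before sampling, as described for the seed1 argument.
fold_ids <- function(n, nfold, repetition, seed1) {
  out <- lapply(repetition, function(i) {
    set.seed(seed1 + i)
    # balanced fold labels 1..nfold in random order
    sample(rep(seq_len(nfold), length.out = n))
  })
  names(out) <- as.character(repetition)
  out
}

a <- fold_ids(n = 20, nfold = 5, repetition = 1:3, seed1 = 100)
b <- fold_ids(n = 20, nfold = 5, repetition = 2, seed1 = 100)

# Repetition 2 is the same whether it is generated alone or as part of 1:3
identical(a[["2"]], b[["2"]])
```

Because the seed depends on the repetition index rather than on its position in the vector, repetitions can be generated in separate batches (e.g. 1:100 and 101:200) without overlap.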

Value

A represampling object (see also partition_cv). In contrast to partition_cv, the partitioning performed by partition_cv_strat is stratified with respect to the variable data[, strat]; i.e., cross-validation partitioning is done within each subset data[data[, strat] == i, ] (i in levels(data[, strat])), and the ith folds of all levels are combined into one cross-validation fold.
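The stratification scheme described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the sperrorest implementation: strat_folds is a hypothetical helper, and the toy factor y stands in for data[, strat].

```r
# Hypothetical helper (not the sperrorest implementation): draw fold labels
# separately within each level of the stratification variable, so that the
# ith fold of every level ends up in the same overall fold i.
strat_folds <- function(strat, nfold) {
  strat <- as.factor(strat)
  fold <- integer(length(strat))
  for (lev in levels(strat)) {
    idx <- which(strat == lev)
    # random, balanced fold labels 1..nfold within this level only
    fold[idx] <- sample(rep(seq_len(nfold), length.out = length(idx)))
  }
  fold
}

set.seed(1)
y <- factor(rep(c("FALSE", "TRUE"), times = c(60, 40)))  # toy data
fold <- strat_folds(y, nfold = 5)

# Each fold holds 12 "FALSE" and 8 "TRUE" cases, matching the 60:40 ratio
table(fold, y)
```

When a level's size is not a multiple of nfold, the fold sizes within that level differ by at most one, so class proportions are preserved only up to rounding.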

See Also

sperrorest, as.resampling, resample_strat_uniform

Examples

library(sperrorest)
data(ecuador)
parti <- partition_cv_strat(ecuador, strat = 'slides', nfold = 5,
                            repetition = 1)
idx <- parti[['1']][[1]]$train
mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')
# always (very nearly) 1: stratification preserves the class proportions,
# up to rounding of the fold sizes
# Non-stratified cross-validation:
parti <- partition_cv(ecuador, nfold = 5, repetition = 1)
idx <- parti[['1']][[1]]$train
mean(ecuador$slides[idx] == 'TRUE') / mean(ecuador$slides == 'TRUE')
# close to 1 because of large sample size, but with some random variation

sperrorest documentation built on April 1, 2018, 12:27 p.m.