inTrainingSample: Function to define a learning sample based on balanced...


Description

This function takes in a factor with class labels of the total dataset, draws a sample (balanced with respect to the different levels of the factor) and returns a logical vector indicating whether the observation is in the learning sample (TRUE) or not (FALSE).

Usage

inTrainingSample(y, propTraining = 2/3, classdist = c("balanced",
  "unbalanced"))

Arguments

y

factor with the class labels for the total data set

propTraining

proportion of the data that should be in a training set; the default value is 2/3.

classdist

distribution of classes; indicates whether the class distribution is 'balanced' or 'unbalanced'. The sampling strategy for each run is adapted accordingly.

Value

logical vector indicating for each observation in y whether the observation is in the learning sample (TRUE) or not (FALSE)
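
As an illustration of how such a logical vector is typically used, the sketch below splits a data frame into a learning part and a test part. The data frame dat and the hand-written vector inTrain are hypothetical; inTrain only stands in for a value that inTrainingSample would return.

```r
# Hypothetical data: six observations with a class label and one feature.
dat <- data.frame(label = factor(c("A", "A", "A", "B", "B", "B")),
                  x     = c(0.1, 0.4, 0.3, 1.2, 1.5, 1.1))

# Stand-in for the result of inTrainingSample(dat$label); written by hand
# here so that the snippet runs without the nlcv package.
inTrain <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)

# TRUE rows form the learning sample, FALSE rows the test sample.
trainSet <- dat[inTrain, ]
testSet  <- dat[!inTrain, ]
```

Because the vector has one entry per observation in y, negating it with ! selects exactly the observations left out of the learning sample.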

Author(s)

Willem Talloen and Tobias Verbeke

Examples

  ### this example demonstrates the sampling logic for an unbalanced distribution of classes
  y <- factor(c(rep("A", 21), rep("B", 80)))

  # draw the sample once and reuse it, so that both tables describe the same split
  inTrain <- nlcv:::inTrainingSample(y, 2/3, "unbalanced")
  inTrain
  table(y[inTrain])   # should be 14, 14 (for A, B resp.)
  table(y[!inTrain])  # should be 7, 66  (for A, B resp.)
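
The counts in the example above are consistent with a per-class sampling rule, which can be sketched as follows. This is not the nlcv source: sketchTrainingSample is a hypothetical stand-in, the 'unbalanced' branch is inferred from the expected counts (each class contributes propTraining times the size of the smallest class), and the 'balanced' branch assumes plain stratified sampling.

```r
# Hypothetical re-implementation for illustration only; not the nlcv code.
sketchTrainingSample <- function(y, propTraining = 2/3,
                                 classdist = c("balanced", "unbalanced")) {
  classdist <- match.arg(classdist)
  stopifnot(is.factor(y))

  # number of observations per class, named by factor level
  classSizes <- setNames(tabulate(y, nbins = nlevels(y)), levels(y))

  if (classdist == "balanced") {
    # assumption: each class contributes the proportion propTraining
    nPerClass <- round(propTraining * classSizes)
  } else {
    # inferred from the example: every class contributes as many
    # observations as propTraining times the smallest class size
    nSmall <- round(propTraining * min(classSizes))
    nPerClass <- setNames(rep(nSmall, nlevels(y)), levels(y))
  }

  inTrain <- logical(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # sample positions within idx, which avoids sample()'s special
    # handling of a single number
    chosen <- idx[sample.int(length(idx), nPerClass[[lev]])]
    inTrain[chosen] <- TRUE
  }
  inTrain
}

y <- factor(c(rep("A", 21), rep("B", 80)))
s <- sketchTrainingSample(y, 2/3, "unbalanced")
table(y[s])   # 14 A, 14 B
table(y[!s])  # 7 A, 66 B
```

Under this reading, the 'unbalanced' option yields a learning sample with equal class counts, while the remaining observations keep the skew of the original data.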

nlcv documentation built on May 2, 2019, 7:28 a.m.