This function takes a factor with the class labels of the total dataset,
draws a sample (balanced with respect to the different levels of the factor)
and returns a logical vector indicating, for each observation, whether it is in the
learning sample (TRUE) or not (FALSE).
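The following base-R code is a minimal sketch of the sampling logic described above. It draws the same number of training observations from every level of the factor and returns a logical indicator. It is not the nlcv source. The function name `balancedTrainingSample` and its `propTraining` argument are hypothetical. The code also assumes that the per-class count is the proportion times the size of the smallest class, rounded down. That rule is inferred from the 14/14 counts in the example on this page.

```r
# Hypothetical sketch, NOT the nlcv implementation of inTrainingSample().
# Assumption: every level contributes the same number of training
# observations, set by the smallest class (inferred from the 14/14 example).
balancedTrainingSample <- function(y, propTraining = 2/3) {
  stopifnot(is.factor(y), propTraining > 0, propTraining < 1)
  # per-class training size, rounded down
  nPerClass <- floor(propTraining * min(table(y)))
  inTrain <- logical(length(y))            # start with all FALSE
  for (lev in levels(y)) {
    idx <- which(y == lev)                 # positions of this class
    # index into idx via sample.int(); sample(idx, n) misbehaves if length(idx) == 1
    chosen <- idx[sample.int(length(idx), nPerClass)]
    inTrain[chosen] <- TRUE                # mark as learning sample
  }
  inTrain
}

# Usage on the class sizes from the example (21 x "A", 80 x "B")
set.seed(1)
y <- factor(c(rep("A", 21), rep("B", 80)))
inTrain <- balancedTrainingSample(y)
table(y[inTrain])    # 14 per class
table(y[!inTrain])   # remaining 7 (A) and 66 (B)
```

Only the chosen observations vary with the seed. The per-class counts are fixed by the rounding rule.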
factor with the class labels for the total data set
proportion of the data that should be in a training set; the default value is 2/3.
distribution of classes; indicates whether your class distribution is 'balanced' or 'unbalanced'. The sampling strategy for each run is adapted accordingly.
logical vector indicating, for each observation in the input factor, whether
it is in the learning sample (TRUE) or not (FALSE)
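The return value is a logical vector aligned with the observations, so it can index the data directly. `x[inTrain, ]` keeps the learning sample and `x[!inTrain, ]` keeps the held-out observations. The base-R sketch below assumes a data frame `x` with one row per observation. The vector `inTrain` is hard-coded as a stand-in for the output of inTrainingSample. All names and values are illustrative only.

```r
# Illustrative only: 'inTrain' is hard-coded as a hypothetical stand-in for
# the logical vector returned by inTrainingSample(), so this runs without nlcv.
y <- factor(rep(c("A", "B"), each = 3))
x <- data.frame(expr = 1:6, class = y)   # one row per observation
inTrain <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)

trainSet <- x[inTrain, ]    # rows flagged TRUE: the learning sample
testSet  <- x[!inTrain, ]   # rows flagged FALSE: held-out observations

table(trainSet$class)       # A: 2, B: 2
table(testSet$class)        # A: 1, B: 1
```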
Willem Talloen and Tobias Verbeke
### this example demonstrates the logic of sampling in case of an unbalanced distribution of classes
y <- factor(c(rep("A", 21), rep("B", 80)))
# draw one sample and reuse it, so training and test sets are complementary
inTrain <- nlcv:::inTrainingSample(y, 2/3, "unbalanced")
table(y[inTrain])   # should be 14, 14 (for A, B resp.)
table(y[!inTrain])  # should be 7, 66 (for A, B resp.)