mc_cv: Monte Carlo Cross-Validation


Description

One resample of Monte Carlo cross-validation takes a random sample (without replacement) of the original data set to be used for analysis. All other data points are added to the assessment set.
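The idea of a single resample can be sketched in base R. This is only an illustration of the description above, not the rsample implementation: `one_mc_split` is a hypothetical helper, and rounding the analysis size down with `floor()` is an assumption of the sketch.

```r
# Base-R sketch of one Monte Carlo resample (illustrative only;
# `one_mc_split` is a hypothetical helper, not part of rsample).
one_mc_split <- function(data, prop = 3/4) {
  n <- nrow(data)
  # Draw row positions without replacement for the analysis set.
  # Rounding down with floor() is an assumption of this sketch.
  in_id <- sample.int(n, size = floor(n * prop))
  # Every row that was not drawn goes to the assessment set.
  out_id <- setdiff(seq_len(n), in_id)
  list(
    analysis   = data[in_id, , drop = FALSE],
    assessment = data[out_id, , drop = FALSE]
  )
}

set.seed(1)
split <- one_mc_split(mtcars, prop = 3/4)
nrow(split$analysis)
nrow(split$assessment)
```

Repeating this draw independently 'times' times gives the full set of resamples, so the same row may land in the assessment set of several resamples or of none.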

Usage

mc_cv(data, prop = 3/4, times = 25, strata = NULL, ...)

Arguments

data

A data frame.

prop

The proportion of data to be retained for modeling/analysis.

times

The number of times to repeat the sampling.

strata

A variable that is used to conduct stratified sampling to create the resamples.

...

Not currently used.

Details

The 'strata' argument causes the random sampling to be conducted *within the stratification variable*. This can help ensure that the proportion of each stratum in the analysis data is close to its proportion in the original data set.
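Sampling within the stratification variable can be pictured with a small base-R sketch. It is a simplified stand-in rather than the code rsample runs: `stratified_ids` is a hypothetical helper, and taking `floor()` of each stratum's share is an assumption.

```r
# Base-R sketch of stratified sampling (illustrative only;
# `stratified_ids` is a hypothetical helper, not part of rsample).
stratified_ids <- function(strata, prop = 3/4) {
  # Group row positions by stratum level, ignoring empty levels.
  idx_by_level <- split(seq_along(strata), strata, drop = TRUE)
  # Sample the same proportion, without replacement, inside each group.
  picked <- lapply(idx_by_level, function(idx) {
    idx[sample.int(length(idx), size = floor(length(idx) * prop))]
  })
  sort(unlist(picked, use.names = FALSE))
}

iris2 <- iris[1:130, ]
set.seed(13)
in_id <- stratified_ids(iris2$Species, prop = 0.5)

# Share of each species in the sampled rows vs. the full data
prop.table(table(iris2$Species[in_id]))
prop.table(table(iris2$Species))
```

Because each species is sampled separately, the class shares in the sampled rows track those of the full data set instead of varying with each random draw.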

Value

A tibble with classes 'mc_cv', 'rset', 'tbl_df', 'tbl', and 'data.frame'. The results include a column for the data split objects and a column called 'id' that has a character string with the resample identifier.
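A short sketch of how the returned object might be inspected, assuming the rsample package is installed. The accessors `analysis()` and `assessment()` come from rsample; the object names (`rs`, `first_split`, `train`, `test`) are chosen here for illustration.

```r
library(rsample)

set.seed(42)
rs <- mc_cv(mtcars, prop = 3/4, times = 2)

class(rs)   # should include "mc_cv" and "rset"
rs$id       # character resample identifiers

# Each element of the splits column is one data split object
first_split <- rs$splits[[1]]
train <- analysis(first_split)    # rows retained for modeling
test  <- assessment(first_split)  # all remaining rows

# Together the two sets account for every row of the original data
nrow(train) + nrow(test) == nrow(mtcars)
```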

Examples

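library(rsample)
library(purrr)

# Two resamples with the default 3/4 analysis proportion, then with 1/2
mc_cv(mtcars, times = 2)
mc_cv(mtcars, prop = .5, times = 2)

# Keep the first 130 rows so that the classes are unbalanced
iris2 <- iris[1:130, ]

# Without stratification, the proportion of virginica in the
# analysis set can vary from resample to resample
set.seed(13)
resample1 <- mc_cv(iris2, times = 3, prop = .5)
map_dbl(resample1$splits,
        function(x) {
          dat <- as.data.frame(x)$Species
          mean(dat == "virginica")
        })

# With stratification on Species, the proportion should stay
# close to the one in iris2
set.seed(13)
resample2 <- mc_cv(iris2, strata = "Species", times = 3, prop = .5)
map_dbl(resample2$splits,
        function(x) {
          dat <- as.data.frame(x)$Species
          mean(dat == "virginica")
        })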

topepo/rsample documentation built on May 4, 2019, 4:25 p.m.