create_data_partition: Data splitting functions for uplift.

View source: R/data_splitting.R

create_data_partition    R Documentation

Data splitting functions for uplift.

Description

create_data_partition creates one or more random partitions of the data into training and test sets. create_kfolds splits the data into k folds (or groups) with approximately the same number of observations. create_bootstrap draws bootstrap samples.

Usage

create_data_partition(y, trt = NULL, p = 0.5, times = 1, groups = 5,
  replace = FALSE)

create_kfolds(y, trt = NULL, k = 10, times = 1, groups = 5)

create_bootstrap(y, times = 10)

Arguments

y

An atomic vector.

trt

An optional treatment variable.

p

The proportion of training observations.

times

The number of partitions to create.

groups

For numeric y, the number of quantile breaks used to form the groups within which sampling is done.

replace

Should sampling be done with replacement?

k

The number of folds.

Details

If y is a factor, sampling is done within the levels of y in an attempt to balance the class distributions between the partitions. If y is numeric, groups are first created based on the quantiles of its distribution and then sampling is done within these groups.

If trt is supplied, the data partitions are stratified by the treatment variable.
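
The stratification described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name stratified_partition is hypothetical, and the sketch assumes that groups is the number of quantile bins and that trt is crossed with the outcome strata before sampling.

```r
# Illustrative only: a base-R sketch of stratified sampling, not uplift2's code.
stratified_partition <- function(y, trt = NULL, p = 0.5, groups = 5) {
  # Numeric y: bin into quantile-based groups first (assumed meaning of `groups`).
  if (is.numeric(y)) {
    breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
    y <- cut(y, breaks = breaks, include.lowest = TRUE)
  }
  # Cross the outcome strata with the treatment variable, if supplied.
  strata <- if (is.null(trt)) factor(y) else interaction(y, trt, drop = TRUE)
  # Sample a proportion p of the row positions within each stratum.
  idx <- lapply(split(seq_along(strata), strata), function(rows) {
    rows[sample.int(length(rows), size = round(p * length(rows)))]
  })
  sort(unlist(idx, use.names = FALSE))
}

set.seed(545)
y <- factor(sample(c(0, 1), 1000, replace = TRUE))
trt <- factor(sample(c(0, 1), 1000, replace = TRUE))
train <- stratified_partition(y, trt, p = 0.5)

# Response-by-treatment proportions in the training rows track the full data.
prop.table(table(y, trt))
prop.table(table(y[train], trt[train]))
```

Because each response-by-treatment cell is sampled separately, the training rows keep roughly the same cell proportions as the full data, which is the balance the partitioning functions aim for.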

Note that, in addition to create_bootstrap, bootstrap samples can also be created using create_data_partition with p = 1 and replace = TRUE.
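
For intuition, a plain (unstratified) bootstrap draws as many row positions as there are observations, with replacement. The base-R sketch below is an illustration only; storing one sample per column is an assumption about the layout, not a documented property of create_bootstrap.

```r
# Illustrative only: plain bootstrap indices in base R, not uplift2's code.
set.seed(1)
x <- rnorm(100)
times <- 10

# Each column holds length(x) row positions drawn with replacement.
boot_idx <- replicate(times, sample.int(length(x), replace = TRUE))
dim(boot_idx)  # 100 rows, 10 columns

# Rows never drawn in a given sample can serve as an out-of-bag set for it.
oob_first <- setdiff(seq_along(x), boot_idx[, 1])
```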

Value

create_data_partition and create_bootstrap return a matrix of integer row positions corresponding to the training set and to the bootstrap sample, respectively. create_kfolds returns a matrix of row integers corresponding to the folds.
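
As a rough picture of how fold membership can be used, the base-R sketch below assigns each row to one of k folds of near-equal size and holds one fold out. It is an unstratified illustration under assumed names (fold_id, test_rows, train_rows) and does not reproduce the exact matrix layout returned by create_kfolds.

```r
# Illustrative only: unstratified fold assignment in base R, not uplift2's code.
set.seed(545)
n <- 1003
k <- 10

# Recycle fold labels 1..k to length n, then shuffle them across rows.
fold_id <- sample(rep_len(seq_len(k), n))
table(fold_id)  # fold sizes differ by at most one observation

# Hold out fold 1 and train on the remaining k - 1 folds.
test_rows <- which(fold_id == 1)
train_rows <- which(fold_id != 1)
```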

Author(s)

Leo Guelman leo.guelman@gmail.com

Examples


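library(uplift2)  # assumes the package is installed from leoguelman/uplift2

# Simulate a binary response (r) and a binary treatment indicator (t)
set.seed(545)
r <- factor(sample(c(0, 1), 1000, replace = TRUE))
t <- factor(sample(c(0, 1), 1000, replace = TRUE))
df <- data.frame(r, t)

# Create a single 50/50 partition, stratified by response and treatment
trainIndex <- create_data_partition(df$r, df$t)
dfTrain <- df[trainIndex, ]
dfTest <- df[-trainIndex, ]

# Compare response-by-treatment counts in the full, training, and test sets
table(df$r, df$t)
table(dfTrain$r, dfTrain$t)
table(dfTest$r, dfTest$t)

# Create k-folds (default k = 10), repeated 5 times
head(create_kfolds(r, t, times = 5))

# Create 10 bootstrap samples
set.seed(1)
x <- rnorm(100)
xb <- create_bootstrap(x, times = 10)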

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.