partition_data: Helper function that partitions a data set into training and...


Description

The function randomly partitions a data set into training and test data sets with a specified percentage of observations assigned to the training data set. The user can optionally preserve the proportions of the original data set.

Usage

  partition_data(x, y, split_pct = 2/3,
    preserve_proportions = FALSE)

Arguments

x

a matrix of n observations (rows) and p features (columns)

y

a vector of n class labels

split_pct

the fraction of observations (a value between 0 and 1; default 2/3) that will be randomly assigned to the training data set. The remaining observations are assigned to the test data set.

preserve_proportions

logical value. If TRUE, the training and test data sets are constructed so that the class proportions of y in the original data set are preserved. Defaults to FALSE.
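
To make the effect of preserve_proportions concrete, here is a minimal base-R sketch of one way such a split could be done. It is not the sortinghat source: the function name sketch_partition, the use of floor() to size the training set, and the element names train_x and test_x are assumptions made for illustration (train_y and test_y mirror the names used in the Examples).

```r
# Hypothetical sketch, not the package implementation: split the rows into
# training and test sets, optionally sampling within each class.
sketch_partition <- function(x, y, split_pct = 2/3,
                             preserve_proportions = FALSE) {
  x <- as.matrix(x)
  y <- as.factor(y)
  n <- length(y)

  if (preserve_proportions) {
    # Sample within each class so that every class contributes roughly
    # split_pct of its observations to the training set.
    by_class <- split(seq_len(n), y, drop = TRUE)
    train <- unlist(lapply(by_class, function(idx) {
      idx[sample.int(length(idx), size = floor(length(idx) * split_pct))]
    }), use.names = FALSE)
  } else {
    # Simple random sample of row indices, ignoring the labels.
    train <- sample.int(n, size = floor(n * split_pct))
  }
  train <- sort(train)
  test <- setdiff(seq_len(n), train)

  list(
    train_x = x[train, , drop = FALSE],
    train_y = y[train],
    test_x  = x[test, , drop = FALSE],
    test_y  = y[test]
  )
}

set.seed(42)
parts <- sketch_partition(iris[, -5], iris[, 5], preserve_proportions = TRUE)
table(parts$train_y)  # 33 observations per species
table(parts$test_y)   # 17 observations per species
```

Because iris has 50 observations per species and the sketch rounds down, each class contributes 33 training rows. Without stratification the per-class counts vary from one random draw to the next.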

Details

A named list is returned with the training and test data sets.

Value

named list containing the training and test data sets. The examples below access its train_y and test_y elements.

Examples

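library(sortinghat)  # provides partition_data()

# iris ships with base R (datasets package), so MASS is not needed
x <- iris[, -5]  # the four numeric features
y <- iris[, 5]   # Species labels
set.seed(42)     # make the random split reproducible

# Default: simple random split
data <- partition_data(x = x, y = y)
table(data$train_y)
table(data$test_y)

# Split that preserves the original proportions
data <- partition_data(x = x, y = y, preserve_proportions = TRUE)
table(data$train_y)
table(data$test_y)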
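
The table() calls above show raw counts. To compare class shares directly, a small helper along the following lines may be useful. The name compare_proportions is hypothetical and not part of sortinghat, and the labels here come from a plain sample() split so that the snippet runs without the package.

```r
# Hypothetical helper: class shares of the training and test labels,
# one row per data set and one column per class.
compare_proportions <- function(train_y, test_y) {
  lv <- union(levels(factor(train_y)), levels(factor(test_y)))
  shares <- function(labels) {
    as.vector(table(factor(labels, levels = lv))) / length(labels)
  }
  out <- rbind(train = shares(train_y), test = shares(test_y))
  colnames(out) <- lv
  out
}

# Stand-in labels for illustration; with sortinghat loaded, pass
# data$train_y and data$test_y from the examples instead.
set.seed(42)
y <- iris[, 5]
train <- sample.int(length(y), size = 100)
props <- compare_proportions(y[train], y[-train])
round(props, 2)
```

Rows that are close to each other suggest the split kept the class balance. Larger gaps are expected when preserve_proportions is FALSE.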

ramhiser/sortinghat documentation built on May 26, 2019, 10:12 p.m.