Source file: R/split_train_and_test.R
Splits a data matrix into train and test subsets, returning both along with some basic summary statistics. Users may specify the preferred sampling proportions, and may supply an integer vector of seed values; these seeds are used to select the split that most closely approaches the desired proportions. By default, split_train_and_test splits the data into 50% TRAIN and 50% TEST.
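The seed-based selection described above can be illustrated with a minimal sketch. This is not the package's implementation. It assumes that each seed produces one candidate split, that `obs` rows are drawn at random without replacement, that each drawn row is labelled TRAIN or TEST independently with probabilities `train.prop` and `test.prop`, and that "closest" means the smallest absolute gap between the realised TRAIN share and `train.prop`. The function name `sketch_split` and the result elements `seed`, `train` and `test` are hypothetical; `counts` and `contingency.table` mirror the names used in the examples.

```r
# Hypothetical sketch, NOT the package's implementation.
# Assumptions: one candidate split per seed; rows labelled independently;
# the best seed minimises |realised TRAIN share - train.prop|.
sketch_split <- function(x, obs = NROW(x), train.prop = 0.5,
                         test.prop = 0.5, seeds = 1) {
  best <- NULL
  best.gap <- Inf
  for (s in seeds) {
    set.seed(s)  # note: this resets the global RNG state
    # Assumption: keep `obs` rows, drawn without replacement
    rows <- sample(NROW(x), obs)
    # Label each kept row TRAIN or TEST with the requested weights
    labels <- sample(c("TRAIN", "TEST"), obs, replace = TRUE,
                     prob = c(train.prop, test.prop))
    gap <- abs(mean(labels == "TRAIN") - train.prop)
    if (gap < best.gap) {  # strict "<" keeps the earliest seed on ties
      best.gap <- gap
      best <- list(seed = s, rows = rows, labels = labels)
    }
  }
  counts <- table(factor(best$labels, levels = c("TRAIN", "TEST")))
  list(
    seed = best$seed,
    counts = counts,                        # raw TRAIN/TEST counts
    contingency.table = prop.table(counts), # proportions of each subset
    train = x[best$rows[best$labels == "TRAIN"], , drop = FALSE],
    test  = x[best$rows[best$labels == "TEST"], , drop = FALSE]
  )
}
```

For example, `sketch_split(iris, seeds = 1:113)$contingency.table` evaluates 113 candidate splits and keeps the one whose TRAIN share is nearest 0.5. Because every seed in a longer vector is also tried, adding seeds can never make the chosen split worse under this metric.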
split_train_and_test(
  matrix,
  obs = NROW(matrix),
  train.prop = 0.5,
  test.prop = 0.5,
  seeds = 1
)
matrix: An N x M matrix containing N rows (observations) and M columns (data features).
obs: An integer value specifying how many rows from the original data set should be preserved. Defaults to NROW(matrix), i.e. all rows. (See
train.prop: A real number between 0 and 1 giving the preferred proportion of training data. Defaults to 0.5.
test.prop: A real number between 0 and 1 giving the preferred proportion of test data. Defaults to 0.5.
seeds: An integer vector of seed values used when randomly sampling observations into the train and test subsets; the split that most closely approaches the preferred proportions is kept. Defaults to 1.
A list containing the following results:
- a table containing raw counts of train and test observations;
- a contingency table describing the proportion of observations assigned to the train and test sets;
- a matrix holding the test data subset, with the same number of columns as the input matrix and one row per sampled observation;
- a matrix holding the train data subset, with the same number of columns as the input matrix and one row per sampled observation.
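The difference between the raw-count table and the proportion table can be seen with base R's `table()` and `prop.table()`. The labels below are made up for illustration, and whether split_train_and_test builds its two tables exactly this way is an assumption; the sketch only shows how counts and proportions relate.

```r
# Illustrative only: hypothetical TRAIN/TEST labels for 10 observations
labels <- factor(c("TRAIN", "TEST", "TRAIN", "TRAIN", "TEST",
                   "TRAIN", "TEST", "TRAIN", "TRAIN", "TEST"),
                 levels = c("TRAIN", "TEST"))
counts <- table(labels)            # raw counts: TRAIN 6, TEST 4
proportions <- prop.table(counts)  # each count divided by the total: 0.6, 0.4
print(counts)
print(proportions)
```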
# By default, split_train_and_test attempts a 50% TRAIN / 50% TEST split using a single seed.
default <- split_train_and_test(iris)
default$contingency.table
# Trying more seeds increases the chance of finding a split close to the requested proportions
many.seeds <- split_train_and_test(iris, seeds = 1:113)
many.seeds$contingency.table
# Pass proportions if you prefer a different split
eighty.twenty <- split_train_and_test(iris, train.prop = 0.8, test.prop = 0.2)
eighty.twenty$contingency.table
# Use `obs` to subsample your data while splitting
hundred.obs <- split_train_and_test(iris, obs = 100)  ## produces ~50 TRAIN and ~50 TEST observations
hundred.obs$counts