split_train_and_test: Split train and test data
In ChrisKeefe/UnsupLP1: A sample package exposing some functions for Unsupervised Learning


View source: R/split_train_and_test.R

Splits a data matrix into train and test subsets, and returns both subsets along with some basic summary statistics. Users may specify preferred sampling proportions, and may provide an integer vector of seed values; the seeds are used to select the split that most closely approaches the desired proportions. By default, split_train_and_test splits into 50% TRAIN and 50% TEST.
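The seed-selection idea described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: `pick_best_seed` is a hypothetical helper, and labelling each row independently with `sample()` is an assumption about how the split might be drawn.

```r
# Hypothetical helper, not part of UnsupLP1: try each seed, label rows as
# TRAIN or TEST at random, and keep the seed whose observed TRAIN proportion
# is closest to the requested one.
pick_best_seed <- function(n, train.prop = 0.5, seeds = 1) {
  best.seed <- NA
  best.gap <- Inf
  for (s in seeds) {
    set.seed(s)
    # Rows are labelled independently, so realised proportions vary by seed
    labels <- sample(c("TRAIN", "TEST"), size = n, replace = TRUE,
                     prob = c(train.prop, 1 - train.prop))
    gap <- abs(mean(labels == "TRAIN") - train.prop)
    if (gap < best.gap) {
      best.gap <- gap
      best.seed <- s
    }
  }
  list(seed = best.seed, gap = best.gap)
}

one <- pick_best_seed(150, seeds = 1)
many <- pick_best_seed(150, seeds = 1:20)
# Searching a superset of seeds can never give a worse gap
many$gap <= one$gap
```

Because the candidate seeds in the second call include the seed used in the first, the best gap found can only stay the same or shrink, which is why supplying more seeds tends to help.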

split_train_and_test(
  matrix,
  obs = NROW(matrix),
  train.prop = 0.5,
  test.prop = 0.5,
  seeds = 1
)

`matrix`	An N x M matrix, containing N rows (observations) and M columns (data features)
`obs`	An integer value specifying how many rows from the original data set should be preserved. See the `size` parameter of `sample()` for details.
`train.prop`	A real number between 0 and 1, describing the preferred proportion of training data
`test.prop`	A real number between 0 and 1, describing the preferred proportion of test data
`seeds`	An integer vector of seed values, for use in randomly sampling observations into train and test subsets
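To make the `obs` argument concrete, the following sketch shows how base R's `sample()` draws a fixed number of row indices through its `size` parameter. It assumes that `obs` is forwarded to `sample()` in roughly this way; the variable names are illustrative and not taken from the package.

```r
# Assumption: `obs` plays the role of `size` in sample(), which here draws
# row indices without replacement.
set.seed(1)
m <- as.matrix(iris[, 1:4])          # numeric columns of the built-in iris data
obs <- 100
kept <- sample(NROW(m), size = obs)  # 100 distinct row indices out of 150
subset.m <- m[kept, , drop = FALSE]  # drop = FALSE keeps the matrix shape
dim(subset.m)
```

With `obs` smaller than `NROW(matrix)`, only the sampled rows would remain available for assignment to the train and test subsets.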

A list containing the following results:

`counts`	a table containing raw counts of train and test observations
`contingency.table`	a contingency table, describing the proportion of observations assigned to train and test sets
`test`	a matrix holding the test data subset, with the same number of columns as the input matrix, and one row per sampled observation
`train`	a matrix holding the train data subset, with the same number of columns as the input matrix, and one row per sampled observation
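The shape of these results can be illustrated with base R alone. The exact classes returned by the package are not shown on this page, so the sketch below assumes that `counts` resembles the output of `table()` and that `contingency.table` resembles the output of `prop.table()`; the label vector and small matrix are made up for the illustration.

```r
# Illustrative only: a label vector standing in for a TRAIN/TEST assignment
labels <- c("TRAIN", "TEST", "TRAIN", "TRAIN", "TEST")

counts <- table(labels)            # raw counts per subset
proportions <- prop.table(counts)  # the same counts expressed as proportions

# Rows of a data matrix can then be separated by label
m <- matrix(1:10, nrow = 5, ncol = 2)
train <- m[labels == "TRAIN", , drop = FALSE]
test <- m[labels == "TEST", , drop = FALSE]
```

Both subsets keep the input's column count, and their row counts match the entries of `counts`.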

# By default, split_train_and_test attempts to split into 50% TRAIN and 50% TEST, using a single seed.
default <- split_train_and_test(iris)
default$contingency.table

# Increasing the number of seeds increases the likelihood of an optimal split
many.seeds <- split_train_and_test(iris, seeds=1:113)
many.seeds$contingency.table

# Pass proportions if you prefer a different split
eighty.twenty <- split_train_and_test(iris, train.prop=0.8, test.prop=0.2)
eighty.twenty$contingency.table

# Use `obs` to subset your data while sampling
subset <- split_train_and_test(iris, obs=100)  ## produces ~50 TRAIN and ~50 TEST observations
subset$counts

ChrisKeefe/UnsupLP1 documentation built on Oct. 8, 2020, 5:37 a.m.
