trainSubset: Create Training-Test Split
In abnormally-distributed/cvreg: Cross Validation and Robust Estimation Utilities


Description

This function uses the maximum dissimilarity method (Willett 1999) to create a training-test split. It is intended as an improvement over drawing a simple random subset for the training data. By choosing rows of the data frame that are maximally dissimilar from one another, it preserves the variability of the full data set in the training set. The training data should therefore be representative of the whole data set, which reduces concerns that the particular training-test split will affect the final inferences. The choice of observations is nearly deterministic, which also facilitates reproducibility.
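The greedy selection idea behind the maximum dissimilarity method can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the cvreg implementation of trainSubset(). The helper name maxdissim_sketch() is hypothetical, as are the choices of scaled Euclidean distance, a fixed seed row, and the criterion "add the candidate whose minimum distance to the already-selected rows is largest".

```r
# Hypothetical sketch; NOT the cvreg implementation of trainSubset().
# Greedy maximum dissimilarity selection: start from a seed row, then
# repeatedly add the candidate row whose minimum distance to the
# already-selected rows is largest (a "max-min" criterion).
maxdissim_sketch <- function(x, n_select, seed_row = 1L) {
  # x: numeric data frame or matrix (assumed; factor columns not handled)
  x <- scale(as.matrix(x))          # put all columns on a common scale
  d <- as.matrix(dist(x))           # pairwise Euclidean distance matrix
  selected <- seed_row
  candidates <- setdiff(seq_len(nrow(x)), selected)
  while (length(selected) < n_select && length(candidates) > 0) {
    # distance from each candidate to its nearest already-selected row
    min_d <- apply(d[candidates, selected, drop = FALSE], 1, min)
    best <- candidates[which.max(min_d)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

# Usage on toy data: select 60% of 20 rows
set.seed(1)
toy <- data.frame(a = rnorm(20), b = runif(20))
idx <- maxdissim_sketch(toy, n_select = round(0.60 * nrow(toy)))
length(idx)   # 12
```

Because each step adds the candidate farthest from everything already chosen, the selected rows spread across the range of the data instead of clustering where observations are dense.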

Usage

trainSubset(data, p, y = NULL)

Arguments

`data`	A data frame containing the full data set.
`p`	The target proportion of the data set to use for the training set. The size of the subset is rounded to the nearest integer. For example, setting p = 0.80 with a data frame of 233 rows gives about 186 observations in the training data (0.80*233 = 186.4). Because of rounding, the final number may differ slightly from p*n; in this example it is slightly less.
`y`	An optional character string giving the column name of the intended response variable. If supplied, observations whose response values are near the median are used as the seed, which helps keep the selection from being biased toward values near only the upper or lower quantiles of the response.
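The rounding rule for p and the median-based seed for y can be illustrated with base R. This is a hypothetical sketch: the variable names and the toy weight vector are invented for illustration, and using the single row closest to the median as the seed is an assumption about how trainSubset() chooses it.

```r
# Hypothetical illustration of the argument behaviour described above;
# not taken from the cvreg source.

# p: target training size, rounded to the nearest integer
n <- 233
p <- 0.80
n_train <- round(p * n)
n_train    # 186 (p * n = 186.4)

# y: ASSUMED seed rule, the row whose response is closest to the median
weight <- c(61, 75, 58, 90, 70, 66, 82)   # toy response values
seed_row <- which.min(abs(weight - median(weight)))
seed_row   # 5 (weight 70 equals the median)
```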

Value

An integer vector of the row indices chosen for the training data.

References

Willett, P. 1999. "Dissimilarity-Based Algorithms for Selecting Structurally Diverse Sets of Compounds," Journal of Computational Biology, 6, 447-457.

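Examples

library(cvreg)

# mydata: a data frame with a numeric response column named "weight"
idx <- trainSubset(data = mydata, y = "weight", p = 0.60)
training <- mydata[idx, ]   # rows selected for the training set
testing <- mydata[-idx, ]   # remaining rows form the test set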

abnormally-distributed/cvreg documentation built on May 3, 2020, 3:45 p.m.
