splitDataset: Split a data set for machine learning classification

View source: R/simulation.R

Description

Returns the data as a list containing a training set, a holdout set, and a validation set, partitioned according to the percentage specified for each. The default is a 50-50 split into training and holdout, with no validation (testing) set. Class labels (phenotypes) are coded as 1 and -1. The label argument (class or qtrait) controls whether the simulated data are treated as dichotomous or quantitative.

Usage

splitDataset(
  all.data = NULL,
  pct.train = 0.5,
  pct.holdout = 0.5,
  pct.validation = 0,
  label = "class"
)

Arguments

all.data

A data frame of n rows by d columns of data plus a label column

pct.train

A numeric percentage of samples to use for training, given as a fraction (default 0.5)

pct.holdout

A numeric percentage of samples to use for holdout, given as a fraction (default 0.5)

pct.validation

A numeric percentage of samples to use for validation (testing), given as a fraction (default 0)

label

A character string giving the name of the data column that holds the outcome label: class for classification or qtrait for regression.

Value

A list containing:

train

training data set

holdout

holdout data set

validation

validation data set

See Also

Other simulation: createInteractions(), createMainEffects(), createMixedSimulation(), createSimulation()

Examples

data("rsfMRIcorrMDD")
data.sets <- splitDataset(rsfMRIcorrMDD)
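
To make the partitioning concrete, here is a minimal base R sketch of a percentage-based split into the three list elements described under Value. It is not the privateEC implementation: the helper name split_by_fraction and the toy data frame are hypothetical, and the sketch assumes a simple random shuffle of rows without balancing classes across partitions, which the package itself may handle differently.

```r
# Hypothetical helper for illustration only; NOT the privateEC source code.
# Assumes a plain random shuffle of rows (no class balancing).
split_by_fraction <- function(df, pct.train = 0.5, pct.holdout = 0.5,
                              pct.validation = 0) {
  # The three fractions must sum to 1 (within floating-point tolerance)
  stopifnot(abs(pct.train + pct.holdout + pct.validation - 1) < 1e-8)
  n <- nrow(df)
  idx <- sample(n)  # random permutation of the row indices

  n.train <- round(pct.train * n)
  # With no validation set, holdout takes every row that is not in training
  n.holdout <- if (pct.validation == 0) n - n.train else round(pct.holdout * n)
  n.holdout <- min(n.holdout, n - n.train)
  n.validation <- n - n.train - n.holdout  # validation gets the remainder

  train.idx <- idx[seq_len(n.train)]
  holdout.idx <- idx[n.train + seq_len(n.holdout)]
  validation.idx <- idx[n.train + n.holdout + seq_len(n.validation)]

  list(train = df[train.idx, , drop = FALSE],
       holdout = df[holdout.idx, , drop = FALSE],
       validation = df[validation.idx, , drop = FALSE])
}

# Toy data: 10 samples, 2 attributes, labels coded as 1 and -1
set.seed(1)
toy <- data.frame(x1 = rnorm(10), x2 = rnorm(10),
                  class = rep(c(1, -1), each = 5))

parts <- split_by_fraction(toy)  # default 50-50 split, empty validation set
sapply(parts, nrow)
```

With the default fractions, the sketch puts half of the rows in train, the other half in holdout, and leaves validation as a zero-row data frame with the same columns.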

insilico/privateEC documentation built on May 22, 2020, 5:12 p.m.