runTests: Reproducibly Run Various Kinds of Cross-Validation

Description

Runs cross-validation schemes for classification, such as ordinary 10-fold, 100 permutations of 5-fold, and leave-one-out cross-validation. Processing can be done in parallel by using the BiocParallel package.

Usage

## S4 method for signature 'matrix'
runTests(measurements, classes, ...)
## S4 method for signature 'DataFrame'
runTests(measurements, classes, datasetName, classificationName,
         validation = c("permute", "leaveOut", "fold"),
         permutePartition = c("fold", "split"),
         permutations = 100, percent = 25, folds = 5, leave = 2,
         seed, parallelParams = bpparam(),
         params = list(SelectParams(), TrainParams(), PredictParams()), verbose = 1)
## S4 method for signature 'MultiAssayExperiment'
runTests(measurements, targets = names(measurements), ...)

Arguments

measurements

Either a matrix, DataFrame or MultiAssayExperiment containing the training data. For a matrix, the rows are features, and the columns are samples. The sample identifiers must be present as column names of the matrix or the row names of the DataFrame.

classes

Either a factor of class labels with one element per sample in measurements or, if measurements is a DataFrame, a character vector of length 1 naming the column of measurements that contains the class labels. Not used if measurements is a MultiAssayExperiment object.

targets

If measurements is a MultiAssayExperiment, the names of the data tables to be used. "clinical" is also a valid value and specifies that numeric variables from the clinical data table will be used.

...

Variables not used by the matrix or MultiAssayExperiment methods, which are passed into and used by the DataFrame method.

datasetName

A name associated with the data set used.

classificationName

A name associated with the classification.

validation

Default: "permute". One of "permute" for repeated permutation of the samples, "leaveOut" for using every possible combination of k samples as the test set, where k is the value of leave, or "fold" for a single partitioning of the data set into folds (no resampling).

permutePartition

Default: "fold". Either "fold" or "split". Only applicable if validation is "permute". If "fold", then the samples are split into folds and in each iteration one is used as the test set. If "split", the samples are split into two groups, the sizes being based on the percent value. One group is used as the training set, the other is the test set.

permutations

Default: 100. Relevant when permuting is used. The number of times to reorder the samples before splitting or folding them.

percent

Default: 25. Used when permutation with the split method is chosen. The percentage of samples to be in the test set.
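The effect of percent on a single permutation can be sketched in base R. This is only an illustration of the idea, not ClassifyR's internal code: the sample identifiers are hypothetical, and rounding the test set size to the nearest whole sample is an assumption.

```r
# Hypothetical sample identifiers; not taken from any ClassifyR data set.
sampleIDs <- paste0("Sample", 1:40)
percent <- 25

set.seed(1)
shuffled <- sample(sampleIDs)                     # one random reordering
# Assumption: the test set size is rounded to the nearest whole sample.
nTest <- round(length(shuffled) * percent / 100)
testSamples <- shuffled[seq_len(nTest)]           # first block is the test set
trainSamples <- setdiff(shuffled, testSamples)    # the rest is the training set

length(testSamples)
length(trainSamples)
```

With 40 samples and percent = 25, 10 samples are held out for testing and 30 are used for training in each permutation.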

folds

Default: 5. Relevant when repeated permutations are done and permutePartition is set to "fold" or when validation is set to "fold". The number of folds to break the data set into. Each fold is used once as the test set.
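A minimal base-R sketch of how samples might be assigned to folds is given here. The sample count is hypothetical and the balanced assignment scheme is an assumption made for illustration; ClassifyR's own partitioning may differ in detail.

```r
nSamples <- 23   # hypothetical number of samples
folds <- 5

set.seed(1)
# Assign fold labels as evenly as possible, then shuffle them across samples.
foldOfSample <- sample(rep(seq_len(folds), length.out = nSamples))

# One test set of sample indices per fold; the remaining samples are for training.
testSets <- split(seq_len(nSamples), foldOfSample)
lengths(testSets)
```

Every sample appears in exactly one test set, so each sample is predicted once per pass over the folds.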

leave

Default: 2. Relevant when leave-k-out cross-validation is used. The number of samples to leave for testing.
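Because every combination of k samples is used as a test set, the number of iterations grows quickly with the number of samples. The following base-R sketch, using a hypothetical sample count, enumerates the test sets with combn:

```r
nSamples <- 10   # hypothetical number of samples
leave <- 2

# Each column holds the indices of one test set.
testSets <- combn(nSamples, leave)
ncol(testSets)               # number of train/test iterations
choose(nSamples, leave)      # the same count, computed without enumerating
```

It can be worth evaluating choose(nSamples, leave) before starting a run with a large data set or a value of leave above 1.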

seed

If provided, this seed is used by the random number generator for repeated resampling, so that repeated runs on the same input data produce the same sample partitions.
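The reproducibility that a seed provides can be demonstrated in base R. The helper drawPartition below is hypothetical and only stands in for the resampling step; it is not a ClassifyR function.

```r
# Hypothetical helper: reorder 10 sample indices after fixing the seed.
drawPartition <- function(seed) {
  set.seed(seed)
  sample(10)
}

firstRun <- drawPartition(2018)
secondRun <- drawPartition(2018)
identical(firstRun, secondRun)   # the same seed gives the same ordering
```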

parallelParams

An object of class MulticoreParam or SnowParam.
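As a sketch, such objects can be created with the BiocParallel constructors. The worker count of 2 is an arbitrary choice for illustration, and the code is guarded so that it does nothing if BiocParallel is not installed.

```r
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # Forked processes; not available on Windows.
  forkParams <- BiocParallel::MulticoreParam(workers = 2)
  # Socket-based worker processes; available on all platforms.
  socketParams <- BiocParallel::SnowParam(workers = 2)
  print(class(forkParams))
  print(class(socketParams))
}
```

Either object could then be supplied as the parallelParams argument in place of the default bpparam().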

params

A list of objects of class TransformParams, SelectParams, TrainParams or PredictParams. Their order in the list determines the order in which the stages of classification are done.

verbose

Default: 1. A number between 0 and 3 for the amount of progress messages to give. A higher number will produce more messages as more lower-level functions print messages.

Value

If the predictor function made a single prediction, then an object of class ClassifyResult. If the predictor function made a set of predictions, then a list of such objects.

Author(s)

Dario Strbenac

Examples

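if(require(sparsediscrim))
{
  # The asthma data set supplies the 'measurements' and 'classes' used below.
  data(asthma)

  # Choose the number of features by resubstitution, minimising balanced error.
  resubstituteParams <- ResubstituteParams(nFeatures = seq(5, 25, 5),
                                           performanceType = "balanced error",
                                           better = "lower")
  # Default scheme ("permute" with "fold"): 5 permutations of 5-fold cross-validation.
  runTests(measurements, classes, "Asthma", "Different Means",
           permutations = 5,
           params = list(SelectParams(limmaSelection, "Moderated t Statistic",
                                      resubstituteParams = resubstituteParams),
                         TrainParams(DLDAtrainInterface),
                         PredictParams(DLDApredictInterface,
                                       getClasses = function(result) result[["class"]])))
}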

ClassifyR documentation built on July 8, 2018, 2 a.m.