runTests: Reproducibly Run Various Kinds of Cross-Validation


Description

Enables cross-validation schemes such as ordinary 10-fold cross-validation, 100 permutations of 5-fold cross-validation, and leave-one-out cross-validation. Processing can be done in parallel by leveraging the BiocParallel package.
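For orientation, a minimal sketch of requesting each validation scheme, assuming the measurements matrix and classes factor from the package's asthma example data set and that the default feature selection and classifier in params are suitable:

  data(asthma)

  # 100 permutations of 5-fold cross-validation.
  permuteCV <- runTests(measurements, classes, datasetName = "Asthma",
                        classificationName = "Permuted Folds",
                        validation = "permute", permutations = 100, folds = 5)

  # Ordinary 10-fold cross-validation, without resampling.
  foldCV <- runTests(measurements, classes, datasetName = "Asthma",
                     classificationName = "10-fold",
                     validation = "fold", folds = 10)

  # Leave-one-out cross-validation.
  looCV <- runTests(measurements, classes, datasetName = "Asthma",
                    classificationName = "Leave-one-out",
                    validation = "leaveOut", leave = 1)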

Usage

## S4 method for signature 'matrix'
runTests(measurements, classes, ...)

## S4 method for signature 'DataFrame'
runTests(measurements, classes, featureSets = NULL, metaFeatures = NULL,
         minimumOverlapPercent = 80, datasetName, classificationName,
         validation = c("permute", "leaveOut", "fold"),
         permutePartition = c("fold", "split"),
         permutations = 100, percent = 25, folds = 5, leave = 2,
         seed, parallelParams = bpparam(),
         params = list(SelectParams(), TrainParams(), PredictParams()),
         verbose = 1)

## S4 method for signature 'MultiAssayExperiment'
runTests(measurements, targets = names(measurements), ...)

## S4 method for signature 'MultiAssayExperiment'
runTestsEasyHard(measurements, easyDatasetID = "clinical",
                 hardDatasetID = names(measurements)[1],
                 featureSets = NULL, metaFeatures = NULL,
                 minimumOverlapPercent = 80,
                 datasetName = NULL, classificationName = "Easy-Hard Classifier",
                 validation = c("permute", "leaveOut", "fold"),
                 permutePartition = c("fold", "split"),
                 permutations = 100, percent = 25, folds = 5, leave = 2,
                 seed, parallelParams = bpparam(), ..., verbose = 1)

Arguments

measurements

Either a matrix, DataFrame or MultiAssayExperiment containing the training data. For a matrix, the rows are features, and the columns are samples. The sample identifiers must be present as column names of the matrix or the row names of the DataFrame.

classes

Either a factor of class labels with as many elements as there are samples in measurements or, if measurements is a DataFrame, a character vector of length 1 naming the column of measurements that contains the class labels. Not used if measurements is a MultiAssayExperiment object.

featureSets

An object of type FeatureSetCollection which defines sets of features or sets of edges.

metaFeatures

Either NULL or a DataFrame which has meta-features of the numeric data of interest.

minimumOverlapPercent

If featureSets stores sets of features, the minimum percentage of feature IDs in a set that must be present in measurements for that set to be retained in the analysis. If featureSets stores sets of network edges, the minimum percentage of edges with both vertex IDs present in measurements that a set must have to be retained in the analysis.

targets

If measurements is a MultiAssayExperiment, the names of the data tables to be used. "clinical" is also a valid value and specifies that numeric variables from the clinical data table will be used.

...

For runTests, arguments not used by the matrix or MultiAssayExperiment methods that are passed into and used by the DataFrame method. For runTestsEasyHard, the easyClassifierParams and hardClassifierParams lists to be passed to easyHardClassifierTrain.

datasetName

A name associated with the data set used.

classificationName

A name associated with the classification.

validation

Default: "permute". "permute" for repeated permuting. "leaveOut" for leaving all possible combinations of k samples as test samples. "fold" for folding of the data set (no resampling).

permutePartition

Default: "fold". Either "fold" or "split". Only applicable if validation is "permute". If "fold", then the samples are split into folds and in each iteration one is used as the test set. If "split", the samples are split into two groups, the sizes being based on the percent value. One group is used as the training set, the other is the test set.

permutations

Default: 100. Relevant when permuting is used. The number of times the samples are randomly reordered before being split or folded.

percent

Default: 25. Used when permutation with the split method is chosen. The percentage of samples to be in the test set.

folds

Default: 5. Relevant when repeated permutations are done and permutePartition is set to "fold" or when validation is set to "fold". The number of folds to break the data set into. Each fold is used once as the test set.

leave

Default: 2. Relevant when leave-k-out cross-validation is used. The number of samples to leave for testing.

seed

The random number generator used for repeated resampling will use this seed, if it is provided. Allows reproducibility of repeated usage on the same input data.

parallelParams

An object of class MulticoreParam or SnowParam.

params

A list of objects of class TransformParams, SelectParams, TrainParams or PredictParams. Their order in the list determines the order in which the stages of classification are carried out (a sketch appears after this argument list).

easyDatasetID

The name of a data set in measurements, or "clinical" to indicate that the patient information in the column data is to be used.

hardDatasetID

The name of a data set in measurements, different from the value of easyDatasetID, to be used for classifying the samples not classified by the easy classifier.

verbose

Default: 1. A number between 0 and 3 for the amount of progress messages to give. A higher number will produce more messages as more lower-level functions print messages.
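As a sketch of how the resampling arguments combine, reusing the asthma example data and the default params: permuted splitting uses percent, and supplying seed along with a BiocParallel back-end gives a reproducible parallel run. The worker count here is purely illustrative, and on Windows SnowParam would be used in place of MulticoreParam.

  library(BiocParallel)

  # 20 permutations, each splitting the samples 75% training / 25% test,
  # run on two cores, with a fixed seed so that repeated runs on the same
  # data produce the same sample partitions.
  splitCV <- runTests(measurements, classes, datasetName = "Asthma",
                      classificationName = "Permuted Splits",
                      validation = "permute", permutePartition = "split",
                      permutations = 20, percent = 25,
                      seed = 2020, parallelParams = MulticoreParam(workers = 2))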
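And a sketch of an explicit params list, borrowing the selection and DLDA interfaces used in the Examples section; the stages run in the order the objects appear in the list (selection, then training, then prediction).

  stageParams <- list(SelectParams(differentMeansSelection, "t Statistic",
                                   resubstituteParams = ResubstituteParams(
                                     nFeatures = seq(5, 25, 5),
                                     performanceType = "balanced error",
                                     better = "lower")),
                      TrainParams(DLDAtrainInterface),
                      PredictParams(DLDApredictInterface))
  dldaCV <- runTests(measurements, classes, datasetName = "Asthma",
                     classificationName = "Different Means DLDA",
                     permutations = 5, params = stageParams)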

Value

If the predictor function made a single prediction, then an object of class ClassifyResult. If the predictor function made a set of predictions, then a list of such objects.
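A brief sketch of working with the returned object, assuming the asthma example data as above; see the ClassifyResult class documentation for the full set of accessors.

  asthmaCV <- runTests(measurements, classes, datasetName = "Asthma",
                       classificationName = "Different Means", permutations = 5)
  asthmaCV <- calcCVperformance(asthmaCV, "balanced error")
  performance(asthmaCV)   # Performance measures calculated for the result.
  predictions(asthmaCV)   # Predictions made during cross-validation.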

Author(s)

Dario Strbenac

Examples

  #if(require(sparsediscrim))
  #{
    data(asthma)

    resubstituteParams <- ResubstituteParams(nFeatures = seq(5, 25, 5),
                                             performanceType = "balanced error",
                                             better = "lower")
    runTests(measurements, classes, datasetName = "Asthma",
             classificationName = "Different Means", permutations = 5,
             params = list(SelectParams(differentMeansSelection, "t Statistic",
                                        resubstituteParams = resubstituteParams),
                           TrainParams(DLDAtrainInterface),
                           PredictParams(DLDApredictInterface)
                           )
             )
  #}
  
  genesMatrix <- matrix(c(rnorm(90, 9, 1),
                          9.5, 9.4, 5.2, 5.3, 5.4, 9.4, 9.6, 9.9, 9.1, 9.8),
                        ncol = 10, byrow = TRUE)

  colnames(genesMatrix) <- paste("Sample", 1:10)
  rownames(genesMatrix) <- paste("Gene", 1:10)
  genders <- factor(c("Male", "Male", "Female", "Female", "Female",
                      "Female", "Female", "Female", "Female", "Female"))

  # Scenario: Male gender can predict the hard-to-classify Sample 1 and Sample 2.
  clinical <- DataFrame(age = c(31, 34, 32, 39, 33, 38, 34, 37, 35, 36),
                        gender = genders,
                        class = factor(rep(c("Poor", "Good"), each = 5)),
                        row.names = colnames(genesMatrix))
  dataset <- MultiAssayExperiment(ExperimentList(RNA = genesMatrix), clinical)
  selParams <- SelectParams(featureSelection = differentMeansSelection,
                            selectionName = "Difference in Means",
                            resubstituteParams = ResubstituteParams(1:10, "balanced error", "lower"))
  easyHardCV <- runTestsEasyHard(dataset, datasetName = "Test Data",
                                 classificationName = "Easy-Hard",
                                 easyClassifierParams = list(minCardinality = 2, minPurity = 0.9),
                                 hardClassifierParams = list(selParams, TrainParams(), PredictParams()),
                                 validation = "leaveOut", leave = 1)
