runTest: Perform a Single Classification

Description

For a data set of features and samples, the classification process is run. It consists of data transformation, feature selection, classifier training and testing.

Usage

  ## S4 method for signature 'matrix'
  runTest(measurements, classes, ...)

  ## S4 method for signature 'DataFrame'
  runTest(measurements, classes, featureSets = NULL, metaFeatures = NULL,
          minimumOverlapPercent = 80, datasetName, classificationName,
          training, testing,
          params = list(SelectParams(), TrainParams(), PredictParams()),
          verbose = 1, .iteration = NULL)

  ## S4 method for signature 'MultiAssayExperiment'
  runTest(measurements, targets = names(measurements), ...)

  ## S4 method for signature 'MultiAssayExperiment'
  runTestEasyHard(measurements, easyDatasetID = "clinical",
                  hardDatasetID = names(measurements)[1],
                  featureSets = NULL, metaFeatures = NULL,
                  minimumOverlapPercent = 80, datasetName = NULL,
                  classificationName = "Easy-Hard Classifier",
                  training, testing, ..., verbose = 1, .iteration = NULL)

Arguments

measurements

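Either a matrix, DataFrame or MultiAssayExperiment containing the data for all samples; training and testing select which samples are used for each stage. For a matrix, the rows are features and the columns are samples. For a DataFrame, the rows are samples and the columns are features. The sample identifiers must be present as the column names of the matrix or the row names of the DataFrame.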
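Because the sample identifiers are the column names of a matrix but the row names of a DataFrame, the two layouts are transposes of each other. The base-R sketch below shows this for a tiny hypothetical matrix; a base data.frame stands in for an S4Vectors DataFrame (an assumption made so the snippet runs without Bioconductor), and runTest itself is not called.

```r
# Matrix layout: rows are features (genes), columns are samples.
genesMatrix <- matrix(c(9.1, 8.7, 9.4,
                        5.2, 5.3, 9.9),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("Gene 1", "Gene 2"),
                                      c("Sample 1", "Sample 2", "Sample 3")))

# Transposing gives the sample-by-feature layout, so the sample identifiers
# move from the column names to the row names.
# A base data.frame is used here in place of S4Vectors::DataFrame (assumption).
sampleTable <- as.data.frame(t(genesMatrix))
rownames(sampleTable)
```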

classes

Either a factor of class labels with the same length as the number of samples in measurements or, if measurements is a DataFrame, a character vector of length 1 naming the column of measurements that contains the class labels. Not used if measurements is a MultiAssayExperiment object.
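The two accepted forms of classes identify the same labels. The base-R sketch below illustrates this with a hypothetical four-sample table; a base data.frame again stands in for an S4Vectors DataFrame (assumption), and the snippet only extracts the labels rather than calling runTest.

```r
# Rows are samples; two feature columns plus a column of class labels.
# A base data.frame is used here in place of S4Vectors::DataFrame (assumption).
measurements <- data.frame(geneA = c(5.1, 4.8, 9.2, 9.0),
                           geneB = c(1.0, 1.2, 0.9, 1.1),
                           class = factor(c("Poor", "Poor", "Good", "Good")),
                           row.names = paste("Sample", 1:4))

# Form 1: a factor with one label per sample.
classesFactor <- measurements[["class"]]

# Form 2: a length-1 character vector naming the column holding the labels.
classesColumn <- "class"
labelsFromColumn <- measurements[[classesColumn]]
```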

featureSets

Either NULL or an object of class FeatureSetCollection which defines sets of features or sets of network edges.

metaFeatures

Either NULL or a DataFrame which has meta-features of the numeric data of interest.

minimumOverlapPercent

If featureSets stores sets of features, the minimum percentage of a set's feature IDs that must be present in measurements for the set to be retained in the analysis. If featureSets stores sets of network edges, the minimum percentage of a set's edges that must have both vertex IDs present in measurements for the set to be retained in the analysis.
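For sets of features, the rule amounts to computing, for each set, the percentage of its IDs found in measurements and keeping the set if that percentage reaches minimumOverlapPercent. The sketch below is a minimal base-R illustration of that rule, not ClassifyR's internal code: the helper filterFeatureSets is hypothetical, feature sets are represented as a plain named list of character vectors instead of a FeatureSetCollection, and treating a set that exactly meets the threshold as retained is an assumption.

```r
# Hypothetical helper, not part of ClassifyR: keep a feature set only if at
# least minimumOverlapPercent of its feature IDs are present in measurements.
filterFeatureSets <- function(featureSets, measuredIDs, minimumOverlapPercent = 80) {
  overlapPercent <- vapply(featureSets, function(featureSet) {
    100 * sum(featureSet %in% measuredIDs) / length(featureSet)
  }, numeric(1))
  featureSets[overlapPercent >= minimumOverlapPercent]
}

measuredIDs <- paste("Gene", 1:10)
featureSets <- list(A = paste("Gene", 1:5),                   # 5 of 5 measured: 100%
                    B = c("Gene 1", "Gene 11", "Gene 12"),    # 1 of 3 measured: about 33%
                    C = c(paste("Gene", 1:4), "Gene 20"))     # 4 of 5 measured: 80%
retained <- filterFeatureSets(featureSets, measuredIDs)
names(retained)
```

With the default of 80, sets A and C are kept and set B is dropped.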

targets

If measurements is a MultiAssayExperiment, the names of the data tables to be used. "clinical" is also a valid value and specifies that numeric variables from the clinical data table will be used.

...

For runTest, arguments not used by the matrix or MultiAssayExperiment method, which are passed to and used by the DataFrame method. For runTestEasyHard, the easyClassifierParams and hardClassifierParams arguments to be passed to easyHardClassifierTrain.

datasetName

A name associated with the data set used.

classificationName

A name associated with the classification.

training

A vector which specifies the training samples, for example a logical vector or a vector of sample indices.

testing

A vector which specifies the test samples, in the same form as training.
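The Examples section specifies samples in two ways: as logical vectors (even-numbered samples for training, odd-numbered samples for testing) and as integer positions (training = 1:10). The base-R sketch below only builds such vectors for ten hypothetical samples, so the two forms can be compared; it does not call runTest.

```r
sampleNames <- paste("Sample", 1:10)

# Logical form, as in the asthma example: even positions train, odd positions test.
training <- seq_along(sampleNames) %% 2 == 0
testing <- seq_along(sampleNames) %% 2 != 0

# Integer form, as in the easy-hard example: the positions of the chosen samples.
trainingIndices <- which(training)
sampleNames[trainingIndices]
```

Here every sample falls in exactly one of the two sets; passing the same vector for both, as the easy-hard example does with 1:10, instead evaluates the classifier on the samples it was trained on.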

params

A list of objects of class TransformParams, SelectParams, TrainParams or PredictParams. The order of the objects in the list determines the order in which the stages of classification are performed.

easyDatasetID

The name of a data set in measurements, or "clinical" to indicate that the patient information in the column data is to be used.

hardDatasetID

The name of a data set in measurements, different from the value of easyDatasetID, to be used for classifying the samples not classified by the easy classifier.

verbose

Default: 1. A number between 0 and 3 controlling the amount of progress messages. Higher numbers produce more messages, because lower-level functions also print their own messages.

.iteration

Not to be set by a user. This value is used to keep track of the cross-validation iteration, if called by runTests.

Details

This function only performs one classification and prediction. See runTests for a driver function that enables a number of different cross-validation schemes to be applied and uses this function to perform each iteration. datasetName and classificationName need to be provided.

Value

If called directly by the user rather than internally by runTests, an object of class ClassifyResult.

Author(s)

Dario Strbenac

Examples

  library(ClassifyR)

  # The DLDA classifier interfaces require the sparsediscrim package.
  if (require(sparsediscrim))
  {
    data(asthma)  # Provides the measurements matrix and the classes factor.
    resubstituteParams <- ResubstituteParams(nFeatures = seq(5, 25, 5),
                                             performanceType = "balanced error",
                                             better = "lower")
    # Even-numbered samples train the classifier; odd-numbered samples test it.
    runTest(measurements, classes,
            datasetName = "Asthma", classificationName = "Different Means",
            params = list(SelectParams(limmaSelection, "Moderated t Statistic",
                                       resubstituteParams = resubstituteParams),
                          TrainParams(DLDAtrainInterface),
                          PredictParams(DLDApredictInterface)),
            training = (1:ncol(measurements)) %% 2 == 0,
            testing = (1:ncol(measurements)) %% 2 != 0)
  }
  
  # Ten genes by ten samples; only Gene 10 carries a class difference.
  genesMatrix <- matrix(c(rnorm(90, 9, 1),
                          9.5, 9.4, 5.2, 5.3, 5.4, 9.4, 9.6, 9.9, 9.1, 9.8),
                        ncol = 10, byrow = TRUE)
  colnames(genesMatrix) <- paste("Sample", 1:10)
  rownames(genesMatrix) <- paste("Gene", 1:10)
  genders <- factor(c("Male", "Male", "Female", "Female", "Female",
                      "Female", "Female", "Female", "Female", "Female"))

  # Scenario: Male gender can predict the hard-to-classify Sample 1 and Sample 2.
  clinical <- DataFrame(age = c(31, 34, 32, 39, 33, 38, 34, 37, 35, 36),
                        gender = genders,
                        class = factor(rep(c("Poor", "Good"), each = 5)),
                        row.names = colnames(genesMatrix))
  dataset <- MultiAssayExperiment(ExperimentList(RNA = genesMatrix), clinical)

  selParams <- SelectParams(featureSelection = differentMeansSelection,
                            selectionName = "Difference in Means",
                            resubstituteParams = ResubstituteParams(1:10, "balanced error", "lower"))
  # Resubstitution: all ten samples are used for both training and testing.
  easyHardCV <- runTestEasyHard(dataset, datasetName = "Test Data",
                                classificationName = "Easy-Hard",
                                training = 1:10, testing = 1:10,
                                easyClassifierParams = list(minCardinality = 2, minPurity = 0.9),
                                hardClassifierParams = list(selParams, TrainParams(), PredictParams()))

ClassifyR documentation built on Nov. 8, 2020, 6:53 p.m.