easyHardClassifier: Two-stage Classification Using Easy-to-collect Data Set and...
In ClassifyR: A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing


An alternative implementation of the previously published easy-hard classifier that skips nested cross-validation for speed. In the first stage, each numeric variable is split at every midpoint between consecutive ordered values, and the samples below and above each split are checked to see whether they mostly belong to one class. Categorical variables are tabulated by factor level, and the number of samples in each class is counted for each level. If any partition of samples is pure for a class, according to a purity threshold, a prediction rule is created. Samples that are not classified by any rule, or that are classified to two or more classes the same number of times, are left to be trained by the hard classifier.
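The first stage described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `findPureSplits` and its output columns are hypothetical names, and the two arguments `minCardinality` and `minPurity` are simply borrowed from `easyClassifierParams`.

```r
# Hypothetical helper for illustration; it is not a ClassifyR function.
findPureSplits <- function(values, classes, minCardinality = 10, minPurity = 0.9) {
  classes <- factor(classes)
  uniqueValues <- sort(unique(values))
  if (length(uniqueValues) < 2) return(NULL)  # nothing to split on
  # Midpoints between consecutive ordered values are the candidate thresholds.
  thresholds <- (head(uniqueValues, -1) + tail(uniqueValues, -1)) / 2
  rules <- list()
  for (threshold in thresholds) {
    for (side in c("below", "above")) {
      inPartition <- if (side == "below") values < threshold else values > threshold
      if (sum(inPartition) < minCardinality) next  # partition too small
      classShare <- table(classes[inPartition]) / sum(inPartition)
      best <- which.max(classShare)
      if (classShare[[best]] >= minPurity) {
        rules[[length(rules) + 1]] <- data.frame(
          threshold = threshold, side = side,
          class = names(classShare)[best], purity = classShare[[best]],
          stringsAsFactors = FALSE)
      }
    }
  }
  do.call(rbind, rules)  # NULL when no partition is pure enough
}

# Made-up numeric variable: low values are class A, high values are class B.
values <- c(1, 2, 3, 4, 10, 11, 12, 13)
classes <- rep(c("A", "B"), each = 4)
rules <- findPureSplits(values, classes, minCardinality = 2, minPurity = 0.9)

# Categorical variables: tabulate classes within each factor level.
gender <- factor(c("Male", "Male", "Female", "Female"))
outcome <- factor(c("Poor", "Poor", "Good", "Good"))
levelPurity <- prop.table(table(gender, outcome), margin = 1)
```

Each row of `rules` is one candidate prediction rule; a row of `levelPurity` that reaches the purity threshold would play the same role for a categorical variable.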

  ## S4 method for signature 'MultiAssayExperiment'
easyHardClassifierTrain(measurements, easyDatasetID = "clinical", hardDatasetID = names(measurements)[1],
         featureSets = NULL, metaFeatures = NULL, minimumOverlapPercent = 80,
         datasetName = NULL, classificationName = "Easy-Hard Classifier",
         easyClassifierParams = list(minCardinality = 10, minPurity = 0.9),
         hardClassifierParams = list(SelectParams(), TrainParams(), PredictParams()), 
         verbose = 3)
  ## S4 method for signature 'EasyHardClassifier,MultiAssayExperiment'
easyHardClassifierPredict(model, test, predictParams, verbose = 3)

`measurements`	A `MultiAssayExperiment` object containing the data set. The sample classes must be in a column of the `DataFrame` accessed by `colData` named `"class"`.

`easyDatasetID`	The name of a data set in `measurements`, or "clinical" to indicate that the patient information in the column data should be used.
`hardDatasetID`	The name of a data set in `measurements` different to the value of `easyDatasetID` to be used for classifying the samples not classified by the easy classifier.
`featureSets`	An object of type `FeatureSetCollection` which defines sets of features or sets of edges.
`metaFeatures`	Either `NULL` or a `DataFrame` which has meta-features of the numeric data of interest.
`minimumOverlapPercent`	If `featureSets` stores sets of features, the minimum overlap of feature IDs with `measurements` for a feature set to be retained in the analysis. If `featureSets` stores sets of network edges, the minimum percentage of edges with both vertex IDs found in `measurements` that a set has to have to be retained in the analysis.
`datasetName`	A name associated with the data set used.
`classificationName`	A name associated with the classification.
`easyClassifierParams`	A list of length 2 with names "minCardinality" and "minPurity". The first parameter specifies what the minimum number of samples after a split has to be and the second specifies the minimum proportion of samples in a partition belonging to a particular class.
`hardClassifierParams`	A list of objects defining the classification to do on the samples which were not predicted by the easy classifier. Objects must be of class `TransformParams`, `SelectParams`, `TrainParams` or `PredictParams`.
`model`	A trained `EasyHardClassifier` object.
`test`	A `MultiAssayExperiment` object containing the test data.
`predictParams`	An object of class `PredictParams`. It specifies the classifier used to make the hard predictions.
`verbose`	Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.
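As a rough illustration of the `minimumOverlapPercent` filter for sets of features, the sketch below assumes that overlap is measured as the percentage of a set's feature IDs that are present in the measurements. The helper `overlapPercent` and the two example sets are hypothetical and are not part of ClassifyR.

```r
# Hypothetical helper, not a ClassifyR function: percentage of a set's
# feature IDs that are found among the measured features.
overlapPercent <- function(setFeatures, measuredFeatures) {
  100 * mean(setFeatures %in% measuredFeatures)
}

# Made-up feature sets and measured feature IDs.
featureSets <- list(setA = c("Gene 1", "Gene 2", "Gene 3", "Gene 4", "Gene 5"),
                    setB = c("Gene 1", "Gene 11", "Gene 12"))
measured <- paste("Gene", 1:10)

overlaps <- vapply(featureSets, overlapPercent, numeric(1),
                   measuredFeatures = measured)
# Keep only the sets that meet the threshold (80, as in the default).
retained <- featureSets[overlaps >= 80]
```

Here all of `setA` is measured, so it is retained, while only one of the three IDs in `setB` is measured, so it is dropped.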

The easy classifier may be NULL if no rules predicted the samples well using the easy data set. The hard classifier may be NULL if all of the samples could be predicted with rules generated from the easy data set, or it will simply be a character if all or almost all of the remaining samples belong to one class.
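How samples end up with the hard classifier can be sketched as a simple vote over the rules that matched each sample. This is only an illustration of the behaviour described above: the per-sample votes are made up, and the variable names are hypothetical rather than taken from the package.

```r
# Made-up rule votes per sample; each element lists the classes predicted
# by the rules that matched that sample.
votes <- list(s1 = c("Poor", "Poor"),
              s2 = c("Poor", "Good"),   # tie between two classes
              s3 = character(0),        # no rule matched
              s4 = "Good")

easyPrediction <- vapply(votes, function(sampleVotes) {
  if (length(sampleVotes) == 0) return(NA_character_)
  counts <- table(sampleVotes)
  winners <- names(counts)[counts == max(counts)]
  # A tie means the easy classifier makes no call for this sample.
  if (length(winners) > 1) NA_character_ else winners
}, character(1))

# Samples without an easy prediction are left for the hard classifier.
leftForHard <- names(easyPrediction)[is.na(easyPrediction)]
# If nothing is left over, no hard classifier needs to be trained.
hardNeeded <- length(leftForHard) > 0
```

In this toy case `s2` and `s3` are passed on to the second stage; had every sample received a clear vote, `hardNeeded` would be `FALSE`, which corresponds to the hard classifier being NULL.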

For `easyHardClassifierTrain`, the trained two-stage classifier. For `easyHardClassifierPredict`, a factor vector of predicted classes.

Dario Strbenac

Inspired by: Stepwise Classification of Cancer Samples Using Clinical and Molecular Data, Askar Obulkasim, Gerrit Meijer and Mark van de Wiel 2011, BMC Bioinformatics, Volume 12 article 422, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-422.

  genesMatrix <- matrix(c(rnorm(90, 9, 1),
                          9.5, 9.4, 5.2, 5.3, 5.4, 9.4, 9.6, 9.9, 9.1, 9.8),
                        ncol = 10, byrow = TRUE)
  colnames(genesMatrix) <- paste("Sample", 1:10)
  rownames(genesMatrix) <- paste("Gene", 1:10)
  genders <- factor(c("Male", "Male", "Female", "Female", "Female",
                      "Female", "Female", "Female", "Female", "Female"))

  # Scenario: Male gender can predict the hard-to-classify Sample 1 and Sample 2.
  clinical <- DataFrame(age = c(31, 34, 32, 39, 33, 38, 34, 37, 35, 36),
                        gender = genders,
                        class = factor(rep(c("Poor", "Good"), each = 5)),
                        row.names = colnames(genesMatrix))
  dataset <- MultiAssayExperiment(ExperimentList(RNA = genesMatrix), clinical)
  selParams <- SelectParams(featureSelection = differentMeansSelection, selectionName = "Difference in Means",
                            resubstituteParams = ResubstituteParams(1:10, "balanced error", "lower"))
  trained <- easyHardClassifierTrain(dataset, easyClassifierParams = list(minCardinality = 2, minPurity = 0.9),
                                     hardClassifierParams = list(selParams, TrainParams(), PredictParams()))

  predictions <- easyHardClassifierPredict(trained, dataset, PredictParams())

ClassifyR documentation built on Nov. 8, 2020, 6:53 p.m.
