precisionPathways: Precision Pathways for Sample Prediction Based on Prediction...

precisionPathwaysTrain    R Documentation

Precision Pathways for Sample Prediction Based on Prediction Confidence.

Description

Precision pathways allow the evaluation of various permutations of multi-omics or multi-view data. A sample is predicted by a particular assay if it was consistently predicted as one class during cross-validation. Otherwise, it is passed on to subsequent assays (tiers) for prediction. Balanced accuracy is used to evaluate overall prediction performance, and sample-specific accuracy is used for individual-level evaluation.
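As a rough illustration of the overall metric named above, balanced accuracy is conventionally the mean of the per-class recalls. The helper balancedAccuracy below is hypothetical and is not a ClassifyR function; the class labels are made up.

```r
# Hypothetical helper (not part of ClassifyR): balanced accuracy as the
# mean of per-class recall.
balancedAccuracy <- function(actual, predicted) {
  actual <- factor(actual)
  predicted <- factor(predicted, levels = levels(actual))
  recalls <- sapply(levels(actual), function(cls) {
    inClass <- actual == cls
    mean(predicted[inClass] == cls)  # fraction of this class predicted correctly
  })
  mean(recalls)
}

# Made-up labels: recall is 3/4 for class A and 1/2 for class B.
actual    <- c("A", "A", "A", "A", "B", "B")
predicted <- c("A", "A", "A", "B", "B", "A")
result <- balancedAccuracy(actual, predicted)
print(result)
```

Because each class contributes equally, the smaller class B weighs as much as class A, which is the reason balanced accuracy is preferred over plain accuracy for imbalanced classes.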

Usage

## S4 method for signature 'MultiAssayExperimentOrList'
precisionPathwaysTrain(
  measurements,
  class,
  useFeatures = NULL,
  maxMissingProp = 0,
  topNvariance = NULL,
  fixedAssays = "clinical",
  confidenceCutoff = 0.8,
  minAssaySamples = 10,
  nFeatures = 20,
  selectionMethod = setNames(c("none", rep("t-test", length(measurements))),
    c("clinical", names(measurements))),
  classifier = setNames(c("elasticNetGLM", rep("randomForest", length(measurements))),
    c("clinical", names(measurements))),
  nFolds = 5,
  nRepeats = 20,
  nCores = 1
)

## S4 method for signature 'PrecisionPathways,MultiAssayExperimentOrList'
precisionPathwaysPredict(pathways, measurements, class)

Arguments

measurements

Either a MultiAssayExperiment or a list of the basic tabular objects containing the data.

class

If a MultiAssayExperiment, a column name in colData(measurements) with the classes. If measurements is a list of tabular data, may also be a vector of classes.

useFeatures

Default: NULL (i.e. use all provided features). A named list of features to use, named by assay. If the input data is a single table, this can simply be a vector of feature names. For any assays not in the named list, all of their features are used. "clinical" is also a valid assay name and refers to the clinical data table. This allows variables such as spike-in RNAs, sample IDs, sample acquisition dates, etc., which are not relevant for outcome prediction, to be excluded.

maxMissingProp

Default: 0.0. A proportion less than 1 which is the maximum tolerated proportion of missingness for a feature to be retained for modelling.
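A minimal sketch of this kind of filter, using base R only. The table and feature names are made up, and treating the cutoff as inclusive (a feature is kept when its missing proportion is at most maxMissingProp) is an assumption about the behaviour rather than something stated here.

```r
# Toy table: rows are samples, columns are features (made-up values).
measurementsTable <- data.frame(
  geneA = c(1.2, NA, 0.8, 1.1),   # 25% missing
  geneB = c(NA, NA, 2.3, 2.0),    # 50% missing
  geneC = c(0.5, 0.7, 0.6, 0.9)   # 0% missing
)
maxMissingProp <- 0.25

# Proportion of missing values per feature.
missingProp <- colMeans(is.na(measurementsTable))

# Assumption: the cutoff is inclusive.
kept <- measurementsTable[, missingProp <= maxMissingProp, drop = FALSE]
print(colnames(kept))
```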

topNvariance

Default: NULL. An integer number of most variable features per assay to subset to. Assays with fewer features won't be reduced in size.
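The subsetting described here can be sketched in base R as follows. The function selectTopVariance is a hypothetical helper and the toy assay is made up; the package's own implementation may differ in details such as tie handling.

```r
# Hypothetical helper (not part of ClassifyR): keep the n most variable columns.
selectTopVariance <- function(table, n) {
  # NULL means no filtering; assays with n or fewer features are left unchanged.
  if (is.null(n) || ncol(table) <= n) return(table)
  variances <- vapply(table, var, numeric(1), na.rm = TRUE)
  table[, order(variances, decreasing = TRUE)[seq_len(n)], drop = FALSE]
}

# Toy assay: rows are samples, columns are features.
assay <- data.frame(
  f1 = c(1, 1, 1, 1),     # constant, variance 0
  f2 = c(1, 2, 3, 4),
  f3 = c(0, 10, 0, 10),   # most variable
  f4 = c(5, 5, 6, 5)
)
topTwo <- selectTopVariance(assay, 2)
print(colnames(topTwo))
```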

fixedAssays

A character vector of assay names specifying any assays which must be at the beginning of the pathway.

confidenceCutoff

The minimum confidence of predictions for a sample to be predicted by a particular assay. If a sample was predicted to belong to a particular class a proportion p times, then the confidence is 2 × |p - 0.5|. The confidence is therefore 0 when p = 0.5 and 1 when p is 0 or 1.
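The confidence formula can be worked through on a few made-up samples. The sample names and counts below are illustrative only, and comparing with >= against the cutoff is an assumption about how the boundary is handled.

```r
# Confidence as defined above: 2 * |p - 0.5|.
confidence <- function(p) 2 * abs(p - 0.5)

# Made-up counts: how often each sample was predicted as one particular class
# across 20 cross-validation repeats.
nRepeats <- 20
timesPredicted <- c(s1 = 20, s2 = 12, s3 = 1)

p <- timesPredicted / nRepeats   # proportions 1.00, 0.60, 0.05
conf <- confidence(p)            # s2 is close to a coin flip, so its confidence is low

confidenceCutoff <- 0.8
confidentSamples <- names(conf)[conf >= confidenceCutoff]
print(confidentSamples)
```

Note that s3 counts as confident even though p is small: it was consistently predicted as the other class, which is what the absolute value in the formula captures.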

minAssaySamples

An integer specifying the minimum number of samples a tier may have. If a subsequent tier would have fewer than this number of samples, the samples are incorporated into the current tier.
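One reading of this rule, sketched with made-up numbers: if too few samples would be passed on, they stay with the current tier instead. The variable names are hypothetical, and the package's internal logic may differ.

```r
minAssaySamples <- 10

# Made-up outcome for 25 samples in the current tier:
# TRUE means the sample met the confidence cutoff here.
confidentHere <- c(rep(TRUE, 18), rep(FALSE, 7))

nPassedOn <- sum(!confidentHere)   # 7 samples would move to the next tier

if (nPassedOn < minAssaySamples) {
  # Too few samples to form another tier, so the current tier keeps them all.
  assignedHere <- rep(TRUE, length(confidentHere))
} else {
  assignedHere <- confidentHere
}
print(sum(assignedHere))
```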

nFeatures

Default: 20. The number of features to consider during feature selection, if feature selection is done.

selectionMethod

A named character vector of feature selection methods to use for the assays, one for each. The names must correspond to names of measurements.

classifier

A named character vector of modelling methods to use for the assays, one for each. The names must correspond to names of measurements.
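The defaults shown under Usage build these named vectors with setNames, one entry for "clinical" plus one per assay. The same pattern is sketched below with hypothetical assay names ("RNA", "protein") standing in for names(measurements).

```r
# Hypothetical assay names; in practice these would come from names(measurements).
assayNames <- c("RNA", "protein")

# Mirror the default pattern: "clinical" first, then one entry per assay.
selectionMethod <- setNames(
  c("none", rep("t-test", length(assayNames))),
  c("clinical", assayNames)
)
classifier <- setNames(
  c("elasticNetGLM", rep("randomForest", length(assayNames))),
  c("clinical", assayNames)
)

print(selectionMethod)
print(classifier)
```

Building both vectors from the same set of names keeps each assay's selection method and classifier aligned.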

nFolds

A numeric specifying the number of folds to use for cross-validation.

nRepeats

A numeric specifying the number of repeats or permutations to use for cross-validation.

nCores

A numeric specifying the number of cores used if the user wants to use parallelisation.

pathways

A set of pathways created by precisionPathwaysTrain, which is an object of class PrecisionPathways, to be used for predicting on a new data set.

Value

An object of class PrecisionPathways, which is essentially a named list that other plotting and tabulating functions can use.

Examples

# To be determined.

DarioS/ClassifyR documentation built on Dec. 19, 2024, 8:22 p.m.