messina: Find optimal single feature classifiers

Description Usage Arguments Details Value Threshold classifiers Author(s) References See Also Examples

View source: R/messina.R

Description

Run the Messina algorithm to find features (eg. genes) that optimally distinguish between two classes of samples, subject to minimum performance requirements.

Usage

1
2
messina(x, y, min_sens, min_spec, f_train = 0.9, n_boot = 50, seed = NULL,
  progress = TRUE, silent = FALSE)

Arguments

x

feature expression values, either supplied as an ExpressionSet, or as an object that can be converted to a matrix by as.matrix. In the latter case, features should be in rows and samples in columns, with feature names taken from the rows of the object.

y

a binary vector (TRUE/FALSE or 1/0) of class membership information for each sample in x.

min_sens

the minimum acceptable sensitivity that a classifier separating the two groups of y must achieve.

min_spec

the minimum acceptable specificity that a classifier separating the two groups of y must achieve.

f_train

the fraction of samples to be used in the training splits of the bootstrap rounds.

n_boot

the number of bootstrap rounds to use.

seed

an optional random seed for the analysis. If NULL, a random seed derived from the current state of the PRNG is used.

progress

display a progress bar tracking the computation?

silent

be completely silent (except for error and warning messages)?

Details

Note: If you wish to use Messina to detect differential expression, and not construct classifiers, you may find the messinaDE function to be a more convenient interface.

Messina constructs single-feature threshold classifiers (see below) to separate two sample groups, that are in a sense the most robust single-gene classifiers that satisfy user-supplied performance requirements. It accepts as primary input a matrix or ExpressionSet of feature data x; a vector of sample class membership y; and minimum classifier target performance values min_sens, and min_spec. Messina then examines each feature of x in turn, and attempts to build a threshold classifier that satisfies the minimum performance requirements, based on that feature. The results of this classifier training and testing are then returned in a MessinaClassResult object.

The features measured in x must be numeric and contain no missing values, but apart from that are unrestricted – common use cases are mRNA measurements and protein abundance estimates. Messina is not sensitive to the data transformation used, although for mRNA abundance measurements a log-transform or similar is suggested to aid interpretability of the results. x containing discrete values can also be examined by Messina, though if the number of possible values of the members of x is very low, the algorithm is unlikely to be very powerful.

Value

an object of class "MessinaClassResult" containing the results of the analysis.

Threshold classifiers

Messina trains single-feature threshold classifiers. These are classifiers that place unknown samples into one of two groups, based on whether the sample's measurement for a given feature is above or below a constant threshold value. They are the one-dimensional version of support vector machines (SVMs), where in this case the feature set is one-dimensional, and the 'support vector' (the threshold) is a zero-dimensional point. Threshold classifiers are defined by two properties: their threshold value, and their direction, which is the class assigned if a sample's measurement exceeds the threshold.

Author(s)

Mark Pinese m.pinese@garvan.org.au

References

Pinese M, Scarlett CJ, Kench JG, et al. (2009) Messina: A Novel Analysis Tool to Identify Biologically Relevant Molecules in Disease. PLoS ONE 4(4): e5337. doi:10.1371/journal.pone.0005337

See Also

MessinaClassResult-class

ExpressionSet

messinaDE

messinaSurv

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
## Load some example data
library(antiProfilesData)
data(apColonData)

x = exprs(apColonData)
y = pData(apColonData)$SubType

## Subset the data to only tumour and normal samples
sel = y %in% c("normal", "tumor")
x = x[,sel]
y = y[sel]

## Run Messina to rank probesets on their classification ability, with
## classifiers needing to meet a minimum sensitivity of 0.95, and minimum
## specificity of 0.85.
fit = messina(x, y == "tumor", min_sens = 0.95, min_spec = 0.85)

## Display the results.
fit
plot(fit)

messina documentation built on Nov. 8, 2020, 7:47 p.m.