messina: Find optimal single feature classifiers
In messina: Single-gene classifiers and outlier-resistant detection of differential expression for two-group and survival problems

Description Usage Arguments Details Value Threshold classifiers Author(s) References See Also Examples

Run the Messina algorithm to find features (eg. genes) that optimally distinguish between two classes of samples, subject to minimum performance requirements.

1 2	messina(x, y, min_sens, min_spec, f_train = 0.9, n_boot = 50, seed = NULL, progress = TRUE, silent = FALSE)

`x`	feature expression values, either supplied as an ExpressionSet, or as an object that can be converted to a matrix by as.matrix. In the latter case, features should be in rows and samples in columns, with feature names taken from the rows of the object.
`y`	a binary vector (TRUE/FALSE or 1/0) of class membership information for each sample in x.
`min_sens`	the minimum acceptable sensitivity that a classifier separating the two groups of y must achieve.
`min_spec`	the minimum acceptable specificity that a classifier separating the two groups of y must achieve.
`f_train`	the fraction of samples to be used in the training splits of the bootstrap rounds.
`n_boot`	the number of bootstrap rounds to use.
`seed`	an optional random seed for the analysis. If NULL, a random seed derived from the current state of the PRNG is used.
`progress`	display a progress bar tracking the computation?
`silent`	be completely silent (except for error and warning messages)?

Note: If you wish to use Messina to detect differential expression, and not construct classifiers, you may find the messinaDE function to be a more convenient interface.

Messina constructs single-feature threshold classifiers (see below) to separate two sample groups, that are in a sense the most robust single-gene classifiers that satisfy user-supplied performance requirements. It accepts as primary input a matrix or ExpressionSet of feature data x; a vector of sample class membership y; and minimum classifier target performance values min_sens, and min_spec. Messina then examines each feature of x in turn, and attempts to build a threshold classifier that satisfies the minimum performance requirements, based on that feature. The results of this classifier training and testing are then returned in a MessinaClassResult object.

The features measured in x must be numeric and contain no missing values, but apart from that are unrestricted – common use cases are mRNA measurements and protein abundance estimates. Messina is not sensitive to the data transformation used, although for mRNA abundance measurements a log-transform or similar is suggested to aid interpretability of the results. x containing discrete values can also be examined by Messina, though if the number of possible values of the members of x is very low, the algorithm is unlikely to be very powerful.

an object of class "MessinaClassResult" containing the results of the analysis.

Messina trains single-feature threshold classifiers. These are classifiers that place unknown samples into one of two groups, based on whether the sample's measurement for a given feature is above or below a constant threshold value. They are the one-dimensional version of support vector machines (SVMs), where in this case the feature set is one-dimensional, and the 'support vector' (the threshold) is a zero-dimensional point. Threshold classifiers are defined by two properties: their threshold value, and their direction, which is the class assigned if a sample's measurement exceeds the threshold.

Mark Pinese m.pinese@garvan.org.au

Pinese M, Scarlett CJ, Kench JG, et al. (2009) Messina: A Novel Analysis Tool to Identify Biologically Relevant Molecules in Disease. PLoS ONE 4(4): e5337. doi:10.1371/journal.pone.0005337

MessinaClassResult-class

ExpressionSet

messinaDE

messinaSurv

## Load some example data
library(antiProfilesData)
data(apColonData)

x = exprs(apColonData)
y = pData(apColonData)$SubType

## Subset the data to only tumour and normal samples
sel = y %in% c("normal", "tumor")
x = x[,sel]
y = y[sel]

## Run Messina to rank probesets on their classification ability, with
## classifiers needing to meet a minimum sensitivity of 0.95, and minimum
## specificity of 0.85.
fit = messina(x, y == "tumor", min_sens = 0.95, min_spec = 0.85)

## Display the results.
fit
plot(fit)