messinaSurv: Find optimal prognostic features using the Messina algorithm

Description Usage Arguments Details Value Objective functions Minimum group fraction Author(s) See Also Examples

View source: R/messinaSurv.R

Description

Run the MessinaSurv algorithm to find features (eg. genes) that can define groups of patients with very different survival times.

Usage

1
2
messinaSurv(x, y, obj_min, obj_func, min_group_frac = 0.1, f_train = 0.8,
  n_boot = 50, seed = NULL, parallel = NULL, silent = FALSE)

Arguments

x

feature expression values, either supplied as an ExpressionSet, or as an object that can be converted to a matrix by as.matrix. In the latter case, features should be in rows and samples in columns, with feature names taken from the rows of the object.

y

a Surv object containing survival times and censoring status for each

obj_min

the minimum acceptable value of the objective metric. The metric used is specified by the parameter obj_func.

obj_func

the metric function that measures the difference in survival between patients with feature values above, and below, the threshold. Valid values are "tau", "reltau", or "coxcoef"; see details for more information.

min_group_frac

the size of the smallest sample group that is allowed to be generated by thresholding, as a fraction of the total sample. The default value of 0.1 means that no thresholds will be selected that result in a sample split yielding a group of smaller than 10 the samples. A modest value of this parameter increases the stability of the "reltau" and "coxcoef" objectives, which tend to become unstable as the number of samples in a group becomes very low; see details.

f_train

the fraction of samples to be used in the training splits of the bootstrap rounds.

n_boot

the number of bootstrap rounds to use.

seed

an optional random seed for the analysis. If NULL, the R PRNG is used as-is.

parallel

should calculations be parallelized using the doMC framework? If NULL, parallel mode is used if the doMC library is loaded, and more than one core has been registered with registerDoMC(). Note that no progress bar is displayed in parallel mode.

silent

be completely silent (except for error and warning messages)?

Details

The MessinaSurv algorithm aims to identify features for which patients with high signal and patients with low signal have very different survival outcomes. This is achieved by definining an objective function which assigns a numerical value to how strongly the survival in two groups of patients differs, then assessing the value of this objective at different signal levels of each feature. Those features for which, at a given signal level, the objective function is consistently above a user-supplied minimum level, are selected by MessinaSurv as being single-feature survival predictors.

MessinaSurv has applications as an algorithm to identify features that are survival-related, as well as a principled method to identify threshold signal values to separate a cohort into poor- and good-prognosis subgroups. It can also be used as a feature filter, selecting and discretising survival-related features before they are input into a multivariate predictor.

Value

an object of class "MessinaSurvResult" containing the results of the analysis.

Objective functions

MessinaSurv uses the value of its objective function as a measure of the strength of the difference in survival of the two patient groups defined by the threshold. Three objective functions are currently defined:

"coxcoef"

The coefficient of a Cox proportional hazards fit to the model Surv ~ I(x > T), where x is the feature signal level, and T is the threshold being tested. Range is (-inf, inf), with a no-information value of 0; positive values indicate that the subgroup defined by signal above the threshold fails sooner.

"tau"

Kendall's tau for survival data, defined as (concordant + tied/2) / (concordant + discordant + tied), where concordant is the number of concordant group/survival pairs, discordant is the number of discordant group/survival pairs, and tied is the total number of tied pairs, counting both group and survival ties. Concordance is calculated expecting that samples with signal exceeding the threshold will fail sooner. Range is [0, 1], with a no-information value of 0.5. Note that the ties terms naturally penalize very high or low thresholds, and so this objective is inappropriate if somewhat unbalanced subgroups are expected to be present in the data.

"reltau"

tau, normalized to remove the ties penalty. Defined as agree / (agree + disagree). Range is [0, 1], with a no-information value of 0.5. Although the ties penalty of tau is removed, and this method is thus suitable for finding unbalanced subgroups, it is now unstable at extreme threshold values (as in these cases, agree + disagree -> 0). For this reason, min_group_frac must be set to a modest value when using "reltau", to preserve stability.

Methods "coxcoef" and "reltau" show instability for very high and low threshold values, and so should be used with an appropriate value of min_group_frac for stable fits. Method "tau" is stable to extreme threshold values, and therefore will tolerate min_group_frac = 0, however note that "tau" naturally penalizes small subgroups, and is therefore a poor choice unless you wish to find approximately equal-sized subgroups.

Minimum group fraction

The parameter min_group_frac limits the size of the smallest subgroups that messinaSurv can select. As the groups become smaller, the "reltau" and "coxcoef" objective functions become unstable, and can generate spurious results. These are seen on the diagnostics produced by the messina plot functions as very high objective values at very low and high threshold values. To control these results, set min_group_frac to a high enough value that the objective functions reliably fit. Generally, max(0.1, 10/N), where N is the total number of patients, is sufficient. Keep in mind that setting this parameter too high will limit messinaSurv's ability to identify small subsets of patients with dramatically different survival from the rest: the smallest subset that will be reliably identified is min_group_frac of patients.

Author(s)

Mark Pinese m.pinese@garvan.org.au

See Also

MessinaSurvResult-class

ExpressionSet

messina

messinaDE

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## Load a subset of the TCGA renal clear cell carcinoma data
## as an example.
data(tcga_kirc_example)

## Run the messinaSurv analysis on these data.  Use a tau
## objective, with a minimum performance of 0.6.  Note that
## messinaSurv analyses are very computationally-intensive,
## so in actual use multicore use with doMC and parallel = TRUE
## is strongly recommended.
fit = messinaSurv(kirc.exprs, kirc.surv, obj_func = "tau", obj_min = 0.6)

fit
plot(fit)

messina documentation built on Nov. 8, 2020, 7:47 p.m.