SigCheck-package: Check a gene signature's survival and/or classification...

Description Details Author(s) References


While gene signatures are frequently used to predict phenotypes (e.g. predict prognosis of cancer patients), it it not always clear how optimal or meaningful they are (cf David Venet, Jacques E. Dumont, and Vincent Detours' paper "Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome"). Based on suggestions in that paper, SigCheck accepts a data set (as an ExpressionSet) and a gene signature, and compares its performance on survival and/or classification tasks against a) random gene signatures of the same length; b) known, related and unrelated gene signatures; and c) permuted data and/or metadata.


Package: SigCheck
Type: Package
Version: 2.0
Date: 2015-05-18
License: Artistic-2.0

To use SigCheck, first create anew SigCheck object using the function sigCheck. This will establish the baseline performance of the signature. Next, either run specific checks, or use the high level function sigCheckAll to run all the core functions in turn. The three core functions enable 1) comparison of baseline performance against signatures composed of random genes (sigCheckRandom); 2) comparison of baseline performance against known, and mostly unrelated, gene signatures (sigCheckKnown); and 3) comparison of baseline performance against randomly permuted data and/or metadata (sigCheckPermuted).

At a minimum, SigCheck requires a data set (as an ExpressionSet), metadata indicating the membership of each sample in one of two classes, and a signature (a subset of features in the ExpressionSet). If survival data are available, survival analyses are carried out. Validation samples can be divided into two classes using one of the simple default methods (based on overall expression value or their first principal component). Alternatively, more sophisticated classification algorithms can be deployed, using the MLearn function from the MLInterfaces package to build a classifier (using link{smvI} by default). If no validation samples are specified, leave-one-out (LOO) cross-validation is utilized to build multiple classifiers, each predicting one sample. If no survival data are provided, signatures are evaluated based on classification performance.

Output of each check includes the distribution of random performance scores (either survival p-value or classification performance) and the ranking of the passed signature within this distribution. An empirical p-value calculation based on this rank is also returned indicating confidence that the performance of the signature being checked has unique power.


First version written by Justin Norden with Rory Stark at the University of Cambridge, Cancer Resaerch UK Cambridge Institute.

Second version, including all survival analysis, written by Rory Stark at CRUK-CI.

Maintainer: Rory Stark <[email protected]>


Venet, David, Jacques E. Dumont, and Vincent Detours. "Most random gene expression signatures are significantly associated with breast cancer outcome." PLoS Computational Biology 7.10 (2011): e1002240.

SigCheck documentation built on May 6, 2019, 3:01 a.m.