saps: Compute SAPS statistics

Description

This is the main user interface to the saps package, and is usually the only function needed.

Usage

saps(candidateGeneSets, dataSet, survivalTimes, followup,
  random.samples = 1000, cpus = 1, gsea.perm = 1000,
  compute_qvalue = FALSE, qvalue.samples = 1000, verbose = TRUE)

Arguments

candidateGeneSets

A matrix with at least one row, where each row represents a gene set, and the column values are gene identifiers. The row names should contain unique names for the gene sets. The column values may contain NA values, since in general gene sets will have differing lengths.

dataSet

A matrix, where the column names are gene identifiers (in the same format as the values in candidateGeneSets) and the values are gene expression levels. Each row should contain data for a single patient.

survivalTimes

A vector of survival times. The length must equal the number of rows (i.e. patients) in dataSet.

followup

A vector of 0 or 1 values indicating whether each patient was lost to follow-up (0) or not (1). The length must equal the number of rows (i.e. patients) in dataSet.

random.samples

An integer that specifies how many random gene sets to sample when computing p_random. Defaults to 1000.

cpus

An integer that specifies the number of CPUs/cores to use when calculating p_enrich. Defaults to 1. If set to a value greater than 1, the snowfall package must be installed or an error will occur.

gsea.perm

The number of permutations to be used when calculating p_enrich. This is passed to the runGSA function in the piano package. Defaults to 1000.

compute_qvalue

A boolean indicating whether to compute the saps_qvalue. Setting this to TRUE will significantly increase the computation time. Defaults to FALSE.

qvalue.samples

An integer that specifies how many random gene sets to sample when computing the saps_qvalue. Defaults to 1000.

verbose

A boolean indicating whether to display status messages during computation. Defaults to TRUE.
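The input requirements described above can be checked before calling saps. The following is a minimal sketch in base R; the helper names pad_gene_sets and check_saps_inputs are hypothetical and are not part of the saps package.

```r
# Hypothetical helper (not part of saps): turn a named list of gene sets of
# differing lengths into an NA-padded character matrix, one row per gene set.
pad_gene_sets <- function(sets) {
  width <- max(lengths(sets))
  padded <- lapply(sets, function(g) {
    c(as.character(g), rep(NA_character_, width - length(g)))
  })
  # rbind() takes the row names from the names of the list
  do.call(rbind, padded)
}

# Hypothetical helper (not part of saps): check the documented constraints
# on the arguments and stop with an error if any of them is violated.
check_saps_inputs <- function(candidateGeneSets, dataSet, survivalTimes,
                              followup, cpus = 1) {
  stopifnot(
    is.matrix(candidateGeneSets),
    nrow(candidateGeneSets) >= 1,
    !is.null(rownames(candidateGeneSets)),
    anyDuplicated(rownames(candidateGeneSets)) == 0,
    is.matrix(dataSet),
    !is.null(colnames(dataSet)),
    length(survivalTimes) == nrow(dataSet),
    length(followup) == nrow(dataSet),
    all(followup %in% c(0, 1))
  )
  if (cpus > 1 && !requireNamespace("snowfall", quietly = TRUE)) {
    stop("cpus > 1 requires the snowfall package to be installed")
  }
  invisible(TRUE)
}

# Usage with toy data: two gene sets of different lengths, four patients
sets <- list(setA = c("1", "2", "3"), setB = c("4", "5"))
genesets <- pad_gene_sets(sets)
dat <- matrix(rnorm(4 * 5), nrow = 4,
              dimnames = list(NULL, as.character(1:5)))
check_saps_inputs(genesets, dat,
                  survivalTimes = c(100, 200, 300, 400),
                  followup = c(1, 0, 1, 1))
```

Running such a check first makes a length mismatch or a missing snowfall installation fail early with a clear message.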

Details

saps provides a robust method for identifying biologically significant gene sets associated with patient survival. Three basic statistics are computed. First, patients are clustered into two survival groups based on differential expression of a candidate gene set. p_pure is the p-value for the null hypothesis of no survival difference between the two groups.
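The p_pure step can be illustrated with a short sketch. It assumes k-means clustering with two centers and a log-rank test via survdiff from the survival package (survdiff is listed under See Also; the choice of clustering method is an assumption here). It is not the package's internal code.

```r
library(survival)

set.seed(1)
n <- 25
followup <- rep(1, n)                                   # no patient lost to follow-up
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE)) * 365

# Toy expression matrix for a 5-gene candidate set; the first 5 patients
# (the long survivors) are given clearly higher expression.
expr <- matrix(rnorm(n * 5), nrow = n)
expr[1:5, ] <- expr[1:5, ] + 10

# Assumed clustering step: split the patients into two groups
cluster <- kmeans(expr, centers = 2, nstart = 10)$cluster

# Log-rank test for a survival difference between the two groups
fit <- survdiff(Surv(time, followup) ~ cluster)
p_pure <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
p_pure
```

A small p_pure here only shows that this one split separates survival; the remaining statistics address whether that is unusual.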

Next, the same procedure is applied to randomly generated gene sets, and p_random is calculated as the proportion of random gene sets achieving a p_pure at least as significant as that of the candidate gene set. Finally, a pre-ranked Gene Set Enrichment Analysis (GSEA) is performed by ranking all genes by concordance index, and p_enrich is computed to indicate the degree to which the candidate gene set is enriched for genes with univariate prognostic significance.
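As a sketch of the p_random calculation, the proportion can be computed directly from a vector of p_pure values obtained for random gene sets. The numbers below are toy values for illustration, not output from saps, and whether the package uses a strict or non-strict comparison is an assumption.

```r
# Toy values for illustration only
p_pure <- 0.03                                     # candidate gene set
random_p_pures <- c(0.01, 0.50, 0.20, 0.03, 0.90)  # random gene sets

# Proportion of random gene sets at least as significant as the candidate
p_random <- mean(random_p_pures <= p_pure)
p_random  # 0.4
```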

A saps_score is calculated to summarize the three statistics, and optionally a saps_qvalue is computed to estimate the significance of the saps_score by calculating the saps_score for random gene sets.
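The text above does not state how the three statistics are combined. As described in Beck et al. (2013), cited under References, the score is the negative log10 of the largest of the three p-values, signed by the enrichment direction. The sketch below assumes that definition; the function name saps_score_sketch is hypothetical, and the result should be checked against the package source.

```r
# Hypothetical helper (not part of saps); assumes
# saps_score = -log10(max(p_pure, p_random, p_enrich)) * direction
saps_score_sketch <- function(p_pure, p_random, p_enrich, direction) {
  -log10(max(p_pure, p_random, p_enrich)) * direction
}

# Toy values: the weakest of the three p-values (0.01) determines the score
saps_score_sketch(p_pure = 0.001, p_random = 0.01, p_enrich = 0.005,
                  direction = -1)  # -2
```

Under this definition a gene set scores highly only if all three p-values are small, since the largest one sets the magnitude.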

Value

The function returns a list with the following elements:

rankedGenes

Vector of concordance index z-scores for the genes in dataSet, named by gene identifier.

geneset.count

The number of gene sets analyzed.

genesets

A list of genesets (see below).

saps_table

A dataframe summarizing the adjusted and unadjusted saps statistics for each geneset analyzed. The dataframe contains the following columns: size, p_pure, p_random, p_enrich, direction, saps_score, saps_qvalue, p_pure_adj, p_random_adj, p_enrich_adj, saps_score_adj, saps_qvalue_adj. Each row summarizes a single geneset. Note that the saps statistics are stored with each individual geneset as well; this table is provided simply for convenience.
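One way to use saps_table is to keep only the gene sets whose three adjusted p-values all fall below a threshold. The sketch below builds a small stand-in data frame with toy values and a subset of the documented columns; it is not output from saps, and using the gene set names as row names is an assumption.

```r
# Stand-in for results$saps_table with toy values (illustration only)
saps_table <- data.frame(
  size         = c(5, 5),
  p_pure_adj   = c(0.001, 0.40),
  p_random_adj = c(0.010, 0.55),
  p_enrich_adj = c(0.005, 0.30),
  direction    = c(1, -1),
  row.names    = c("set1", "set2")
)

# Keep gene sets that pass all three adjusted tests at the 0.05 level
alpha <- 0.05
significant <- subset(
  saps_table,
  p_pure_adj < alpha & p_random_adj < alpha & p_enrich_adj < alpha
)
rownames(significant)  # "set1"
```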

Each element of genesets is in turn a list with the following elements:

name

The name of the geneset.

size

The number of genes in the geneset.

genes

Vector of gene labels for this geneset.

saps_unadjusted

Vector with elements p_pure, p_random, p_enrich, saps_score, and saps_qvalue containing the respective unadjusted values.

saps_adjusted

Vector with elements p_pure, p_random, p_enrich, saps_score, and saps_qvalue containing the respective values adjusted for multiple comparisons.

cluster

Vector of assigned cluster (1 or 2) for each patient using this candidate geneset.

random_p_pures

Vector of p_pure values for each random geneset generated during the computation of p_random.

random_saps_scores

Vector of saps_score values for each random geneset generated during the computation of saps_qvalue.

direction

Direction (-1 or 1) of the enrichment association for this geneset.
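The nested structure described above can be navigated with ordinary list indexing. The sketch below uses a hand-built stand-in for the return value, with toy values and only some of the documented elements; it is not output from saps, and it assumes that saps_unadjusted is a named vector.

```r
# Stand-in for the list returned by saps (toy values, illustration only)
results <- list(
  geneset.count = 1,
  genesets = list(
    list(
      name = "set1",
      size = 5,
      genes = as.character(1:5),
      saps_unadjusted = c(p_pure = 0.001, p_random = 0.01, p_enrich = 0.005),
      direction = 1
    )
  )
)

# Inspect a single gene set
gs <- results$genesets[[1]]
gs$name
gs$saps_unadjusted["p_pure"]

# Collect the unadjusted p_pure of every gene set into one named vector
p_pures <- vapply(results$genesets,
                  function(g) g$saps_unadjusted[["p_pure"]],
                  numeric(1))
names(p_pures) <- vapply(results$genesets, function(g) g$name, character(1))
p_pures
```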

References

Beck AH, Knoblauch NW, Hefti MM, Kaplan J, Schnitt SJ, et al. (2013) Significance Analysis of Prognostic Signatures. PLoS Comput Biol 9(1): e1002875. doi:10.1371/journal.pcbi.1002875

See Also

survdiff, concordance.index, runGSA

Examples

# 25 patients, none lost to followup
followup <- rep(1, 25)

# first 5 patients have good survival (in days)
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE))*365

# create data for 100 genes, 25 patients
dat <- matrix(rnorm(25*100), nrow=25, ncol=100)
colnames(dat) <- as.character(1:100)

# create two random genesets of 5 genes each
set1 <- sample(colnames(dat), 5)
set2 <- sample(colnames(dat), 5)

genesets <- rbind(set1, set2)

# compute saps
results <- saps(genesets, dat, time, followup, random.samples=100)

# check results
saps_table <- results$saps_table
saps_table[1:7]

# increase expression levels for set1 for first 5 patients
dat[1:5, set1] <- dat[1:5, set1]+10

# run again, should get significant values for set1
results <- saps(genesets, dat, time, followup, random.samples=100)

# check results
saps_table <- results$saps_table
saps_table[1:7]

saps documentation built on Oct. 5, 2016, 4:32 a.m.