saps: Compute SAPS statistics
In schmolze/saps-devel: Significance Analysis of Prognostic Signatures

This is the main user interface to the saps package, and is usually the only function needed.

saps(candidateGeneSets, dataSet, survivalTimes, followup,
  random.samples = 1000, cpus = 1, gsea.perm = 1000,
  compute_qvalue = FALSE, qvalue.samples = 1000, verbose = TRUE)

`candidateGeneSets`	A matrix with at least one row, where each row represents a gene set, and the column values are gene identifiers. The row names should contain unique names for the gene sets. The column values may contain `NA` values, since in general gene sets will have differing lengths.
`dataSet`	A matrix, where the column names are gene identifiers (in the same format as the values in `candidateGeneSets`) and the values are gene expression levels. Each row should contain data for a single patient.
`survivalTimes`	A vector of survival times. The length must equal the number of rows (i.e. patients) in `dataSet`.
`followup`	A vector of 0 or 1 values, indicating whether the patient was lost to followup (0) or not (1). The length must equal the number of rows (i.e. patients) in `dataSet`.
`random.samples`	An integer that specifies how many random gene sets to sample when computing p_random. Defaults to 1000.
`cpus`	An integer that specifies the number of CPUs/cores to be used when calculating p_enrich. Defaults to 1. If greater than 1, the snowfall package must be installed or an error will occur.
`gsea.perm`	The number of permutations to be used when calculating p_enrich. This is passed to the `runGSA` function in the piano package. Defaults to 1000.
`compute_qvalue`	A boolean indicating whether to include calculation of the saps_qvalue. Setting this to `TRUE` will significantly increase the computational time. Defaults to `FALSE`.
`qvalue.samples`	An integer that specifies how many random gene sets to sample when computing the saps_qvalue. Defaults to 1000.
`verbose`	A boolean indicating whether to display status messages during computation. Defaults to `TRUE`.
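
Because gene sets usually differ in length, the `candidateGeneSets` matrix has to be padded with `NA`. The following is a minimal sketch of one way to build such a matrix from a named list; the set names and gene identifiers are made up for illustration, and the padding helper is not part of the saps package.

```r
# Hypothetical gene sets of unequal length (names and identifiers are made up)
sets <- list(
  setA = c("1", "5", "9"),
  setB = c("2", "4", "6", "8", "10")
)

# Pad each set with NA up to the longest length, then stack the sets as rows
max_len <- max(lengths(sets))
candidateGeneSets <- t(sapply(sets, function(g) c(g, rep(NA, max_len - length(g)))))

dim(candidateGeneSets)       # one row per gene set, one column per gene slot
rownames(candidateGeneSets)  # row names carry the gene set names
```

The list names become the row names, which satisfies the requirement that each gene set has a unique name.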

saps provides a robust method for identifying biologically significant gene sets associated with patient survival. Three basic statistics are computed. First, patients are clustered into two survival groups based on differential expression of a candidate gene set. p_pure is the p-value for the null hypothesis of no survival difference between the two groups.
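
The p_pure step can be sketched roughly as follows. This is an illustration only, not the package's internal code: it assumes k-means with two centers as the clustering step and a log-rank test via `survdiff` from the survival package, and all data are simulated.

```r
library(survival)

set.seed(1)

# Simulated data: 20 patients, expression of a 3-gene candidate set
expr <- matrix(rnorm(20 * 3), nrow = 20)
expr[1:10, ] <- expr[1:10, ] + 5     # shift half the patients so two groups exist
survivalTimes <- c(rexp(10, 1 / 3000), rexp(10, 1 / 500))
followup <- rep(1, 20)               # 1 = not lost to followup

# Assumption: split patients into two groups with k-means
cluster <- kmeans(expr, centers = 2, nstart = 10)$cluster

# Log-rank test for a survival difference between the two groups
fit <- survdiff(Surv(survivalTimes, followup) ~ cluster)
p_pure <- pchisq(fit$chisq, df = 1, lower.tail = FALSE)
p_pure
```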

Next, the same procedure is applied to randomly generated gene sets, and p_random is calculated as the proportion of random gene sets achieving a p_pure at least as significant as that of the candidate gene set. Finally, a pre-ranked Gene Set Enrichment Analysis (GSEA) is performed by ranking all genes by concordance index, and p_enrich is computed to indicate the degree to which the candidate gene set is enriched for genes with univariate prognostic significance.
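
As a small worked illustration of p_random, assuming "as significant" means a p_pure less than or equal to the candidate's, the proportion can be computed as below. The numbers are made up and are not output from saps.

```r
# Made-up p_pure values, for illustration only
candidate_p_pure <- 0.04
random_p_pures <- c(0.01, 0.20, 0.03, 0.50, 0.04, 0.75, 0.30, 0.60, 0.90, 0.12)

# Proportion of random gene sets whose p_pure is at least as small as the candidate's
p_random <- mean(random_p_pures <= candidate_p_pure)
p_random  # 3 of the 10 random sets qualify, so 0.3
```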

A saps_score is calculated to summarize the three statistics, and optionally a saps_qvalue is computed to estimate the significance of the saps_score by calculating the saps_score for random gene sets.
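
This page does not give the formula for the saps_score. The sketch below assumes the definition described in the cited paper, namely the negative log10 of the largest of the three p-values, signed by the direction of association. Treat it as an assumption rather than the package's exact implementation; the input values are made up.

```r
# Made-up statistics for a single gene set
p_pure <- 0.001
p_random <- 0.02
p_enrich <- 0.01
direction <- -1

# Assumed definition: the weakest (largest) p-value drives the score
saps_score <- -log10(max(p_pure, p_random, p_enrich)) * direction
saps_score  # roughly -1.7
```

Under this definition a gene set only scores highly if all three statistics are small, since the largest p-value determines the result.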

The function returns a list with the following elements:

`rankedGenes`	Vector of concordance index z-scores for the genes in `dataSet`, named by gene identifier.
`geneset.count`	The number of gene sets analyzed.
`genesets`	A list of genesets (see below).
`saps_table`	A dataframe summarizing the adjusted and unadjusted saps statistics for each geneset analyzed. The dataframe contains the following columns: `size, p_pure, p_random, p_enrich, direction, saps_score, saps_qvalue, p_pure_adj, p_random_adj, p_enrich_adj, saps_score_adj, saps_qvalue_adj`. Each row summarizes a single geneset. Note that the saps statistics are stored with each individual `geneset` as well; this table is provided simply for convenience.

Each element of `genesets` is in turn a list with the following elements:

`name`	The name of the geneset.
`size`	The number of genes in the geneset.
`genes`	Vector of gene labels for this geneset.
`saps_unadjusted`	Vector with elements `p_pure`, `p_random`, `p_enrich`, `saps_score`, and `saps_qvalue` containing the respective unadjusted values.
`saps_adjusted`	Vector with elements `p_pure`, `p_random`, `p_enrich`, `saps_score`, and `saps_qvalue` containing the respective values adjusted for multiple comparisons.
`cluster`	Vector of assigned cluster (1 or 2) for each patient using this candidate geneset.
`random_p_pures`	Vector of p_pure values for each random geneset generated during the computation of p_random.
`random_saps_scores`	Vector of saps_score values for each random geneset generated during the computation of saps_qvalue.
`direction`	Direction (-1 or 1) of the enrichment association for this geneset.

Beck AH, Knoblauch NW, Hefti MM, Kaplan J, Schnitt SJ, et al. (2013) Significance Analysis of Prognostic Signatures. PLoS Comput Biol 9(1): e1002875. doi:10.1371/journal.pcbi.1002875

`survdiff`, `concordance.index`, `runGSA`

library(saps)

# 25 patients, none lost to followup
followup <- rep(1, 25)

# first 5 patients have good survival (in days)
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE))*365

# create data for 100 genes, 25 patients
dat <- matrix(rnorm(25*100), nrow=25, ncol=100)
colnames(dat) <- as.character(1:100)

# create two random genesets of 5 genes each
set1 <- sample(colnames(dat), 5)
set2 <- sample(colnames(dat), 5)

genesets <- rbind(set1, set2)

# compute saps
results <- saps(genesets, dat, time, followup, random.samples=100)

# check results
saps_table <- results$saps_table
saps_table[1:7]

# increase expression levels for set1 for first 5 patients
dat[1:5, set1] <- dat[1:5, set1]+10

# run again, should get significant values for set1
results <- saps(genesets, dat, time, followup, random.samples=100)

# check results
saps_table <- results$saps_table
saps_table[1:7]