This is the main user interface to the saps package, and is usually the only function needed.
candidateGeneSets
A matrix with at least one row, where each row represents a gene set and the column values are gene identifiers. The row names should contain unique names for the gene sets. The column values may contain …
dataSet
A matrix whose column names are gene identifiers (in the same format as the values in candidateGeneSets) and whose rows represent patients.
survivalTimes
A vector of survival times. Its length must equal the number of rows (i.e. patients) in dataSet.
followup
A vector of 0 or 1 values indicating whether the patient was lost to followup (0) or not (1). Its length must equal the number of rows (i.e. patients) in dataSet.
random.samples
An integer that specifies how many random gene sets to sample when computing p_random. Defaults to 1000.
cpus
An integer that specifies the number of CPUs/cores to use when calculating p_enrich. Defaults to 1. If greater than 1, the snowfall package must be installed, otherwise an error will occur.
gsea.perm
The number of permutations to use when calculating p_enrich. This is passed to the …
compute_qvalue
A boolean indicating whether to calculate the saps_qvalue. Setting this to …
qvalue.samples
An integer that specifies how many random gene sets to sample when computing the saps_qvalue. Defaults to 1000.
verbose
A boolean indicating whether to display status messages during computation. Defaults to …
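The input requirements above can be checked before calling saps. The sketch below is a hypothetical helper, check_saps_inputs(), which is not part of the saps package; it mostly encodes the constraints stated in this argument list, plus the assumption that at least one gene identifier in candidateGeneSets also appears in colnames(dataSet). saps itself may validate its inputs differently.

```r
# Hypothetical helper (not part of saps): stop with an error if the inputs
# violate the constraints described in the argument list.
check_saps_inputs <- function(candidateGeneSets, dataSet, survivalTimes, followup) {
  stopifnot(
    is.matrix(candidateGeneSets), nrow(candidateGeneSets) >= 1,
    !is.null(rownames(candidateGeneSets)),
    !anyDuplicated(rownames(candidateGeneSets)),   # unique gene set names
    is.matrix(dataSet), is.numeric(dataSet),
    !is.null(colnames(dataSet)),                    # gene identifiers as column names
    any(candidateGeneSets %in% colnames(dataSet)),  # assumption: identifiers must overlap
    length(survivalTimes) == nrow(dataSet),         # one survival time per patient
    length(followup) == nrow(dataSet),              # one followup flag per patient
    all(followup %in% c(0, 1))                      # 0 = lost to followup, 1 = not
  )
  invisible(TRUE)
}
```

With the data from the examples, check_saps_inputs(genesets, dat, time, followup) returns TRUE invisibly, while passing a survival vector of the wrong length raises an error.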
saps provides a robust method for identifying gene sets whose expression is significantly associated with patient survival. Three basic statistics are computed. First, patients are clustered into two groups based on their expression of the genes in a candidate gene set, and p_pure is calculated as the log-rank test p-value for the null hypothesis that survival does not differ between the two groups.
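A minimal sketch of how p_pure could be computed for a single gene set, using base R and the survival package. The function name p_pure_sketch() is hypothetical, and splitting patients with k-means (two centres) is an assumption made for illustration; saps may use a different clustering procedure.

```r
library(survival)

# Hypothetical helper (not part of saps): p_pure for one candidate gene set.
# Assumption: patients are split by k-means (2 centres) on the expression of
# the gene set's genes.
p_pure_sketch <- function(dataSet, geneset, survivalTimes, followup) {
  genes <- intersect(geneset, colnames(dataSet))  # drop NA and unknown genes
  x <- dataSet[, genes, drop = FALSE]
  group <- kmeans(x, centers = 2, nstart = 10)$cluster
  # Log-rank test of the survival difference between the two groups
  lr <- survdiff(Surv(survivalTimes, followup) ~ group)
  pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
}
```

Applied to set1 from the examples, the p-value should be unremarkable on the random data and become very small once the expression of set1 is raised for the five long-surviving patients.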
Next, the same procedure is applied to randomly generated gene sets, and p_random is calculated as the proportion of random gene sets whose p_pure is at least as significant as that of the candidate gene set. Finally, a pre-ranked Gene Set Enrichment Analysis (GSEA) is performed on all genes ranked by their concordance index, and p_enrich indicates the degree to which the candidate gene set is enriched for genes with univariate prognostic significance.
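The two remaining statistics can be sketched as follows. p_random_sketch(), gsea_es() and p_enrich_sketch() are hypothetical helpers, not saps functions. For p_random, p_pure_fun stands in for any function that maps a vector of gene names to a p_pure value, and random gene sets are assumed to have the same size as the candidate set. For p_enrich, the sketch uses a simplified pre-ranked GSEA (weighted Kolmogorov-Smirnov running sum, gene-label permutations, two-sided empirical p-value); stats is assumed to be a named vector of per-gene scores such as concordance index z-scores. The computation saps delegates to runGSA may differ in weighting and in how the null distribution is built.

```r
# Hypothetical helper: fraction of random gene sets (same size as the
# candidate) whose p_pure is at least as small as the candidate's.
p_random_sketch <- function(candidate_p_pure, geneset_size, all_genes,
                            p_pure_fun, random.samples = 1000) {
  random_p_pures <- vapply(seq_len(random.samples), function(i) {
    p_pure_fun(sample(all_genes, geneset_size))
  }, numeric(1))
  mean(random_p_pures <= candidate_p_pure)
}

# Running-sum enrichment score for a pre-ranked list (weight exponent 1).
gsea_es <- function(stats, geneset) {
  r <- stats[order(stats, decreasing = TRUE)]  # rank genes, highest first
  hit <- names(r) %in% geneset                 # TRUE where gene is in the set
  w <- abs(r) * hit                            # hits weighted by |score|
  p_hit <- cumsum(w) / sum(w)                  # running share of set weight
  p_miss <- cumsum(!hit) / sum(!hit)           # running share of other genes
  dev <- p_hit - p_miss
  unname(dev[which.max(abs(dev))])             # maximum deviation from zero
}

# Hypothetical simplified p_enrich: permute gene labels gsea.perm times.
p_enrich_sketch <- function(stats, geneset, gsea.perm = 1000) {
  es <- gsea_es(stats, geneset)
  null_es <- replicate(gsea.perm, {
    shuffled <- setNames(stats, sample(names(stats)))
    gsea_es(shuffled, geneset)
  })
  list(es = es,
       p = (1 + sum(abs(null_es) >= abs(es))) / (1 + gsea.perm),
       direction = sign(es))
}
```

The sign of the enrichment score gives a direction of -1 or 1, analogous to the direction element returned for each gene set.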
A saps_score is calculated to summarize the three statistics. Optionally, a saps_qvalue is computed to estimate the significance of the saps_score by comparing it with saps_scores calculated for random gene sets.
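The exact formula for the saps_score is not given on this page. As we read Beck et al. (2013), it is the negative base-10 logarithm of the largest of the three p-values, signed by the direction of the enrichment association. The sketch below assumes that definition and should be checked against the package source; saps_score_sketch() is a hypothetical name.

```r
# Hypothetical helper (not part of saps). Assumed definition: -log10 of the
# largest of p_pure, p_random and p_enrich, multiplied by the direction
# (-1 or 1). pmax() makes it work elementwise over several gene sets.
saps_score_sketch <- function(p_pure, p_random, p_enrich, direction) {
  -log10(pmax(p_pure, p_random, p_enrich)) * direction
}

# Two gene sets: the first is significant on all three statistics
saps_score_sketch(p_pure = c(0.001, 0.2), p_random = c(0.01, 0.5),
                  p_enrich = c(0.005, 0.03), direction = c(1, -1))
```

Taking the maximum means a gene set scores highly only when all three statistics are small at once.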
The function returns a list with the following elements:
rankedGenes
Vector of concordance index z-scores for the genes in dataSet.
geneset.count
The number of gene sets analyzed.
genesets
A list of genesets (see below).
saps_table
A data frame summarizing the adjusted and unadjusted saps statistics for each geneset analyzed. The data frame contains the following columns: …
Each element of genesets is in turn a list with the following elements:
name
The name of the geneset.
size
The number of genes in the geneset.
genes
Vector of gene labels for this geneset.
saps_unadjusted
Vector with elements …
saps_adjusted
Vector with elements …
cluster
Vector of assigned cluster (1 or 2) for each patient using this candidate geneset.
random_p_pures
Vector of p_pure values for each random geneset generated during the computation of p_random.
random_saps_scores
Vector of saps_score values for each random geneset generated during the computation of saps_qvalue.
direction
Direction (-1 or 1) of the enrichment association for this geneset.
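The cluster element can be used to inspect the two survival groups for a gene set. The following sketch uses a hypothetical helper, km_by_cluster(), to fit Kaplan-Meier curves per cluster with survival::survfit; it assumes cluster is aligned with the rows (patients) of dataSet, as the description above implies.

```r
library(survival)

# Hypothetical helper (not part of saps): Kaplan-Meier fit for each of the
# two clusters assigned to one candidate gene set.
km_by_cluster <- function(survivalTimes, followup, cluster) {
  group <- factor(cluster)
  survfit(Surv(survivalTimes, followup) ~ group)
}
```

After running the examples, fit <- km_by_cluster(time, followup, results$genesets[[1]]$cluster) followed by plot(fit, col = 1:2) draws both curves, and summary(fit)$table[, "median"] gives the median survival of each cluster.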
Beck AH, Knoblauch NW, Hefti MM, Kaplan J, Schnitt SJ, et al. (2013) Significance Analysis of Prognostic Signatures. PLoS Comput Biol 9(1): e1002875. doi:10.1371/journal.pcbi.1002875
survdiff (survival package)
concordance.index (survcomp package)
runGSA (piano package)
library(saps)

# 25 patients, none lost to followup
followup <- rep(1, 25)
# first 5 patients survive 21-27 years, the rest 1-3 years (times in days)
time <- c(25, 27, 24, 21, 26, sample(1:3, 20, TRUE))*365
# create data for 100 genes, 25 patients
dat <- matrix(rnorm(25*100), nrow=25, ncol=100)
colnames(dat) <- as.character(1:100)
# create two random genesets of 5 genes each
set1 <- sample(colnames(dat), 5)
set2 <- sample(colnames(dat), 5)
genesets <- rbind(set1, set2)
# compute saps
results <- saps(genesets, dat, time, followup, random.samples=100)
# check results
saps_table <- results$saps_table
saps_table[1:7]
# increase expression levels for set1 for first 5 patients
dat[1:5, set1] <- dat[1:5, set1]+10
# run again; set1 should now get significant values
results <- saps(genesets, dat, time, followup, random.samples=100)
# check results
saps_table <- results$saps_table
saps_table[1:7]