ss_methods | R Documentation |
These are internal functions to compute single sample scores from a list of gene signatures in three different ways:
combined z-score (Lee et al., 2008);
single sample GSEA (Barbie et al., 2009);
singscore (Foroutan et al., 2018).
compute_ssgsea() is called by hack_estimate(), whereas all three methods are called by hack_sig().
compute_zscore(expr_data, signatures)
compute_ssgsea(
expr_data,
signatures,
sample_norm = "raw",
rank_norm = "none",
alpha = 0.25
)
compute_singscore(expr_data, signatures, direction = "none")
expr_data |
A normalized gene expression matrix (or data frame) with gene symbols as row names and samples as columns. |
signatures |
A named list of gene signatures. |
sample_norm |
A character string specifying the type of normalization affecting the single sample GSEA scores. Can be one of:
|
rank_norm |
A character string specifying how gene expression ranks should be normalized in the single sample GSEA procedure. Valid choices are:
|
alpha |
A numeric scalar. Exponent in the running sum of the single sample GSEA score calculation which weighs the gene ranks. Defaults to 0.25. |
direction |
A character string specifying the singscore computation method depending on the direction of the signatures. Can be one of:
|
A tibble with one row for each sample in expr_data, a column sample_id indicating sample identifiers and one column for each input signature giving single sample scores.
This section gives a brief explanation of how single sample scores are obtained from different methods.
Gene expression values are centered by their mean value and scaled by their standard deviation across samples for each gene (z-scores). Then, for each sample and signature, corresponding z-scores are added up and divided by the square root of the signature size (i.e. the number of genes composing a signature).
The combined z-score method is also implemented in the R package GSVA (Hänzelmann et al., 2013).
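The combined z-score computation described above can be sketched in base R. This is a minimal illustration on made-up data, not the hacksig source code; the object names (expr, signature, zscore) are hypothetical, and it assumes the signature size is the number of signature genes found in the matrix.

```r
# Toy expression matrix: genes in rows, samples in columns (made-up values).
expr <- matrix(
  c(1, 2, 3,
    2, 4, 6,
    5, 5, 8,
    9, 7, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2", "s3"))
)
signature <- c("g1", "g2", "g4")

# Center and scale each gene across samples. scale() works column-wise,
# so the matrix is transposed before and after.
z <- t(scale(t(expr)))

# Assumption: only signature genes present in the matrix are used.
genes <- intersect(signature, rownames(z))

# Add up z-scores per sample and divide by the square root of the signature size.
zscore <- colSums(z[genes, , drop = FALSE]) / sqrt(length(genes))
```

Here zscore is a named numeric vector with one combined z-score per sample.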
For each sample, genes are ranked by expression value in increasing order and rank normalization may follow (see argument rank_norm). Then, two probability-like vectors are computed for each sample and signature:
P_{in}, the cumulative sum of weighted ranks divided by their total sum for genes in the signature;
P_{out}, the cumulative sum of ones (indicating genes not in the signature) divided by the number of genes not in the signature.
The single sample GSEA score is obtained by adding up the elements of the vector difference P_{in} - P_{out}.
Finally, single sample scores can be normalized either across samples or across gene signatures and samples (see argument sample_norm).
The single sample GSEA method is also implemented in the R package GSVA (Hänzelmann et al., 2013).
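As a rough illustration of the raw single sample GSEA score described above, the following base R sketch computes P_{in} and P_{out} for one sample and sums their difference. The helper ssgsea_one() is hypothetical and not part of hacksig. It assumes the cumulative sums run from the top-ranked gene down to the lowest-ranked one, and it skips the optional rank and sample normalization steps.

```r
# Hypothetical helper, not part of hacksig: raw ssGSEA-like score for one sample.
# `x` is a named numeric vector of expression values (names are gene symbols).
ssgsea_one <- function(x, signature, alpha = 0.25) {
  r <- rank(x)                          # ranks in increasing order of expression
  r <- r[order(r, decreasing = TRUE)]   # assumption: walk from the top-ranked gene down
  in_sig <- names(r) %in% signature

  # P_in: cumulative sum of weighted ranks of signature genes over their total sum.
  p_in <- cumsum(r^alpha * in_sig) / sum(r[in_sig]^alpha)
  # P_out: cumulative count of genes outside the signature over their number.
  p_out <- cumsum(!in_sig) / sum(!in_sig)

  # The score adds up the elements of the vector difference.
  sum(p_in - p_out)
}

# Toy data (made-up values): genes in rows, samples in columns.
expr <- matrix(
  c(1, 4,
    2, 3,
    3, 2,
    4, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2"))
)
signature <- c("g3", "g4")

# One score per sample; alpha = 1 keeps the arithmetic easy to follow by hand.
ssgsea_scores <- apply(expr, 2, ssgsea_one, signature = signature, alpha = 1)
```

In this toy example the signature genes are the most expressed in s1 and the least expressed in s2, so the score is positive for s1 and negative for s2.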
For signatures whose genes are supposed to be up- or down-regulated, genes are ranked by expression value in increasing or decreasing order, respectively. For signatures whose direction is unknown, genes are ranked by absolute expression in increasing order and are median-centered. Enrichment scores are then computed for each sample and signature by averaging gene ranks for genes in the signature. Finally, normalized scores are obtained by subtracting the theoretical minimum mean rank from the score and dividing by the difference between the theoretical maximum and minimum mean ranks.
The hacksig implementation of this method works only with unidirectional (i.e. all genes up- or down-regulated) and undirected gene signatures. If you want to get single sample scores for bidirectional gene signatures (i.e. signatures composed of both up- and down-regulated genes), please use the R package singscore (Foroutan et al., 2018).
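For a signature of up-regulated genes, the rank-based score and its normalization can be sketched in base R as follows. The helper singscore_up() is hypothetical and follows only the description given here; it is neither the hacksig nor the singscore source code. With N genes in total and N_s genes in the signature, the smallest possible mean rank is (N_s + 1) / 2 and the largest is (2N - N_s + 1) / 2.

```r
# Hypothetical helper, not part of hacksig or singscore: normalized mean-rank
# score for one sample and a signature of up-regulated genes.
# `x` is a named numeric vector of expression values (names are gene symbols).
singscore_up <- function(x, signature) {
  r <- rank(x)  # increasing order; for a down-regulated signature use rank(-x)
  genes <- intersect(signature, names(x))
  n <- length(x)
  n_sig <- length(genes)

  # Enrichment score: average rank of the signature genes.
  mean_rank <- mean(r[genes])

  # Theoretical minimum and maximum mean ranks for a signature of this size.
  s_min <- (n_sig + 1) / 2
  s_max <- (2 * n - n_sig + 1) / 2

  (mean_rank - s_min) / (s_max - s_min)
}

# Toy sample (made-up values).
x <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4)
score_top <- singscore_up(x, c("g3", "g4"))     # signature genes ranked highest
score_bottom <- singscore_up(x, c("g1", "g2"))  # signature genes ranked lowest
score_mixed <- singscore_up(x, c("g1", "g4"))   # one high, one low
```

The normalized score lies between 0 and 1. The sketch does not handle a signature that contains every gene in the sample, where the maximum and minimum mean ranks coincide.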
Barbie, D. A., Tamayo, P., Boehm, J. S., Kim, S. Y., Moody, S. E., Dunn, I. F., Schinzel, A. C., Sandy, P., Meylan, E., Scholl, C., Fröhling, S., Chan, E. M., Sos, M. L., Michel, K., Mermel, C., Silver, S. J., Weir, B. A., Reiling, J. H., Sheng, Q., Gupta, P. B., … Hahn, W. C. (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(7269), 108–112. doi:10.1038/nature08460.
Foroutan, M., Bhuva, D. D., Lyu, R., Horan, K., Cursons, J., & Davis, M. J. (2018). Single sample scoring of molecular phenotypes. BMC Bioinformatics, 19(1), 404. doi:10.1186/s12859-018-2435-4.
Hänzelmann, S., Castelo, R., & Guinney, J. (2013). GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7. doi:10.1186/1471-2105-14-7.
Lee, E., Chuang, H. Y., Kim, J. W., Ideker, T., & Lee, D. (2008). Inferring pathway activity toward precise disease classification. PLoS Computational Biology, 4(11), e1000217. doi:10.1371/journal.pcbi.1000217.
hack_sig(), hack_estimate(), GSVA::gsva(), singscore::multiScore()