ss_methods: Single sample scoring methods

ss_methods    R Documentation

Single sample scoring methods

Description

These are internal functions to compute single sample scores from a list of gene signatures in three different ways:

  • combined z-score (Lee et al., 2008);

  • single sample GSEA (Barbie et al., 2009);

  • singscore (Foroutan et al., 2018).

compute_ssgsea() is called by hack_estimate(), whereas all three methods are called by hack_sig().

Usage

compute_zscore(expr_data, signatures)

compute_ssgsea(
  expr_data,
  signatures,
  sample_norm = "raw",
  rank_norm = "none",
  alpha = 0.25
)

compute_singscore(expr_data, signatures, direction = "none")

Arguments

expr_data

A normalized gene expression matrix (or data frame) with gene symbols as row names and samples as columns.

signatures

A named list of gene signatures.

sample_norm

A character string specifying the type of normalization affecting the single sample GSEA scores. Can be one of:

  • "raw" (default), obtain raw scores;

  • "separate", normalize raw scores to [0, 1] across samples for each signature separately;

  • "all", normalize raw scores both across samples and signatures.
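The "separate" option can be illustrated with a minimal sketch in base R. It assumes that normalization to [0, 1] means min-max rescaling, which the text above does not state explicitly; the helper minmax() and the toy score matrix are hypothetical and not part of hacksig. The pooled variant at the end is only one possible reading of "all".

```r
# Hypothetical raw scores: 3 samples (rows) x 2 signatures (columns).
raw <- matrix(
  c(-2, 0, 2, 10, 20, 40),
  nrow = 3,
  dimnames = list(c("s1", "s2", "s3"), c("sigA", "sigB"))
)

# Assumed min-max rescaling to [0, 1]; not taken from the hacksig source.
minmax <- function(v) (v - min(v)) / (max(v) - min(v))

# "separate": rescale each signature (column) across samples on its own.
separate <- apply(raw, 2, minmax)

# One possible reading of "all": rescale over all samples and signatures at once.
pooled <- minmax(raw)
```

With "separate" every signature spans the full [0, 1] range, whereas the pooled variant keeps differences in scale between signatures.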

rank_norm

A character string specifying how gene expression ranks should be normalized in the single sample GSEA procedure. Valid choices are:

  • "none" (default), no rank normalization;

  • "rank", ranks are multiplied by 10000 / nrow(expr_data);

  • "logrank", normalized ranks are logged.
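The three rank_norm choices can be sketched in base R as follows. The toy matrix is hypothetical, and the use of the natural logarithm for "logrank" is an assumption, since the text above does not name the base.

```r
# Toy expression matrix: 4 genes (rows) x 2 samples (columns), filled by column.
expr <- matrix(
  c(5, 1, 3, 9,   # sample s1
    2, 8, 4, 6),  # sample s2
  nrow = 4,
  dimnames = list(paste0("g", 1:4), c("s1", "s2"))
)

# rank_norm = "none": rank genes within each sample (lowest expression = rank 1).
ranks <- apply(expr, 2, rank)

# rank_norm = "rank": multiply ranks by 10000 / number of genes.
ranks_scaled <- ranks * 10000 / nrow(expr)

# rank_norm = "logrank": log of the rescaled ranks (natural log assumed).
ranks_logged <- log(ranks_scaled)
```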

alpha

A numeric scalar. Exponent applied to the gene ranks in the running sum of the single sample GSEA score calculation. Defaults to alpha = 0.25.

direction

A character string specifying the singscore computation method depending on the direction of the signatures. Can be one of:

  • "none" (default), undirected signatures, that is, it is not known whether the genes are up- or down-regulated;

  • "up", all genes in the signature are supposed to be up-regulated;

  • "down", all genes in the signature are supposed to be down-regulated.

Value

A tibble with one row for each sample in expr_data, a column sample_id indicating sample identifiers and one column for each input signature giving single sample scores.

Algorithm

This section gives a brief explanation of how single sample scores are obtained from different methods.

Combined z-score

Gene expression values are centered by their mean value and scaled by their standard deviation across samples for each gene (z-scores). Then, for each sample and signature, corresponding z-scores are added up and divided by the square root of the signature size (i.e. the number of genes composing a signature).
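The steps above can be sketched in base R. This is an illustrative sketch, not the hacksig implementation: the function name zscore_sketch() is hypothetical, it returns a plain matrix rather than a tibble, and it assumes the signature size is the number of signature genes actually found in the expression matrix.

```r
zscore_sketch <- function(expr, signatures) {
  # Center and scale each gene (row) across samples.
  z <- t(scale(t(expr)))
  # For each signature, sum z-scores per sample and divide by sqrt(size).
  sapply(signatures, function(sig) {
    genes <- intersect(sig, rownames(z))  # assumption: ignore missing genes
    colSums(z[genes, , drop = FALSE]) / sqrt(length(genes))
  })
}

# Toy data: 3 genes x 3 samples and one two-gene signature.
expr <- matrix(
  c(1, 2, 3,
    2, 4, 6,
    5, 5, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3"))
)
res <- zscore_sketch(expr, list(sigA = c("g1", "g2")))
```

Here res has one row per sample and one column per signature. Both genes have z-scores (-1, 0, 1), so the combined scores are the sums (-2, 0, 2) divided by sqrt(2).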

The combined z-score method is also implemented in the R package GSVA (Hänzelmann et al., 2013).

Single sample GSEA

For each sample, genes are ranked by expression value in increasing order and rank normalization may follow (see argument rank_norm). Then, two probability-like vectors are computed for each sample and signature:

  • P_{in}, the cumulative sum of weighted ranks divided by their total sum for genes in the signature;

  • P_{out}, the cumulative sum of ones (indicating genes not in the signature) divided by the number of genes not in the signature.

The single sample GSEA score is obtained by adding up the elements of the vector difference P_{in} - P_{out}. Finally, single sample scores can be normalized either across samples for each signature or across samples and signatures together (see argument sample_norm).
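A minimal sketch of the raw score for one sample and one signature, without rank or sample normalization, may help make the two vectors concrete. The function ssgsea_sketch() is hypothetical and is not the hacksig code. It assumes that the cumulative sums are taken while walking the genes from the highest to the lowest rank, as in standard GSEA; the text above does not state the direction.

```r
ssgsea_sketch <- function(x, sig, alpha = 0.25) {
  r <- rank(x)                         # ranks by increasing expression
  ord <- order(r, decreasing = TRUE)   # assumption: walk from top rank down
  r <- r[ord]
  in_sig <- names(x)[ord] %in% sig

  # P_in: cumulative sum of weighted ranks of signature genes over their total.
  w <- ifelse(in_sig, r^alpha, 0)
  p_in <- cumsum(w) / sum(w)

  # P_out: cumulative count of non-signature genes over their number.
  p_out <- cumsum(!in_sig) / sum(!in_sig)

  # Score: sum of the element-wise difference.
  sum(p_in - p_out)
}

# Toy sample with 4 genes; names are gene symbols.
x <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4)
top <- ssgsea_sketch(x, c("g3", "g4"))     # signature of highly expressed genes
bottom <- ssgsea_sketch(x, c("g1", "g2"))  # signature of lowly expressed genes
```

Under these assumptions, a signature made of the most highly expressed genes gets a positive score and one made of the least expressed genes gets a negative score.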

The single sample GSEA method is also implemented in the R package GSVA (Hänzelmann et al., 2013).

Singscore

For signatures whose genes are supposed to be up- or down-regulated, genes are ranked by expression value in increasing or decreasing order, respectively. For signatures whose direction is unknown, genes are ranked by absolute expression in increasing order and are median-centered. Enrichment scores are then computed for each sample and signature by averaging gene ranks for genes in the signature. Finally, normalized scores are obtained by subtracting the theoretical minimum mean rank from the score and dividing by the difference between the theoretical maximum and minimum mean ranks.
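For the directed case ("up" or "down"), the computation can be sketched as below. The function singscore_sketch() is hypothetical and covers neither undirected signatures nor the hacksig internals. The theoretical bounds are an assumption derived from untied ranks 1..N: a signature of k genes has minimum mean rank (k + 1) / 2 and maximum mean rank (2N - k + 1) / 2.

```r
singscore_sketch <- function(x, sig, direction = c("up", "down")) {
  direction <- match.arg(direction)
  # "up": rank by increasing expression; "down": rank by decreasing expression.
  r <- if (direction == "up") rank(x) else rank(-x)
  in_sig <- names(x) %in% sig
  n <- length(x)     # total number of genes
  k <- sum(in_sig)   # signature genes found in the sample

  score <- mean(r[in_sig])          # mean rank of signature genes
  s_min <- (k + 1) / 2              # assumed theoretical minimum mean rank
  s_max <- (2 * n - k + 1) / 2      # assumed theoretical maximum mean rank
  (score - s_min) / (s_max - s_min)
}

# Toy sample with 5 genes and a signature of the two most expressed genes.
x <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4, g5 = 5)
up_score <- singscore_sketch(x, c("g4", "g5"), direction = "up")
down_score <- singscore_sketch(x, c("g4", "g5"), direction = "down")
```

In this toy case the signature sits at the top of the ranking, so the "up" score reaches the upper bound of the normalized range and the "down" score reaches the lower bound.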

The hacksig implementation of this method works only with unidirectional (i.e. all genes up- or down-regulated) and undirected gene signatures. If you want to get single sample scores for bidirectional gene signatures (i.e. signatures composed of both up- and down-regulated genes), please use the R package singscore (Foroutan et al., 2018).

References

Barbie, D. A., Tamayo, P., Boehm, J. S., Kim, S. Y., Moody, S. E., Dunn, I. F., Schinzel, A. C., Sandy, P., Meylan, E., Scholl, C., Fröhling, S., Chan, E. M., Sos, M. L., Michel, K., Mermel, C., Silver, S. J., Weir, B. A., Reiling, J. H., Sheng, Q., Gupta, P. B., … Hahn, W. C. (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(7269), 108–112. doi:10.1038/nature08460.

Foroutan, M., Bhuva, D. D., Lyu, R., Horan, K., Cursons, J., & Davis, M. J. (2018). Single sample scoring of molecular phenotypes. BMC Bioinformatics, 19(1), 404. doi:10.1186/s12859-018-2435-4.

Hänzelmann, S., Castelo, R., & Guinney, J. (2013). GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7. doi:10.1186/1471-2105-14-7.

Lee, E., Chuang, H. Y., Kim, J. W., Ideker, T., & Lee, D. (2008). Inferring pathway activity toward precise disease classification. PLoS Computational Biology, 4(11), e1000217. doi:10.1371/journal.pcbi.1000217.

See Also

hack_sig(), hack_estimate(), GSVA::gsva(), singscore::multiScore()


Acare/hacksig documentation built on April 14, 2025, 6:18 a.m.