runTBsigProfiler: Run TB gene signature profiling.
In dfjenkins3/TBSignatureProfiler: Profile RNA-Seq Data Using TB Pathway Signatures

runTBsigProfiler

R Documentation

Description

This function runs gene signature profiling on an input gene expression dataset, using a subset of the signatures listed in TBsignatures and one or more specified scoring algorithms. The resulting signature scores can be compared using the visualization tools also provided in the TBSignatureProfiler package.

Usage

runTBsigProfiler(
  input,
  useAssay = NULL,
  signatures = NULL,
  algorithm = c("GSVA", "ssGSEA", "ASSIGN", "PLAGE", "Zscore", "singscore"),
  combineSigAndAlgorithm = FALSE,
  assignDir = NULL,
  outputFormat = NULL,
  parallel.sz = 0,
  ASSIGNiter = 1e+05,
  ASSIGNburnin = 50000,
  ssgsea_norm = TRUE,
  update_genes = TRUE
)

Arguments

`input`	an input data object of the class `SummarizedExperiment`, `data.frame`, or `matrix` containing gene expression data. Required.
`useAssay`	a character string specifying the assay to use for signature profiling when `input` is a `SummarizedExperiment`. Required only for input data of the class `SummarizedExperiment`. If `NULL`, the assay used will be `"counts"`. The default is `NULL`.
`signatures`	a `list` of signatures to run with their associated genes. This list should be in the same format as `TBsignatures`, included in the TBSignatureProfiler package. If `signatures = NULL`, the default `TBsignatures` list is used. For details, run `?TBsignatures`. If fewer than 2 genes of a signature are present in the input data, that signature will not be evaluated and will not be present in the resulting output object. The default is `NULL`.
`algorithm`	a vector of algorithms to run, or a character string if only one is desired. The default is `c("GSVA", "ssGSEA", "ASSIGN", "PLAGE", "Zscore", "singscore")`. NOTE: ASSIGN takes a long time to run, so it is not recommended when efficiency matters.
`combineSigAndAlgorithm`	logical, if `TRUE`, output row names will be of the form `<algorithm>_<signature>`. It must be set to `TRUE` if the `outputFormat` will be a SummarizedExperiment and `length(algorithm) > 1`. It will always be `FALSE` if only one algorithm is selected. If `FALSE`, there will be a column named 'algorithm' that lists which algorithm is used, and a column named 'pathway' that lists the signature profiled. If `NULL`, and one algorithm was used, the algorithm will not be listed. The default is `FALSE`.
`assignDir`	a character string naming a directory to save intermediate ASSIGN results if `algorithm` specifies `"ASSIGN"`. The default is `NULL`, in which case intermediate results will not be saved.
`outputFormat`	a character string specifying the output data format. Possible values are `"SummarizedExperiment"`, `"matrix"`, or `"data.frame"`. The default is to return the same type as the `input` object.
`parallel.sz`	an integer identifying the number of processors to use when running the calculations in parallel for the GSVA and ssGSEA algorithms. If `parallel.sz = 0`, all cores are used. The default is `0`.
`ASSIGNiter`	an integer indicating the number of iterations to use in the MCMC for the ASSIGN algorithm. The default is `100,000`.
`ASSIGNburnin`	an integer indicating the number of burn-in iterations to use in the MCMC for the ASSIGN algorithm. These iterations are discarded when computing the posterior means of the model parameters. The default is `50,000`.
`ssgsea_norm`	logical, passed to `GSVA::gsva()`. When `algorithm` includes `"ssGSEA"`, the profiler runs the ssGSEA method from Barbie et al. (2009), normalizing the scores by the absolute difference between the minimum and the maximum, as described in their paper. When `ssgsea_norm = FALSE`, this last normalization step is skipped. The default is `TRUE`.
`update_genes`	logical, denotes whether gene names from `signatures` and the row names of `input` should be checked for accuracy using `HGNChelper::checkGeneSymbols()`. The mapping assumes the genes are human and keeps any unmapped gene under its original input name. The default is `TRUE`.
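
As a rough illustration of the `signatures` format and the two-gene rule described above, the sketch below builds a custom signature list and checks its overlap with the input row names before profiling. The signature names, the gene symbols, and the helper `count_present()` are hypothetical and chosen only for illustration; this is not necessarily how the package performs the check internally.

```r
# Toy expression matrix: genes on rows, samples on columns (values are random).
set.seed(1)
expr <- matrix(rnorm(40), nrow = 8, ncol = 5,
               dimnames = list(paste0("GENE", 1:8), paste0("sample", 1:5)))

# Hypothetical signatures in the same shape as TBsignatures:
# a named list of character vectors of gene symbols.
my_signatures <- list(
  sigA = c("GENE1", "GENE2", "GENE3"),
  sigB = c("GENE8", "NOT_IN_DATA")
)

# Hypothetical helper: count how many genes of each signature occur in the input.
count_present <- function(signatures, input) {
  vapply(signatures,
         function(genes) sum(genes %in% rownames(input)),
         integer(1))
}

n_present <- count_present(my_signatures, expr)

# Signatures with fewer than 2 genes present would not be evaluated,
# so keep only those that meet the threshold.
usable <- my_signatures[n_present >= 2]
names(usable)
```

Under these assumptions, the filtered list could then be supplied as `signatures = usable`, which makes it explicit in advance which signatures are expected to appear in the results.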

Value

A SummarizedExperiment object, data.frame, or matrix of signature profiling results. The returned object will be of the format specified in outputFormat. If input is a SummarizedExperiment and outputFormat = "SummarizedExperiment", then the output will retain any input information stored in the input colData. In general, if outputFormat = "SummarizedExperiment" then columns in the colData will include the scores for each desired signature with samples on the rows. If input is a data.frame or matrix, then the returned object will have signatures on the rows and samples on the columns.

Source

Profiling for the Z-score, PLAGE, GSVA, and ssGSEA algorithms is conducted with the Bioconductor GSVA package. Profiling for the singscore algorithm is conducted with the Bioconductor singscore package.

References

Barbie, D.A., Tamayo, P., Boehm, J.S., Kim, S.Y., Moody, S.E., Dunn, I.F., Schinzel, A.C., Sandy, P., Meylan, E., Scholl, C., et al. (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108-112. doi: 10.1038/nature08460.

Foroutan, M. et al. (2018). Single sample scoring of molecular phenotypes. BMC Bioinformatics, 19. doi: 10.1186/s12859-018-2435-4.

Lee, E. et al. (2008). Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217. doi: 10.1371/journal.pcbi.1000217.

Shen, Y. et al. (2015). ASSIGN: context-specific genomic profiling of multiple heterogeneous biological pathways. Bioinformatics, 31, 1745-1753. doi: 10.1093/bioinformatics/btv031.

Subramanian, A. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102, 15545-15550. doi: 10.1073/pnas.0506580102.

Tomfohr, J. et al. (2005). Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6:225. doi: 10.1186/1471-2105-6-225.

Examples

## Using a data.frame input/output
 # Create some toy data to test the Zak_RISK_16 signature, using 5 samples with
 # low expression and 5 samples with high expression of the signature's genes,
 # plus 100 unrelated background genes.
df_testdata <- as.data.frame(rbind(matrix(c(rnorm(80), rnorm(80) + 5), 16, 10,
                             dimnames = list(TBsignatures$Zak_RISK_16,
                             paste0("sample", seq_len(10)))),
                      matrix(rnorm(1000), 100, 10,
                             dimnames = list(paste0("gene", seq_len(100)),
                             paste0("sample", seq_len(10))))))
res <- runTBsigProfiler(input = df_testdata,
                        signatures = TBsignatures["Zak_RISK_16"],
                        algorithm = c("GSVA", "ssGSEA"),
                        combineSigAndAlgorithm = FALSE,
                        parallel.sz = 1)
subset(res, res$pathway == "Zak_RISK_16")

## Using a SummarizedExperiment input/output
 # The TB_indian SummarizedExperiment data is included in the package.
GSVA_res <- runTBsigProfiler(input = TB_indian,
                             useAssay = "logcounts",
                             signatures = TBsignatures["Zak_RISK_16"],
                             algorithm = c("GSVA"),
                             combineSigAndAlgorithm = FALSE,
                             parallel.sz = 1)
GSVA_res$Zak_RISK_16

dfjenkins3/TBSignatureProfiler documentation built on June 2, 2025, 3:16 a.m.