multiGSEA: Performs a plethora of GSEA analyses over a contrast of...


View source: R/multiGSEA.R

Description

multiGSEA is a wrapper function that delegates GSEA analyses to different "workers", each of which implements the flavor of GSEA of your choosing. The particular analyses that are performed are specified by the methods argument, and these methods are fine-tuned by passing their arguments down through the ... of this wrapper function.

Usage

multiGSEA(
  gsd,
  x,
  design = NULL,
  contrast = NULL,
  methods = NULL,
  use.treat = FALSE,
  feature.min.logFC = if (use.treat) log2(1.25) else 1,
  feature.max.padj = 0.1,
  trim = 0.1,
  verbose = FALSE,
  ...,
  rank_by = NULL,
  select_by = NULL,
  rank_order = c("ordered", "descending", "ascending"),
  group_by = NULL,
  biased_by = NULL,
  xmeta. = NULL,
  .parallel = FALSE,
  BPPARAM = bpparam()
)

Arguments

gsd

The GeneSetDb() that defines the gene sets of interest.

x

An ExpressionSet-like object

design

A design matrix for the study

contrast

The contrast of interest to analyze. This can be a column name of design, or a contrast vector which performs "coefficient arithmetic" over the columns of design. The design and contrast parameters are interpreted in exactly the same way as the same parameters in limma's limma::camera() and limma::roast() methods.
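To make "coefficient arithmetic" concrete, here is a minimal base R sketch. The sample grouping and the coefficient values are made up for illustration; only the idea of one contrast weight per design column comes from the description above.

```r
# Hypothetical sample annotation: three normal and three tumor samples
group <- factor(c("normal", "normal", "normal", "tumor", "tumor", "tumor"))

# Group-means design matrix: one column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# A contrast vector holds one weight per design column: tumor minus normal
contrast <- c(normal = -1, tumor = 1)

# Made-up fitted coefficients (mean log-expression per group) for one gene
coefs <- c(normal = 5.0, tumor = 6.5)

# "Coefficient arithmetic": the weighted sum of the coefficients
contrast_logFC <- sum(contrast * coefs[names(contrast)])
```

With a design that already contains a tumor-vs-normal coefficient, passing that column's name as contrast expresses the same comparison.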

methods

A character vector indicating the GSEA methods you want to run. Refer to the GSEA Methods section for more details. If no methods are specified, only differential gene expression and geneset level statistics for the contrast are computed.

use.treat

should we use limma/edgeR's "treat" functionality for the gene-level differential expression analysis?

feature.min.logFC

The minimum logFC required for an individual feature (not geneset) to be considered differentially expressed. Used in conjunction with feature.max.padj, primarily for summarization of genesets (by geneSetsStats()), but can also be used by GSEA methods that require differential expression calls at the individual feature level, like goseq().

feature.max.padj

The maximum adjusted p-value used to consider an individual feature (not geneset) to be differentially expressed. Used in conjunction with feature.min.logFC.

trim

The amount to trim when calculating trimmed t and logFC statistics for each geneset. This is passed down to the geneSetsStats() function.
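As a rough illustration of what trimming does, the sketch below uses base R's mean(x, trim = ...), which drops the given fraction of observations from each end before averaging. That geneSetsStats() follows exactly this convention is an assumption, and the logFC values are made up.

```r
# Made-up logFC values for the ten genes of one hypothetical geneset;
# the last gene is an extreme outlier
logfc <- c(-0.2, 0.1, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.9, 8.0)

# The plain mean is pulled upward by the outlier
mean_plain <- mean(logfc)

# With trim = 0.1, the lowest and highest 10% of values (one gene at
# each end here) are dropped before averaging
mean_trimmed <- mean(logfc, trim = 0.1)
```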

verbose

make some noise during execution?

...

The arguments are passed down into calculateIndividualLogFC() and the various geneset analysis functions.

.parallel

by default, .parallel=FALSE runs each GSEA in a serial manner. If .parallel=TRUE, the GSEA execution loop is parallelized using the BiocParallel package. Note that you might want to remove unnecessary large objects from your workspace when this is TRUE because R will likely want to copy them down into your worker threads.

BPPARAM

A BiocParallel parameter definition, like one generated from BiocParallel::MulticoreParam() or BiocParallel::BatchtoolsParam(), for instance, which is passed down to BiocParallel::bplapply(). If not specified and .parallel = TRUE, then the BiocParallel::bpparam() object will be used. If .parallel = FALSE, this parameter is explicitly ignored and replaced with a BiocParallel::SerialParam() object.
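A parallel run might look like the sketch below. It reuses the example data from the Examples section and assumes that both multiGSEA and BiocParallel are installed; the choice of two workers is arbitrary.

```r
# Only run when both packages are available
if (requireNamespace("multiGSEA", quietly = TRUE) &&
    requireNamespace("BiocParallel", quietly = TRUE)) {
  library(multiGSEA)

  vm <- exampleExpressionSet()
  gdb <- exampleGeneSetDb()

  # Two forked workers; the worker count is an arbitrary choice
  bp <- BiocParallel::MulticoreParam(workers = 2)

  mg_par <- multiGSEA(gdb, vm, vm$design, "tumor",
                      methods = c("camera", "fry"),
                      .parallel = TRUE, BPPARAM = bp)
}
```

Removing large, unneeded objects from the workspace before such a call keeps them from being copied into the workers.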

Details

Note that we are currently in the middle of a refactor to accept and take full advantage of data.frame inputs for x, which will be used for preranked GSEA methods. See the following issue for more details: https://github.com/lianos/multiGSEA/issues/24

The bulk of the GSEA methods currently available in this package come from edgeR/limma; however, others are included (and are being added) as well. See the GSEA Methods and GSEA Method Parameterization sections for more details.

In addition to performing GSEA, this function also internally orchestrates a differential expression analysis, which can be tweaked by identifying the parameters in the calculateIndividualLogFC() function and passing them down through ... here. The results of the differential expression analysis (i.e., the limma::topTable()) are accessible by calling the logFC() function on the MultiGSEAResult() object returned from this function call.
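A sketch of retrieving the gene-level results, assuming the multiGSEA package is installed and using the example data from the Examples section. No methods are specified, so according to the methods argument description only the differential expression and geneset-level statistics are computed.

```r
if (requireNamespace("multiGSEA", quietly = TRUE)) {
  library(multiGSEA)

  vm <- exampleExpressionSet()
  gdb <- exampleGeneSetDb()

  # methods is left at its default (NULL): no GSEA method is run
  mg_dge <- multiGSEA(gdb, vm, vm$design, "tumor")

  # Gene-level differential expression statistics for the contrast
  dge <- logFC(mg_dge)
  head(dge)
}
```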

Please Note: be sure to cite the original GSEA method when using results generated from this function.

Value

A MultiGSEAResult() which holds the results of all the analyses specified in the methods parameter.

GSEA Methods

You can choose the methods you would like to run by providing a character vector of GSEA method names to the methods parameter. Valid methods you can select from include:

Methods annotated with a (*) indicate that these methods require a complete expression object, a valid design matrix, and a contrast specification in order to run. These are all of the same things you need to provide when performing a vanilla differential gene expression analysis.

Methods missing a (*) can be run on a feature-named input vector of gene-level statistics which will be used for ranking (i.e., a named vector of logFCs or t-statistics for genes). They can also be run by providing an expression object, design, and contrast vector, and the appropriate statistics vector will be generated internally from the t-statistics, p-values, or log-fold-changes, depending on the value provided in the score.by parameter.
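A feature-named vector of ranking statistics can be built in base R as sketched below. The feature identifiers and t-statistics are made up, and the sketch deliberately stops before calling multiGSEA(), since support for this kind of input is still being refactored.

```r
# Made-up gene-level statistics, e.g. from an earlier differential
# expression analysis
stats <- data.frame(
  feature_id = c("g1", "g2", "g3", "g4"),
  t = c(4.2, -3.1, 0.4, -0.9),
  stringsAsFactors = FALSE
)

# Named numeric vector: the names are feature identifiers and the
# values are the statistics used for ranking
ranks <- setNames(stats$t, stats$feature_id)

# Order from most up-regulated to most down-regulated
ranks <- sort(ranks, decreasing = TRUE)
```

The names must use the same feature identifiers as the gene sets in the GeneSetDb, otherwise no feature can be matched to a gene set.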

The worker functions that execute these GSEA methods are functions named do.METHOD within this package. These functions are not meant to be executed directly by the user, and are therefore not exported. Look at the respective method's help page for full details (i.e., if you are running "camera", look at the limma::camera() help page). The formal parameters that these methods take can be passed to them via the ... in this multiGSEA() function.

GSEA Method Parameterization

Each GSEA method can be tweaked via a custom set of parameters. We leave the documentation of these parameters and how they affect their respective GSEA methods to the documentation available in the packages where they are defined. The caller simply passes these parameters through the ... of multiGSEA(), which then forwards them to the worker functions.

What happens when two different GSEA methods have parameters with the same name?

Unfortunately, you currently cannot provide different values for these parameters. An upcoming version of multiGSEA will support this feature via slightly different calling semantics. This will also allow the caller to call the same GSEA method with different parameterizations so that even these can be compared against each other.
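The toy sketch below shows why a shared parameter name collides when arguments are forwarded through `...`. The functions worker_a(), worker_b(), and toy_wrapper() and the nperm parameter are hypothetical stand-ins, not multiGSEA's actual implementation.

```r
# Two hypothetical workers that both accept a parameter named nperm,
# each with its own default
worker_a <- function(x, nperm = 10, ...) nperm
worker_b <- function(x, nperm = 100, ...) nperm

# A hypothetical wrapper that forwards its ... to every worker
toy_wrapper <- function(x, ...) {
  list(a = worker_a(x, ...), b = worker_b(x, ...))
}

# A single nperm value reaches both workers; the wrapper offers no way
# to give each worker a different value
out <- toy_wrapper(1, nperm = 500)
```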

Differential Gene Expression

When the multiGSEA() call is given an expression matrix, design, and contrast, it will also internally orchestrate a gene-level differential expression analysis. Depending on the type of expression object passed in via x, this function will guess the best method to use for this analysis.

If x is a DGEList, ensure that you have already called edgeR::estimateDisp() on x, and edgeR's quasi-likelihood framework will be used. Otherwise, limma is used (note that x can even be an EList run through voom() or voomWithQualityWeights(), or one where you have leveraged limma's limma::duplicateCorrelation() functionality).
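Preparing a DGEList ahead of time might look like the sketch below, which assumes the edgeR package is installed. The counts are simulated, and the grouping and design are made up for illustration.

```r
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)

  # Simulated counts: 1000 genes by 6 samples
  counts <- matrix(
    rnbinom(1000 * 6, mu = 50, size = 5), nrow = 1000,
    dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6))
  )
  group <- factor(rep(c("normal", "tumor"), each = 3))
  design <- model.matrix(~ group)

  y <- edgeR::DGEList(counts = counts, group = group)
  y <- edgeR::calcNormFactors(y)

  # Dispersions must be estimated before y is handed to multiGSEA()
  y <- edgeR::estimateDisp(y, design)
}
```

The resulting y, together with the same design, is what would then be passed as x and design.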

The parameters of this differential expression analysis can also be customized. Please refer to the calculateIndividualLogFC() function for more information. The use.treat, feature.min.logFC, and feature.max.padj arguments of multiGSEA(), as well as the ... parameters, are passed down to that function.
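The sketch below shows how the two feature-level thresholds could combine into a differential expression call. The statistics are made up, and the rule itself (both conditions must hold) is an assumption; the behavior when use.treat = TRUE may differ.

```r
# Made-up gene-level statistics
stats <- data.frame(
  feature_id = c("g1", "g2", "g3", "g4"),
  logFC = c(2.1, -1.4, 0.3, -0.8),
  padj = c(0.01, 0.04, 0.02, 0.30)
)

# Defaults from the Usage section when use.treat = FALSE
feature.min.logFC <- 1
feature.max.padj <- 0.1

# Assumed rule: a feature is called differentially expressed when it
# passes both the absolute logFC and the adjusted p-value threshold
stats$significant <- abs(stats$logFC) >= feature.min.logFC &
  stats$padj <= feature.max.padj
```

Here g3 fails the logFC threshold despite its small adjusted p-value, and g4 fails both thresholds.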

Examples

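library(multiGSEA)

# Example expression data and gene set database shipped with the package
vm <- exampleExpressionSet()
gdb <- exampleGeneSetDb()

# Run camera and fry for the 'tumor' coefficient of the design
mg <- multiGSEA(gdb, vm, vm$design, 'tumor',
                methods=c('camera', 'fry'),
                # customize camera parameter:
                inter.gene.cor = 0.04)

# Names of the methods that were run
resultNames(mg)

# Per-method results, then the results of all methods together
res.camera <- result(mg, 'camera')
res.fry <- result(mg, 'fry')
res.all <- results(mg)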

lianos/multiGSEA documentation built on Nov. 17, 2020, 1:26 p.m.