goseq: Perform goseq Enrichment tests across a GeneSetDb.


View source: R/do.goseq.R

Description

Note that nothing is imported from goseq directly; the package is loaded only when this function is called. I could not find a way to selectively import functions from the goseq package without also loading its dependencies, which takes a long time, and I don't want loading multiGSEA to be slow. The goseq package has therefore been moved to Suggests and is loaded within this function when necessary.
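The load-on-demand pattern for a package listed in Suggests can be sketched in base R as follows. This is a minimal illustration, not the code used inside this function: the helper name `require_suggested` is hypothetical, and the actual check in multiGSEA may differ.

```r
# Hypothetical helper: fail early with a clear message if an optional
# (Suggests) package is missing, without loading it when the parent
# package itself is loaded.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required for this function; please install it.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# "stats" ships with R, so this check passes.
ok <- require_suggested("stats")

# A missing package produces an error that can be caught and inspected.
msg <- tryCatch(
  require_suggested("notARealPackage123"),
  error = function(e) conditionMessage(e)
)
```

Because the check runs inside the function body, the cost of loading the optional package is paid only by callers who actually use it.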

Usage

goseq(
  gsd,
  selected,
  universe,
  feature.bias,
  method = c("Wallenius", "Sampling", "Hypergeometric"),
  repcnt = 2000,
  use_genes_without_cat = TRUE,
  plot.fit = FALSE,
  do.conform = TRUE,
  as.dt = FALSE,
  .pipelined = FALSE
)

Arguments

gsd

The GeneSetDb object to run tests against

selected

The ids of the selected features

universe

The ids of the universe

feature.bias

A named vector as long as nrow(x) that holds the "bias" information for the features/genes tested (i.e., a vector of gene lengths). names(feature.bias) should equal rownames(x). If this is not provided, all feature lengths are set to 1 (no bias). The goseq package provides a getlength function which helps retrieve default values if you do not have the exact lengths used in your analysis.
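A named bias vector aligned to the tested features can be built in base R along these lines. The gene identifiers and lengths below are made up for illustration, and the sketch assumes the tested features are given by the `universe` ids.

```r
# Hypothetical feature identifiers standing in for the tested universe.
universe <- c("geneA", "geneB", "geneC", "geneD")

# Hypothetical gene lengths (in bp), deliberately in a different order
# and with one gene (geneB) missing.
gene.lengths <- c(geneC = 2100, geneA = 1500, geneD = 800)

# Reorder to match the universe; genes without a known length become NA.
feature.bias <- setNames(gene.lengths[universe], universe)

# The "no bias" fallback described above: every feature gets length 1.
no.bias <- setNames(rep(1, length(universe)), universe)
```

Checking for `NA` entries in `feature.bias` before running the test is a cheap way to spot features whose length could not be looked up.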

method

The method to use to calculate the unbiased category enrichment scores

repcnt

Number of random samples to be calculated when random sampling is used. Ignored unless method="Sampling".

use_genes_without_cat

A boolean indicating whether genes without a category should still be used. For example, a large number of genes may have no GO term annotated. If this option is set to FALSE, those genes are ignored in the calculation of p-values (this is the default in goseq itself). If this option is set to TRUE (the default in this function, as shown in Usage), these genes count towards the total number of genes outside the category being tested.
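The effect of this option on the gene counts can be illustrated with a plain hypergeometric test in base R. The numbers are invented, and the sketch calls `phyper` directly rather than goseq, so it ignores length bias entirely; it only shows how the size of the background changes.

```r
# Invented counts for illustration only.
n.universe     <- 1000  # all genes tested
n.uncat        <- 400   # genes with no category annotation at all
n.selected     <- 100   # selected (e.g. differentially expressed) genes
n.sel.uncat    <- 30    # selected genes with no category annotation
n.category     <- 50    # genes in the category being tested
n.sel.category <- 10    # selected genes in that category

# Upper-tail hypergeometric p-value: P(overlap >= observed).
enrich_p <- function(overlap, in.cat, total, drawn) {
  phyper(overlap - 1, in.cat, total - in.cat, drawn, lower.tail = FALSE)
}

# TRUE: unannotated genes stay in the background.
p.with <- enrich_p(n.sel.category, n.category, n.universe, n.selected)

# FALSE: unannotated genes are dropped from both universe and selection.
p.without <- enrich_p(
  n.sel.category, n.category,
  n.universe - n.uncat, n.selected - n.sel.uncat
)
```

With these counts, keeping the unannotated genes enlarges the background more than it enlarges the selection, so the category looks more enriched (`p.with` is smaller than `p.without`). Other counts can shift the balance the other way.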

do.conform

By default TRUE: does some gymnastics to conform the gsd to the universe vector. This should never be set to FALSE by end users; the parameter exists so that when this function is called from the multiGSEA codepath, the GeneSetDb object does not have to be conformed again, because that has already been done.

.pipelined

If this function is being called outside of a multiGSEA pipeline, some additional cleanup of output column names is done. Otherwise the column renaming and post-processing is left to the do.goseq caller (Default: FALSE).

active.only

If TRUE, only "active" genesets are used

value

The feature_id types to extract from gsd

Value

A data.table of results, similar to goseq output. The output from nullp is added to the outgoing data.table as an attribute named "pwf".
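An attribute attached this way can be read back with `attr()`. The sketch below uses a plain data.frame as a stand-in for the result, and every column name and value in it is hypothetical; only the attribute name "pwf" comes from the description above.

```r
# Stand-in for a results table; the column names here are hypothetical.
res <- data.frame(category = c("setA", "setB"), pval = c(0.01, 0.40))

# Stand-in for the nullp output; contents are hypothetical as well.
attr(res, "pwf") <- data.frame(
  bias.data = c(1500, 2100),
  pwf = c(0.10, 0.15),
  row.names = c("geneA", "geneC")
)

# Retrieve the attribute from the result.
pwf <- attr(res, "pwf")
```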

References

Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14. http://genomebiology.com/2010/11/2/R14


lianos/multiGSEA documentation built on Nov. 17, 2020, 1:26 p.m.