flexgsea: Flexible Gene Set Enrichment Analysis.

View source: R/flexgsea.R

Description

flexgsea() performs a gene set enrichment analysis, calculating significance by sample permutation. The functions that score genes, calculate the enrichment statistic (ES), and calculate significance can be user-defined, and several options for each are supplied in the flexgsea package.

Usage

flexgsea(x, y, gene.sets, gene.score.fn = flexgsea_s2n,
  es.fn = flexgsea_weighted_ks, sig.fun = flexgsea_calc_sig,
  gene.names = NULL, nperm = 1000, gs.size.min = 10,
  gs.size.max = 300, verbose = TRUE, block.size = 100,
  parallel = NULL, abs = FALSE, return_values = character())

Arguments

x

Gene expression matrix (samples by genes), or EList object produced by, for example, limma::voom.

y

Classes or other response variables to analyse for gene set enrichment. Vector with a length equal to the number of samples, or sample by variable matrix.

gene.sets

Gene sets. Either a filename of a gmt file, or gene sets read by the read_gmt function.

gene.score.fn

Function to calculate gene scores. The signal-to-noise ratio (flexgsea_s2n) is appropriate for comparing two classes. Correlation (flexgsea_lm) can be used for real-valued variables. Can be user-defined, as documented below.

es.fn

Function to calculate enrichment scores (ES). Default is the weighted KS statistic by Subramanian et al. (2005). Can be user-defined, as documented below.

sig.fun

Function to calculate significance of results. Using flexgsea_calc_sig_simple is recommended for an es.fn function other than the default flexgsea_weighted_ks, as the default sig.fun might not be appropriate for it. Can be user-defined, as documented below.

gene.names

Gene identifiers for the genes in the data x that match the identifiers in gene.sets. Defaults to the row names of x.

nperm

Number of permutations to run.

gs.size.min

Minimum number of genes in a gene set that are also in x for a gene set to be included in the analysis.

gs.size.max

Maximum number of genes in a gene set that are also in x for a gene set to be included in the analysis.

verbose

Should progress be printed. Progress is never printed when running in parallel.

block.size

Number of permutations for which gene scoring and calculation of the enrichment statistic are done in one batch. One batch can use only one thread, so this setting also affects parallel processing. Lower values use less memory, but might reduce performance.

parallel

Should computation be done in parallel.

abs

Should the absolute enrichment score be used. This is appropriate when gene sets have no direction, such as the MSigDB c2.cp gene set collection.

return_values

Character vector of values to be returned in addition to the table with statistics. Possible values are documented below, and with the enrichment function used.

Details

Gene sets are filtered. First, only genes which exist in the data set x are kept. Then, gene sets smaller than gs.size.min or larger than gs.size.max are filtered out.
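
The two filtering steps can be sketched in base R as follows. This is an illustration only, not the package's code: filter_gene_sets is a hypothetical helper, and representing gene sets as a named list of character vectors is an assumption (the structure returned by read_gmt may differ).

```r
# Hypothetical helper, not part of flexgsea: keep only genes present in
# the data, then drop sets whose remaining size is outside the limits.
filter_gene_sets <- function(gene.sets, gene.names,
                             gs.size.min = 10, gs.size.max = 300) {
  # Step 1: restrict every gene set to genes found in the data.
  kept <- lapply(gene.sets, function(gs) intersect(gs, gene.names))
  # Step 2: apply the size limits to the restricted sets.
  sizes <- lengths(kept)
  kept[sizes >= gs.size.min & sizes <= gs.size.max]
}

# Toy data: 20 measured genes and three gene sets (assumed representation).
gene.names <- paste0("g", 1:20)
gene.sets <- list(
  small  = c("g1", "g2", "not_measured"),
  medium = paste0("g", 1:12),
  large  = paste0("g", 1:20)
)
filtered <- filter_gene_sets(gene.sets, gene.names,
                             gs.size.min = 5, gs.size.max = 15)
names(filtered)
```

With these toy limits only the medium set remains: small shrinks to two genes once the unmeasured gene is dropped, and large exceeds the maximum.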

Runs in parallel by default if a foreach environment is set up and block.size is smaller than the number of permutations.
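
The condition on block.size follows from each batch using a single thread: parallelism is only possible when the permutations are split over more than one batch. A rough sketch of that arithmetic, assuming permutations are divided into consecutive batches of at most block.size (how flexgsea divides them internally is not documented here):

```r
nperm <- 1000
block.size <- 100

# Assumed batching scheme: consecutive blocks of at most block.size
# permutations each.
block.id <- ceiling(seq_len(nperm) / block.size)
blocks <- split(seq_len(nperm), block.id)

# Number of batches that could be spread over threads.
n.blocks <- length(blocks)

# With a single batch there is nothing to distribute.
can.parallelise <- block.size < nperm
```

With the default values this gives ten batches of 100 permutations, so at most ten threads could be kept busy.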

Value

A list. The table element is a list with a data frame of enrichment statistics for each response variable in y. Other elements are the values requested in return_values.

Possible values for return_values

es_null:

Null distribution of ES.

gene_names:

Gene names, as supplied to this function.

Additional return values might be available when using specific gene set enrichment functions.

User-defined gene scoring function gene.score.fn

A gene score calculation function should take the following arguments:

x:

The data matrix x, exactly as given to the flexgsea function.

y:

Response variables to test for gene set enrichment. The y given to the flexgsea function or a permutation of y. This is a matrix with samples in the rows, and output variables in the columns.

It should return a matrix with samples in the columns and genes in the rows.

A simple example is flexgsea_lm.
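
As a rough illustration of the interface, the sketch below scores each gene by its Pearson correlation with each response variable. cor_gene_score is a hypothetical name, and the code assumes x is a plain numeric matrix with samples in rows and genes in columns. The shape of the result (one row per gene, one column per response variable) is also an assumption, inferred from the gene.score array described for es.fn; compare it with flexgsea_lm before relying on it.

```r
# Hypothetical scoring function, not part of flexgsea.
# Assumes x: numeric matrix, samples in rows and genes in columns.
# Assumes y: numeric matrix, samples in rows and variables in columns.
cor_gene_score <- function(x, y) {
  y <- as.matrix(y)
  # cor() on two matrices correlates every column of x with every column
  # of y, giving a genes-by-variables matrix.
  stats::cor(x, y)
}

set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20,
            dimnames = list(NULL, paste0("g", 1:5)))
# Toy response that closely tracks gene g1.
y <- cbind(response = x[, "g1"] + rnorm(20, sd = 0.1))

scores <- cor_gene_score(x, y)

# The same function is applied to permuted sample labels.
perm.scores <- cor_gene_score(x, y[sample(nrow(y)), , drop = FALSE])
dim(scores)
```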

User-defined gene set enrichment function es.fn

An enrichment function is a list of two functions (prepare and run) and two character vectors (extra_stats and extra). The prepare function can be used to do calculations that are the same for all gene sets. It takes a single argument gene.score and can return anything; the returned value is passed to the run function. This function can be called one or multiple times on any subset of permutations, so it should not modify global state. The run function should take the following arguments:

gene.score

Gene scores of one or more permutations in an array (genes x response variable x permutation).

gene.set

Gene set as an integer vector which indexes the first dimension of the gene.score array.

prep

Whatever the prepare function returned for this gene.score.

return_stats

A character vector of statistics to return. This function can advertise which stats are available through extra_stats in the list. Should default to c().

return

A character vector of other extra values to return. This function can advertise which values are available through extra in the list. Should default to c().

It should return a list with es and any requested extra statistics and other values. The extra statistics are put into the results table, while the other extra values are added to the list returned by flexgsea. The es element should be a matrix (response x permutation). A simple example is flexgsea_mean.
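
A minimal sketch of such a list, using the mean gene score within the gene set as the enrichment statistic. The name mean_es is hypothetical and this is not the implementation of flexgsea_mean; it only follows the interface described above and advertises no extra statistics or values.

```r
# Hypothetical enrichment function following the documented interface.
mean_es <- list(
  # Nothing needs precomputing for a plain mean, so return NULL.
  prepare = function(gene.score) NULL,
  run = function(gene.score, gene.set, prep,
                 return_stats = c(), return = c()) {
    # Subset the gene dimension, keeping all three dimensions.
    in.set <- gene.score[gene.set, , , drop = FALSE]
    # Average over genes; the result is response x permutation.
    es <- apply(in.set, c(2, 3), mean)
    list(es = es)
  },
  extra_stats = character(),  # no extra statistics advertised
  extra = character()         # no other extra values advertised
)

# Toy input: 6 genes, 2 response variables, 3 permutations.
set.seed(1)
gene.score <- array(rnorm(6 * 2 * 3), dim = c(6, 2, 3))
prep <- mean_es$prepare(gene.score)
res <- mean_es$run(gene.score, gene.set = c(1L, 3L, 5L), prep = prep)
dim(res$es)
```

Passing drop = FALSE keeps the array three-dimensional even for a single response variable or permutation, so es is always a matrix.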

User-defined significance calculation sig.fun

A significance calculation function should take the following arguments:

es:

Enrichment scores for a single output variable, a numeric vector with a length equal to the number of gene sets.

es_null:

Enrichment scores from permuted labels, a numeric array with dimensions number of gene sets by number of permutations.

verbose:

Passed from the main flexgsea function.

abs:

Passed from the main flexgsea function.

It should return a data frame with a row for every gene set, and a column for every statistic. This data frame is returned by the main flexgsea function in the table list after appending gene set names.
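
A minimal sketch of a function with this signature, computing empirical permutation p-values. empirical_sig is a hypothetical name, and the side-specific counting and add-one correction are simple choices made for illustration; they are not necessarily what flexgsea_calc_sig or flexgsea_calc_sig_simple do.

```r
# Hypothetical significance function following the documented interface.
empirical_sig <- function(es, es_null, verbose = FALSE, abs = FALSE) {
  if (abs) {
    # base::abs is spelled out because the argument is also named abs.
    es <- base::abs(es)
    es_null <- base::abs(es_null)
  }
  nperm <- ncol(es_null)
  # For each gene set, count null scores at least as extreme as the
  # observed score, on the same side as the observed score.
  n.extreme <- vapply(seq_along(es), function(i) {
    if (es[i] >= 0) sum(es_null[i, ] >= es[i]) else sum(es_null[i, ] <= es[i])
  }, integer(1))
  # The add-one correction keeps p-values strictly positive.
  data.frame(es = es, p = (n.extreme + 1) / (nperm + 1))
}

# Toy input: two gene sets, four permutations.
es <- c(3, -0.1)
es_null <- rbind(c(-1, 0, 1, 2), c(-1, 0, 1, 2))
out <- empirical_sig(es, es_null)
out
```

The result has one row per gene set, in the same order as es, which is what allows gene set names to be appended afterwards.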

See Also

Gene scoring functions: flexgsea_s2n, flexgsea_lm.

Gene set enrichment functions: flexgsea_mean, flexgsea_weighted_ks, flexgsea_maxmean.

Functions for significance calculation: flexgsea_calc_sig, flexgsea_calc_sig_simple.


NKI-CCB/flexgsea-r documentation built on April 30, 2021, 5:35 p.m.