flexgsea() performs a gene set enrichment analysis, calculating significance
by sample permutation. The functions that score genes, calculate the enrichment
statistic (ES), and calculate significance can be user-defined, and several
options are supplied in the flexgsea package.
flexgsea(x, y, gene.sets, gene.score.fn = flexgsea_s2n,
  es.fn = flexgsea_weighted_ks, sig.fun = flexgsea_calc_sig,
  gene.names = NULL, nperm = 1000, gs.size.min = 10,
  gs.size.max = 300, verbose = TRUE, block.size = 100,
  parallel = NULL, abs = FALSE, return_values = character())
x |
Gene expression matrix (samples by genes), or EList object
produced by, for example, |
y |
Classes or other response variables to analyse for gene set enrichment. Either a vector with one entry per sample, or a sample by variable matrix. |
gene.sets |
Gene sets. Either a filename of a gmt file, or gene sets
read by the |
gene.score.fn |
Function to calculate gene scores. The signal to noise
ratio ( |
es.fn |
Function to calculate enrichment scores (ES). Default is the weighted KS statistic by Subramanian et al (2005). Can be user-defined, as documented below. |
sig.fun |
Function to calculate significance of results. Using
|
gene.names |
Gene identifiers for the genes in the data |
nperm |
Number of permutations to run. |
gs.size.min |
Minimum number of genes in a gene set that are also in
|
gs.size.max |
Maximum number of genes in a gene set that are also in
|
verbose |
Should progress be printed. Progress is never printed when running in parallel. |
block.size |
Number of permutations for which gene scoring and calculation of the enrichment statistic are done in one batch. One batch can use only one thread, so this setting also affects parallel processing. Lower values use less memory, but may reduce performance. |
parallel |
Should computation be done in parallel. |
abs |
Should the absolute enrichment score be used. This is appropriate when gene sets have no direction, such as the MSigDB c2.cp gene set collection. |
return_values |
Character vector of values to be returned in addition to the table of statistics. Possible values are documented below, and with the enrichment function used. |
Gene sets are filtered. First, only genes which exist in the data set x are kept. Then, gene sets smaller than gs.size.min or larger than gs.size.max are filtered out.
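The two filtering steps can be sketched in a few lines of R. This is an illustration only, not the package's internal code: `filter_gene_sets` is a hypothetical helper, and representing gene sets as a named list of character vectors is an assumption.

```r
# Hypothetical helper illustrating the two filtering steps described above.
# Assumes gene sets are held as a named list of character vectors.
filter_gene_sets <- function(gene.sets, genes,
                             gs.size.min = 10, gs.size.max = 300) {
  # Step 1: keep only genes that exist in the data set.
  kept <- lapply(gene.sets, function(gs) intersect(gs, genes))
  # Step 2: drop sets smaller than gs.size.min or larger than gs.size.max.
  sizes <- lengths(kept)
  kept[sizes >= gs.size.min & sizes <= gs.size.max]
}

# Toy data: 20 genes in the data, three candidate gene sets.
genes <- paste0("g", 1:20)
gene.sets <- list(
  small = c("g1", "g2", "not_in_data"),
  ok    = c("g1", "g2", "g3", "g4"),
  big   = paste0("g", 1:15)
)
filtered <- filter_gene_sets(gene.sets, genes,
                             gs.size.min = 3, gs.size.max = 10)
names(filtered)
```

With these toy thresholds only the `ok` set survives: `small` shrinks to two genes after matching against the data, and `big` exceeds the maximum size.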
Runs in parallel by default if a foreach environment is set up and block.size is smaller than the number of permutations.
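As a rough illustration of how nperm and block.size interact, the sketch below counts the batches and, if the doParallel package happens to be installed, registers a foreach backend. The batch count is inferred from the description of block.size rather than taken from the package source, and doParallel is only one of several possible backends.

```r
nperm <- 1000
block.size <- 100

# Each batch covers up to block.size permutations and uses one thread,
# so the number of batches bounds the useful number of workers.
n_batches <- ceiling(nperm / block.size)
can_run_parallel <- block.size < nperm

# Registering a backend is optional; doParallel is an assumption here.
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cores = 2)
  message("foreach workers: ", foreach::getDoParWorkers())
  # ... call flexgsea() here ...
  doParallel::stopImplicitCluster()
}
```

With the default settings this gives ten batches, so registering more than ten workers would not be expected to help.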
A list. The table element is a list with a data frame of enrichment statistics for each response variable in y. Other elements are the values requested in return_values.
Possible values for return_values
es_null: Null distribution of ES.
gene_names: Gene names, as supplied to this function.
Additional return values might be available when using specific gene set enrichment functions.
User-defined gene scoring function gene.score.fn
A gene score calculation function should take the following arguments:
x: The data matrix x, exactly as given to the flexgsea function.
y: Response variables to test for gene set enrichment: either the y given to the flexgsea function or a permutation of it. This is a matrix with samples in the rows and output variables in the columns.
It should return a matrix with samples in the columns and genes in the rows.
A simple example is flexgsea_lm.
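A minimal sketch of a scoring function with this calling convention follows. `cov_score` is a hypothetical name, and the score (covariance of each gene with each response) is chosen only for brevity. The return layout used here, one row per gene and one column per response variable, is an assumption inferred from the genes x response variable gene score array that the enrichment function receives; check it against flexgsea_lm before relying on it. The `...` absorbs any further arguments the caller might pass.

```r
# Hypothetical gene scoring function: covariance of each gene with each
# response variable. Assumes y is numeric (e.g. 0/1 class labels).
cov_score <- function(x, y, ...) {
  x <- as.matrix(x)  # samples x genes
  y <- as.matrix(y)  # samples x response variables
  stopifnot(nrow(x) == nrow(y))
  # cov() of two matrices returns one row per column of x (genes) and
  # one column per column of y (response variables).
  cov(x, y)
}

# Toy data: 6 samples, 3 genes, one binary response.
set.seed(1)
x <- matrix(rnorm(18), nrow = 6,
            dimnames = list(NULL, c("g1", "g2", "g3")))
y <- matrix(rep(c(0, 1), each = 3), ncol = 1)
scores <- cov_score(x, y)
dim(scores)
```

Because the function is called once per permutation of y, it should depend only on its arguments.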
User-defined gene set enrichment function es.fn
A list of two functions (prepare and run) and two character vectors (extra_stats and extra). The prepare function can be used to do calculations that are the same for all gene sets. It takes a single argument gene.score and can return anything, which is passed to the run function. It can be called one or multiple times on any subset of permutations, so it should not modify global state.
The run function should take the following arguments, in this order:
Gene scores of one or more permutations in an array (genes x response variable x permutation).
Gene set as an integer vector which indexes the first dimension of the gene.score array.
Whatever the prepare function returned for this gene.score.
A character vector of statistics to return. This function can advertise which stats are available through extra_stats in the list. Should default to c().
A character vector of other extra values to return. This function can advertise which values are available through extra in the list. Should default to c().
It should return a list with es and any requested extra statistics and other values. The extra statistics are put into the results table, while the other extra values are added to the list returned by flexgsea. The es element should be a matrix (response x permutation).
A simple example is flexgsea_mean.
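The structure described above might look like the following sketch, which scores a gene set by the mean gene score of its members. It is not the implementation of flexgsea_mean. Only gene.score is named in the text; the other argument names (`gene.set`, `prep`, `return_stats`, `return_values`) are placeholders chosen for this example.

```r
# Hypothetical enrichment function: mean gene score within the gene set.
mean_es <- list(
  # Nothing to precompute for a plain mean.
  prepare = function(gene.score) NULL,
  run = function(gene.score, gene.set, prep,
                 return_stats = c(), return_values = c()) {
    # gene.score: genes x response variable x permutation.
    # Averaging over the first dimension leaves response x permutation.
    es <- colMeans(gene.score[gene.set, , , drop = FALSE], dims = 1)
    list(es = es)
  },
  extra_stats = character(),  # no extra statistics advertised
  extra = character()         # no extra values advertised
)

# Toy gene scores: 5 genes, 2 response variables, 3 permutations.
gene.score <- array(1:30, dim = c(5, 2, 3))
prep <- mean_es$prepare(gene.score)
out <- mean_es$run(gene.score, c(1L, 3L), prep)
dim(out$es)
```

Such a list would then be supplied as the es.fn argument of flexgsea. Neither function touches anything outside its arguments, so it can safely be called on any subset of permutations.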
User-defined significance calculation sig.fun
A significance calculation function should take the following arguments:
es: Enrichment scores for a single output variable, a numeric vector with a length equal to the number of gene sets.
es_null: Enrichment scores from permuted labels, a numeric array with dimensions number of gene sets by number of permutations.
verbose: Passed from main flexgsea function.
abs: Passed from main flexgsea function.
It should return a data frame with a row for every gene set, and a column for every statistic. This data frame is returned by the main flexgsea function in the table list after appending gene set names.
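A minimal sketch of a function with this interface is shown below. It computes a plain empirical p-value per gene set and is not the method used by flexgsea_calc_sig or flexgsea_calc_sig_simple. The name `empirical_sig` and the add-one correction are choices made for this example.

```r
# Hypothetical significance function: empirical p-value per gene set.
empirical_sig <- function(es, es_null, verbose = FALSE, abs = FALSE) {
  stopifnot(length(es) == nrow(es_null))
  nperm <- ncol(es_null)
  if (abs) {
    # Undirected: compare magnitudes only.
    extreme <- rowSums(base::abs(es_null) >= base::abs(es))
  } else {
    # Directed: compare against the null tail on the same side as es.
    extreme <- ifelse(es >= 0,
                      rowSums(es_null >= es),
                      rowSums(es_null <= es))
  }
  if (verbose) message("Scored ", length(es), " gene sets")
  # Add-one correction so that p is never exactly zero.
  data.frame(es = es, p = (extreme + 1) / (nperm + 1))
}

# Toy input: 2 gene sets, 4 permutations.
es <- c(2, -1)
es_null <- rbind(c(0, 1, 2, 3),
                 c(-2, 0, 1, 2))
res_directed <- empirical_sig(es, es_null)
res_abs <- empirical_sig(es, es_null, abs = TRUE)
```

The result has one row per gene set and one column per statistic, which is the shape the text above asks for.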
Gene scoring functions: flexgsea_s2n, flexgsea_lm.
Gene set enrichment functions: flexgsea_mean, flexgsea_weighted_ks, flexgsea_maxmean.
Functions for significance calculation: flexgsea_calc_sig, flexgsea_calc_sig_simple.