run_fgsea: Fast Gene Set Enrichment Analysis (FGSEA)

View source: R/statistic-fgsea.R

run_fgsea    R Documentation

Fast Gene Set Enrichment Analysis (FGSEA)

Description

Calculates regulatory activities using FGSEA.

Usage

run_fgsea(
  mat,
  network,
  .source = source,
  .target = target,
  times = 100,
  nproc = availableCores(),
  seed = 42,
  minsize = 5,
  ...
)

Arguments

mat

Matrix to evaluate (e.g. expression matrix). Target nodes in rows and conditions in columns. rownames(mat) must share at least one element with the .target column of network.

network

Tibble or dataframe with edges and their associated metadata.

.source

Column with source nodes.

.target

Column with target nodes.

times

Number of permutations to perform.

nproc

Number of cores to use for computation.

seed

Random seed: a single value, interpreted as an integer, or NULL.

minsize

Integer indicating the minimum number of targets per source.

...

Arguments passed on to fgsea::fgseaMultilevel

sampleSize

The size of a random set of genes which in turn has size = pathwaySize

minSize

Minimal size of a gene set to test. All pathways below the threshold are excluded.

maxSize

Maximal size of a gene set to test. All pathways above the threshold are excluded.

eps

This parameter sets the boundary for calculating the p value.

scoreType

This parameter defines the GSEA score type. Possible options are ("std", "pos", "neg"). By default ("std") the enrichment score is computed as in the original GSEA. The "pos" and "neg" score types are intended to be used for one-tailed tests (i.e. when one is interested only in positive ("pos") or negative ("neg") enrichment).

gseaParam

GSEA parameter value; all gene-level statistics are raised to the power of 'gseaParam' before calculation of GSEA enrichment scores.

BPPARAM

Parallelization parameter used in bplapply. Can be used to specify a cluster to run on. If not initialized explicitly or by setting 'nproc', the default value 'bpparam()' is used.

absEps

Deprecated; use the 'eps' parameter instead.

Details

GSEA (Subramanian et al., 2005) starts by transforming the input molecular readouts in mat to ranks for each sample. Then, an enrichment score, fgsea, is calculated by walking down the ranked list of features, increasing a running-sum statistic when a feature belongs to the target feature set and decreasing it when it does not. The final score is the maximum deviation from zero encountered in the random walk. Finally, a normalized score, norm_fgsea, can be obtained by computing the z-score of the estimate against a null distribution obtained from N random permutations. The implementation used is taken from the package fgsea (Korotkevich et al., 2021).

Subramanian A. et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 102(43), 15545-15550.

Korotkevich G. et al. (2021) Fast gene set enrichment analysis. bioRxiv. DOI: https://doi.org/10.1101/060012.
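
The running-sum procedure described in Details can be illustrated with a toy sketch in base R. This is not the fgsea implementation: it uses unweighted steps (fgsea weights each step by the gene-level statistic raised to gseaParam), handles a single sample, and takes a plain z-score against a label-permutation null. The names enrichment_score, normalized_score and gene_stats are hypothetical and exist only for illustration.

```r
# Unweighted running-sum enrichment score for one sample (illustrative only).
# gene_stats: named numeric vector of feature-level values.
# targets:    character vector with the targets of one source.
enrichment_score <- function(gene_stats, targets) {
  ranked <- names(sort(gene_stats, decreasing = TRUE))  # rank features, highest first
  in_set <- ranked %in% targets
  n_hit  <- sum(in_set)
  n_miss <- length(ranked) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)
  # Step up on a hit and down on a miss; the scaling makes the walk end at zero.
  steps   <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  running <- cumsum(steps)
  running[which.max(abs(running))]  # maximum deviation from zero, sign kept
}

# z-score of the observed score against a null built by permuting feature labels.
normalized_score <- function(gene_stats, targets, times = 100, seed = 42) {
  set.seed(seed)
  observed <- enrichment_score(gene_stats, targets)
  null <- replicate(times, {
    shuffled <- setNames(gene_stats, sample(names(gene_stats)))
    enrichment_score(shuffled, targets)
  })
  (observed - mean(null)) / sd(null)
}

gene_stats <- c(g1 = 5, g2 = 4, g3 = 3, g4 = 2, g5 = 1, g6 = 0)
es_top    <- enrichment_score(gene_stats, c("g1", "g2"))  # targets at the top of the ranking
es_bottom <- enrichment_score(gene_stats, c("g5", "g6"))  # targets at the bottom
nes_top   <- normalized_score(gene_stats, c("g1", "g2"))
```

Targets concentrated at the top of the ranking give a positive score and targets at the bottom a negative one, which is why the sign of the maximum deviation is kept.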

Value

A long format tibble of the enrichment scores for each source across the samples. Resulting tibble contains the following columns:

  1. statistic: Indicates which method is associated with which score.

  2. source: Source nodes of network.

  3. condition: Condition representing each column of mat.

  4. score: Regulatory activity (enrichment score).
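
Because the result is in long format, a common follow-up step is to keep one statistic and reshape it into a source-by-condition matrix. The sketch below uses a made-up data frame with the four documented columns; the source names, condition names and score values are hypothetical, and the statistic labels "fgsea" and "norm_fgsea" are assumed from the Details section.

```r
# Hypothetical result with the four documented columns (values are invented).
res <- data.frame(
  statistic = rep(c("fgsea", "norm_fgsea"), each = 4),
  source    = rep(c("TF1", "TF2"), times = 4),
  condition = rep(rep(c("ctrl", "treated"), each = 2), times = 2),
  score     = c(0.6, -0.4, 0.7, -0.2, 1.8, -1.1, 2.1, -0.5),
  stringsAsFactors = FALSE
)

# Keep only the normalized scores.
norm <- res[res$statistic == "norm_fgsea", ]

# Build an empty source x condition matrix and fill it by name.
sources    <- unique(norm$source)
conditions <- unique(norm$condition)
wide <- matrix(
  NA_real_,
  nrow = length(sources), ncol = length(conditions),
  dimnames = list(sources, conditions)
)
wide[cbind(norm$source, norm$condition)] <- norm$score
```

Filtering on statistic first matters: each source and condition pair appears once per statistic, so reshaping without the filter would mix the two kinds of score.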

See Also

Other decoupleR statistics: decouple(), run_aucell(), run_gsva(), run_mdt(), run_mlm(), run_ora(), run_udt(), run_ulm(), run_viper(), run_wmean(), run_wsum()

Examples

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_fgsea(mat, net, minsize=0, nproc=1)
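
A slightly extended variant of the call above first checks the overlap required by the mat argument, then sets times and seed explicitly and keeps only the normalized scores. It assumes the test network has a column named target (the default for .target) and that the statistic column uses the label "norm_fgsea" mentioned in Details; the code is skipped when decoupleR or fgsea is not installed.

```r
# `res` stays NULL when the required packages are not available.
res <- NULL
if (requireNamespace("decoupleR", quietly = TRUE) &&
    requireNamespace("fgsea", quietly = TRUE)) {
  inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
  mat <- readRDS(file.path(inputs_dir, "mat.rds"))
  net <- readRDS(file.path(inputs_dir, "net.rds"))

  # mat must share at least one row name with the network's target column
  # (assumed here to be called `target`).
  shared <- intersect(rownames(mat), net$target)
  stopifnot(length(shared) > 0)

  # Fix the number of permutations and the seed so that reruns are comparable.
  res <- decoupleR::run_fgsea(mat, net, times = 100, seed = 42,
                              minsize = 0, nproc = 1)

  # Keep only the normalized enrichment scores (label assumed from Details).
  norm_res <- res[res$statistic == "norm_fgsea", ]
}
```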

saezlab/decoupleR documentation built on Oct. 21, 2024, 8:47 a.m.