coexpression_pathway_enrichment: Enrichment of pathways (gene sets) in a co-expression...


View source: R/coexpression_pathway_enrichment.R

Description

This function computes enrichment p-values of pathways in a co-expression network. See details.

Usage

coexpression_pathway_enrichment(
  net,
  pathways,
  min.gene = 5,
  max.gene = 100,
  iter = 10000,
  seed = NULL,
  na.rm = F,
  neg.treat = "error"
)

Arguments

net

matrix or data.frame. A gene x gene matrix representing edge weights between genes in a co-expression network. Gene names must be available as row and column names. See details.

pathways

list. List of pathways where each entry contains the genes in each pathway. Pathway names may be provided as names(pathways). If provided, pathway names must be unique.

min.gene

integer. Each pathway must have at least min.gene genes in net. Otherwise enrichment is not computed for the pathway.

max.gene

integer. Each pathway must have at most max.gene genes in net. Otherwise enrichment is not computed for the pathway.

iter

integer. The number of random gene sets (iterations) used to compute the null distribution.

seed

integer or NULL. Random number generator seed.

na.rm

logical. Should edges with NA weights be excluded? If FALSE, net cannot have any edge with NA weight.

neg.treat

character. How negative values in net should be treated. Accepted values are 'allow', 'warn' and 'error'. If 'allow', negative values are allowed. If 'warn', a warning is generated. If 'error', an error is generated.

Details

To compute the enrichment p-value of a pathway in a co-expression network, we define the score of a pathway as the sum of weights in net of all possible edges between the genes in the pathway. The enrichment p-value is then defined as the probability that the score of a random gene set with the same number of genes is at least as large as the score of the pathway.

To get the null distribution, we generate iter random gene sets, each consisting of the same number of randomly selected genes as the pathway, compute their scores, and fit a normal distribution to those scores.
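The scoring and null-fitting steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names pathway_score and sketch_enrichment_p are hypothetical, random genes are assumed to be drawn from all genes in net, and the +1 correction in the empirical p-value is a choice made for this sketch.

```r
# Hypothetical helper: sum of edge weights over all gene pairs in a gene set.
# The upper triangle counts each undirected edge once and skips the diagonal.
pathway_score <- function(net, genes) {
  sub <- net[genes, genes, drop = FALSE]
  sum(sub[upper.tri(sub)])
}

# Hypothetical helper: p-values for one gene set against `iter` random gene
# sets of the same size.
sketch_enrichment_p <- function(net, genes, iter = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  genes <- intersect(genes, rownames(net))      # keep genes present in net
  observed <- pathway_score(net, genes)
  null_scores <- replicate(
    iter,
    pathway_score(net, sample(rownames(net), length(genes)))
  )
  # Upper tail of a normal distribution fitted to the null scores.
  p_normal <- pnorm(observed, mean = mean(null_scores), sd = sd(null_scores),
                    lower.tail = FALSE)
  # Fraction of null scores at least as large as the observed score
  # (the +1 correction is an assumption of this sketch).
  p_empirical <- (sum(null_scores >= observed) + 1) / (iter + 1)
  c(score = observed, p = p_normal, p.empirical = p_empirical)
}

# Toy symmetric network with non-negative weights.
set.seed(1)
g <- paste0("g", 1:20)
w <- matrix(runif(400), 20, 20, dimnames = list(g, g))
w <- (w + t(w)) / 2
res <- sketch_enrichment_p(w, g[1:5], iter = 500, seed = 42)
print(res)
```

Reporting both the normal-based and the empirical p-value mirrors the p and p.empirical columns listed under Value; the fitted normal can resolve p-values smaller than 1/iter, which the empirical estimate cannot.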

Enrichment of a pathway is computed only if at least min.gene and at most max.gene genes from the pathway are available in net. This criterion excludes pathways that are too small or too large.
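The size criterion can be illustrated with a few lines of base R. This is a sketch only; the gene names, pathway names, and variable names below are made up for the example, and the package may apply the filter differently internally.

```r
# Hypothetical genes available in the network.
net_genes <- c("A", "B", "C", "D", "E", "F", "G")

# Hypothetical pathways; "X" and "Y" are not in the network.
pathways <- list(
  too_small = c("A", "X"),
  in_range  = c("A", "B", "C", "Y"),
  too_large = c("A", "B", "C", "D", "E", "F")
)

min.gene <- 3
max.gene <- 5

# Count, for each pathway, the genes that are also present in the network.
n.gene <- vapply(pathways, function(p) length(intersect(p, net_genes)), integer(1))

# Only pathways within [min.gene, max.gene] would be tested.
tested <- names(pathways)[n.gene >= min.gene & n.gene <= max.gene]
print(n.gene)
print(tested)
```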

Each value in net should represent the relative probability that the corresponding edge is true. In other words, larger values should represent higher confidence in corresponding edges. If the sign of values in net represents positive or negative associations between genes, you probably should provide absolute values. If you still want to allow negative values in net, you may set neg.treat = "allow". In this case, any negative value will represent lower confidence than any non-negative value.
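As an example of providing absolute values, the sketch below builds a signed network from Pearson correlations and converts it to non-negative weights. Using correlations as edge weights is an assumption made for illustration; the expression matrix and gene names are simulated.

```r
set.seed(1)
# Simulated expression data: 50 samples (rows) x 6 genes (columns).
expr <- matrix(rnorm(50 * 6), nrow = 50,
               dimnames = list(NULL, paste0("gene", 1:6)))

signed_net <- cor(expr)      # entries in [-1, 1]; sign encodes direction
abs_net <- abs(signed_net)   # larger value = stronger association, sign dropped

print(range(abs_net))
```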

net must be a square matrix. Gene names must be available as row and column names. Gene names must be unique. net must be symmetric when rows and columns are identically ordered. Diagonal entries are ignored.
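These requirements can be checked before calling the function. The helper below, check_net, is hypothetical and not part of the package; it is a minimal sketch of the listed conditions and reorders columns to match rows before testing symmetry.

```r
# Hypothetical helper: validate the structural requirements on `net`.
check_net <- function(net) {
  net <- as.matrix(net)
  stopifnot(
    nrow(net) == ncol(net),                                  # square
    !is.null(rownames(net)), !is.null(colnames(net)),        # names present
    !anyDuplicated(rownames(net)), !anyDuplicated(colnames(net)),  # unique
    setequal(rownames(net), colnames(net))                   # same gene set
  )
  # Order columns like rows, then test symmetry on the bare numbers.
  net <- net[, rownames(net), drop = FALSE]
  stopifnot(isSymmetric(unname(net)))
  invisible(net)
}

g <- c("A", "B", "C")
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, dimnames = list(g, g))
shuffled <- m[, c("C", "A", "B")]   # same network, columns reordered
checked <- check_net(shuffled)      # passes and returns the reordered matrix

bad <- m
bad["A", "B"] <- 9                  # break symmetry
is_error <- inherits(try(check_net(bad), silent = TRUE), "try-error")
print(is_error)
```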

Value

A data.frame with the following columns.

pathway

Pathway name taken from names(pathways). If pathway names are not provided, pathway index is used.

n.gene

Number of genes from the pathway available in net.

p

p-value for the pathway (computed using a fitted normal distribution as null).

p.empirical

Empirical p-value for the pathway.

fdr

False discovery rate computed using Benjamini-Hochberg method.

fdr.empirical

Empirical false discovery rate computed using Benjamini-Hochberg method.
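For reference, Benjamini-Hochberg adjustment is available in base R through p.adjust. The snippet below shows the adjustment on made-up p-values; whether the package calls p.adjust internally, and over which set of pathways, is an assumption not confirmed by this page.

```r
# Made-up p-values for four pathways.
p <- c(0.001, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjusted values.
fdr <- p.adjust(p, method = "BH")
print(fdr)
```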

Examples

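set.seed(101)                                   # make the random network reproducible
genes = c('TP53', 'RBM3', 'SF3', 'LIM12', 'ATM', 'TMEM160', 'BCL2L1', 'MDM2',
          'PDR', 'MEG3', 'EGFR', 'CD96', 'KEAP1', 'SRSF1', 'TSEN2')
# random gene x gene weights with gene names as row and column names
dummy_net = matrix(rnorm(length(genes)^2), nrow = length(genes), dimnames = list(genes, genes))
dummy_net = abs((dummy_net + t(dummy_net))/2)   # symmetric network with non-negative weights
# pathway genes that are missing from the network (e.g. 'SF1', 'SF5', 'RBM14') are not counted
dummy_pathways = list(pathway1 = c('TP53', 'RBM3', 'SF1', 'SF5'),
                      pathway2 = c('LIM12', 'MDM2', 'BCL2L1', 'TMEM160', 'ATM'),
                      pathway3 = c('EGFR', 'TP53', 'CD96', 'SRSF1', 'RBM14'))
# pathway1 has only 2 genes in the network, fewer than min.gene = 3
enrich_res = coexpression_pathway_enrichment(net = dummy_net,
                                             pathways = dummy_pathways,
                                             min.gene = 3)
print(enrich_res)
# na.rm = TRUE guards against NA entries in the fdr column
n_sig = sum(enrich_res$fdr <= 0.05, na.rm = TRUE)
print(sprintf('Number of significantly enriched pathways: %d', n_sig))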

alorchhota/spice documentation built on March 12, 2021, 12:05 a.m.