coexpression_pathway_enrichment: Enrichment of pathways (gene sets) in a co-expression...
In alorchhota/spice: SPICE: Spanning tree based inference of co-expression networks


View source: R/coexpression_pathway_enrichment.R

This function computes enrichment p-values of pathways in a co-expression network. See details.

coexpression_pathway_enrichment(
  net,
  pathways,
  min.gene = 5,
  max.gene = 100,
  iter = 10000,
  seed = NULL,
  na.rm = F,
  neg.treat = "error"
)

`net`	matrix or data.frame. A gene x gene matrix representing edge weights between genes in a co-expression network. Gene names must be available as row and column names. See details.
`pathways`	list. List of pathways where each entry contains the genes in each pathway. Pathway names may be provided as `names(pathways)`. If provided, pathway names must be unique.
`min.gene`	integer. Each pathway must have at least `min.gene` genes in `net`. Otherwise enrichment is not computed for the pathway.
`max.gene`	integer. Each pathway must have at most `max.gene` genes in `net`. Otherwise enrichment is not computed for the pathway.
`iter`	integer. The number of random iterations or the number of random gene sets to compute the null distribution.
`seed`	integer or NULL. Random number generator seed.
`na.rm`	logical. Should edges with `NA` weights be excluded? If FALSE, `net` cannot have any edge with `NA` weight.
`neg.treat`	character. How negative values in `net` should be treated. Accepted values are `'allow'`, `'warn'` and `'error'`. If `'allow'`, negative values are allowed. If `'warn'`, a warning is generated. If `'error'`, an error is generated.

To compute the enrichment p-value of a pathway in a co-expression network, we define the score of a pathway as the sum of weights in net of all possible edges between the genes in the pathway. The enrichment p-value is then defined as the probability that a random gene set with the same number of genes has a score at least as large as the score of the pathway.
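The score defined above can be sketched in a few lines of R. This is only an illustration: `pathway_score` is a hypothetical helper, not a function exported by spice, and it assumes each undirected edge is counted once (counting each edge twice would scale every score by the same factor and leave the p-value unchanged).

```r
# Hypothetical helper (not part of spice): sum of edge weights among the
# pathway genes, counting each undirected edge once and ignoring the diagonal.
pathway_score <- function(net, genes) {
  genes <- intersect(unique(genes), rownames(net))  # keep genes present in net
  sub <- net[genes, genes, drop = FALSE]
  sum(sub[upper.tri(sub)])
}

# Toy symmetric network with four genes.
toy_genes <- c("A", "B", "C", "D")
toy_net <- matrix(0, 4, 4, dimnames = list(toy_genes, toy_genes))
toy_net["A", "B"] <- toy_net["B", "A"] <- 0.9
toy_net["A", "C"] <- toy_net["C", "A"] <- 0.5
toy_net["B", "C"] <- toy_net["C", "B"] <- 0.2
toy_net["C", "D"] <- toy_net["D", "C"] <- 0.7

# Edges A-B, A-C and B-C contribute to the score of the pathway {A, B, C}.
score_abc <- pathway_score(toy_net, c("A", "B", "C"))
print(score_abc)
```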

To get the null distribution, we generate a number of (iter) random gene sets, each consisting of as many randomly selected genes as the pathway has in net, compute their scores, and fit a normal distribution to those scores.
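As a rough sketch of this permutation scheme (not the package's actual implementation), the following draws random gene sets from a toy network, fits a normal distribution to their scores, and derives both a parametric and an empirical p-value. The helper `score_of`, the toy data, and the `+ 1` correction in the empirical p-value are assumptions made for illustration.

```r
set.seed(1)

# Toy symmetric network with 30 genes (assumed data, for illustration only).
n <- 30
genes <- paste0("g", seq_len(n))
w <- matrix(runif(n * n), n, n, dimnames = list(genes, genes))
net <- (w + t(w)) / 2
diag(net) <- 0

# Hypothetical scoring helper: sum of weights over all gene pairs in the set.
score_of <- function(net, genes) {
  sub <- net[genes, genes, drop = FALSE]
  sum(sub[upper.tri(sub)])
}

pathway <- genes[1:6]
observed <- score_of(net, pathway)

# Null scores from random gene sets of the same size as the pathway.
iter <- 1000
null_scores <- replicate(iter, score_of(net, sample(genes, length(pathway))))

# Parametric p-value: upper tail of a normal fitted to the null scores.
p_normal <- pnorm(observed, mean = mean(null_scores), sd = sd(null_scores),
                  lower.tail = FALSE)

# Empirical p-value: fraction of random sets scoring at least as high.
# The +1 correction is a common convention and an assumption here.
p_empirical <- (sum(null_scores >= observed) + 1) / (iter + 1)

print(c(p = p_normal, p.empirical = p_empirical))
```

The parametric p-value can resolve values smaller than 1/iter, which the empirical p-value cannot; this is presumably why both are reported.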

Enrichment of a pathway is computed only if at least min.gene and at most max.gene genes from the pathway are available in net. This criterion helps to avoid pathways that are too small or too large.
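The size filter can be illustrated as follows. This is a sketch under the assumption that pathway size is counted as the number of unique pathway genes present in net; the gene names and pathways are made up for the example.

```r
# Genes assumed to be present in the network.
net_genes <- c("TP53", "ATM", "MDM2", "EGFR", "KEAP1", "SRSF1")

# Made-up pathways; XYZ1 and XYZ2 are not in the network.
pathways <- list(
  small = c("TP53", "ATM", "XYZ1"),
  ok    = c("TP53", "ATM", "MDM2", "EGFR", "XYZ2"),
  large = net_genes
)
min.gene <- 3
max.gene <- 5

# Count, per pathway, the unique genes that are available in the network.
n_in_net <- vapply(
  pathways,
  function(g) length(intersect(unique(g), net_genes)),
  integer(1)
)

# A pathway is eligible only if its count lies within [min.gene, max.gene].
eligible <- n_in_net >= min.gene & n_in_net <= max.gene
print(data.frame(pathway = names(pathways), n.gene = n_in_net, eligible = eligible))
```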

Each value in net should represent the relative probability that the corresponding edge is true. In other words, larger values should represent higher confidence in corresponding edges. If the sign of values in net represents positive or negative associations between genes, you probably should provide absolute values. If you still want to allow negative values in net, you may set neg.treat = "allow". In this case, any negative value will represent lower confidence than any non-negative value.
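For instance, if the network is built from signed correlations, taking absolute values before calling the function turns the strength of association into the edge weight. The snippet below is a minimal sketch with simulated expression data; the variable names are illustrative only.

```r
set.seed(42)

# Simulated expression matrix: 20 samples x 5 genes (assumed data).
expr <- matrix(rnorm(20 * 5), nrow = 20,
               dimnames = list(NULL, paste0("gene", 1:5)))

# Signed gene x gene correlations may contain negative values.
signed_net <- cor(expr)

# Absolute values give non-negative weights where larger means stronger.
unsigned_net <- abs(signed_net)
print(round(unsigned_net, 2))
```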

net must be a square matrix. Gene names must be available as row and column names. Gene names must be unique. net must be symmetric when rows and columns are identically ordered. Diagonal entries are ignored.
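These requirements can be checked before running the enrichment. The function below is a hypothetical pre-flight check, not part of spice, and the exact validation performed inside the package may differ.

```r
# Hypothetical validator (not part of spice) for the requirements on net.
check_net <- function(net) {
  net <- as.matrix(net)
  stopifnot(
    is.numeric(net),
    nrow(net) == ncol(net),                          # square
    !is.null(rownames(net)), !is.null(colnames(net)),
    !anyDuplicated(rownames(net)),                   # unique gene names
    !anyDuplicated(colnames(net)),
    setequal(rownames(net), colnames(net))           # same genes on both axes
  )
  net <- net[, rownames(net), drop = FALSE]          # align columns to rows
  stopifnot(isSymmetric(unname(net)))                # symmetric once aligned
  invisible(net)
}

g <- c("A", "B", "C")
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), 3, 3, dimnames = list(g, g))

# Columns in a different order than rows: still valid after alignment.
shuffled <- m[, c("C", "A", "B")]
aligned <- check_net(shuffled)
print(aligned)
```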

A data.frame with the following columns.

`pathway`	Pathway name taken from `names(pathways)`. If pathway names are not provided, pathway index is used.
`n.gene`	Number of genes from the pathway available in `net`.
`p`	p-value for the pathway (computed using a fitted normal distribution as null).
`p.empirical`	Empirical p-value for the pathway.
`fdr`	False discovery rate computed from `p` using the Benjamini-Hochberg method.
`fdr.empirical`	False discovery rate computed from `p.empirical` using the Benjamini-Hochberg method.
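The Benjamini-Hochberg adjustment is available in base R through `p.adjust()`. The example below uses made-up p-values to show how such a column could be derived; whether the package calls `p.adjust()` internally is an assumption.

```r
# Made-up p-values for four pathways.
p <- c(0.001, 0.02, 0.04, 0.30)

# Benjamini-Hochberg adjustment from the stats package.
fdr <- p.adjust(p, method = "BH")
print(data.frame(p = p, fdr = fdr))
```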

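library(spice)
set.seed(101)  # make the random dummy network reproducible

genes = c('TP53', 'RBM3', 'SF3', 'LIM12', 'ATM', 'TMEM160', 'BCL2L1', 'MDM2',
          'PDR', 'MEG3', 'EGFR', 'CD96', 'KEAP1', 'SRSF1', 'TSEN2')
# random gene x gene matrix with gene names as row and column names
dummy_net = matrix(rnorm(length(genes)^2), nrow = length(genes), dimnames = list(genes, genes))
dummy_net = abs((dummy_net + t(dummy_net))/2)                    # symmetric, non-negative network
# pathways may contain genes absent from the network (e.g., 'SF1', 'SF5', 'RBM14');
# pathway1 has only 2 genes in the network, fewer than min.gene = 3 below
dummy_pathways = list(pathway1=c('TP53', 'RBM3', 'SF1', 'SF5'),
                       pathway2=c('LIM12', 'MDM2', 'BCL2L1', 'TMEM160', 'ATM'),
                       pathway3=c('EGFR', 'TP53', 'CD96', 'SRSF1', 'RBM14'))
enrich_res = coexpression_pathway_enrichment(net = dummy_net,
                                                pathways = dummy_pathways,
                                                min.gene = 3)
print(enrich_res)
# count pathways with FDR <= 0.05; na.rm = TRUE guards against any NA in fdr
n_sig = sum(enrich_res$fdr <= 0.05, na.rm = TRUE)
print(sprintf('Number of significantly enriched pathways: %d', n_sig))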