Benchmark networks using Network Enrichment Analysis (NEA)

Description

Tests how well a given network performs in a network enrichment analysis. The function runs a series of individual tests: for each member gene of a pathway or another functional set, it calculates the network enrichment score against the other members of the same gene set. This procedure yields true positive and false negative test results. To complement them with false positives and true negatives, the same is done for randomly picked genes (with matching node connectivity values) against the same functional sets. The two resulting vectors allow plotting a ROC curve in which each sequential cutoff corresponds to a point relating true positive to false positive predictions. This approach (first presented in Merid et al. (2012) http://www.biomedcentral.com/1471-2105/15/308) is an alternative to simply counting the edges shared between different networks and is superior to the latter because: 1) the analysis can be done without knowing the "true" reference network, 2) benchmarks can be context-dependent by using domain-specific test sets (e.g. cancer, diabetes etc.), 3) more than two networks can be compared at a time, and 4) given dense global networks and the use of multi-gene sets, the presence or absence of particular links is unlikely to affect the overall result.
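
The ROC construction described above can be sketched in base R. This is only an illustration, not the package's implementation: the z-scores are simulated, and the names z.members, z.random, tpr, fpr and auc are made up for this example.

```r
# Simulated (hypothetical) enrichment z-scores; in a real benchmark they
# would come from the network enrichment tests.
set.seed(1)
z.members <- rnorm(200, mean = 2)  # pathway members vs. their own gene set (positives)
z.random  <- rnorm(200, mean = 0)  # connectivity-matched random genes (negatives)

scores <- c(z.members, z.random)
labels <- c(rep(TRUE, length(z.members)), rep(FALSE, length(z.random)))

# Sort by decreasing score and accumulate counts at each sequential cutoff.
ord <- order(scores, decreasing = TRUE)
tp <- cumsum(labels[ord])    # members scoring at or above the cutoff
fp <- cumsum(!labels[ord])   # random genes scoring at or above the cutoff

tpr <- tp / sum(labels)      # true positive rate
fpr <- fp / sum(!labels)     # false positive rate

# Area under the ROC curve by the trapezoidal rule.
auc <- sum(diff(c(0, fpr)) * (head(c(0, tpr), -1) + tpr) / 2)

# Draw the curve only in an interactive session.
if (interactive()) {
  plot(c(0, fpr), c(0, tpr), type = "l",
       xlab = "False positive rate", ylab = "True positive rate")
  abline(0, 1, lty = 2)  # expectation for a non-informative network
}
```

A network that separates true members from random genes pushes the curve towards the upper left corner; a curve along the diagonal indicates no discriminating power.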

Usage

benchmark(NET, GS, gs.gene.col = 2, gs.group.col = 3, net.gene1.col = 1,
  net.gene2.col = 2, echo = 1, graph = FALSE, na.replace = 0,
  mask = ".", minN = 0, coff.z = 1.965, coff.fdr = 0.1,
  Parallelize = 1)

Arguments

NET

A network to benchmark. See Details in nea.render.

GS

a test set, typically a set of pathways with known members.

gs.gene.col

number of the column containing GS genes (only needed if GS is submitted as a text file)

gs.group.col

number of the column containing group IDs (only needed if GS is submitted as a text file)

net.gene1.col

number of the column containing first nodes of each network edge (only needed if NET is submitted as a text file)

net.gene2.col

number of the column containing second nodes of each network edge (only needed if NET is submitted as a text file)

echo

whether messages about execution progress should be printed

graph

Plot the ROC curve immediately. Alternatively, the returned list is plotted afterwards by roc. In the latter case, it could be a combined list of lists for multiple test sets and networks which are then plotted as separate curves (see Examples).

na.replace

value used to replace NA values. Default = 0, i.e. do not replace.

mask

when the test set contains various GSs, they can be used selectively by applying a mask. The mask follows regular expression syntax, since grep is called with fixed = FALSE.
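
As a rough illustration of how such a mask selects gene sets, the sketch below applies grep with fixed = FALSE to a vector of set names. The first name is taken from the Examples; the other two are hypothetical, and the exact way benchmark applies the mask internally is an assumption here.

```r
# Gene-set names; only the first one appears in the package examples,
# the other two are hypothetical.
gs.names <- c("kegg_04270_vascular_smooth_muscle_contraction",
              "kegg_05200_pathways_in_cancer",
              "go_0006915_apoptotic_process")

# The mask is treated as a regular expression (fixed = FALSE).
kegg.sets <- grep("kegg_", gs.names, fixed = FALSE, value = TRUE)
go.sets   <- grep("^go_",  gs.names, fixed = FALSE, value = TRUE)

# The default mask "." matches any non-empty name, so every set is kept.
all.sets  <- grep(".", gs.names, fixed = FALSE, value = TRUE)
```

Because the mask is a regular expression, characters such as "." or "(" in a set name need escaping if they are meant literally.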

minN

the minimal number of network edges that must connect a tested member with the GS genes for the test to be considered positive (default: 0).

coff.z

a parameter to roc.

coff.fdr

a parameter to roc.

Parallelize

The number of CPU cores to be used for the step "Counting actual links" (the other steps are sufficiently fast). This option is not supported on Windows.

Value

A list of three equal-length vectors taken from a prediction object of the ROCR package (prediction@cutoffs, prediction@fp, prediction@tp), together with the point that matches coff.fdr. These are needed to plot a ROC curve for the given network and test set with roc.
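
One plausible way to locate a point matching coff.fdr from such vectors is sketched below. The counts are invented, and estimating the false discovery rate as fp / (tp + fp) is an assumption made for this example; the package may derive the point differently.

```r
# Hypothetical cumulative counts at decreasing cutoffs, shaped like the
# three equal-length vectors described above.
cutoffs <- c(Inf, 5, 4, 3, 2, 1, 0)
tp      <- c(0, 10, 25, 40, 50, 55, 60)
fp      <- c(0,  0,  1,  3,  8, 20, 60)

# Assumed FDR estimate: share of false positives among all positive calls
# (defined as 0 where no calls are made).
fdr <- ifelse(tp + fp > 0, fp / (tp + fp), 0)

coff.fdr <- 0.1
ok <- which(fdr <= coff.fdr)
i  <- max(ok)  # most permissive cutoff that still satisfies the FDR limit

fdr.point <- c(cutoff = cutoffs[i], tp = tp[i], fp = fp[i], fdr = fdr[i])
```

With these made-up counts the selected cutoff is 3, where 40 true positives are accompanied by 3 false positives.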

References

http://www.biomedcentral.com/1471-2105/15/308

See Also

roc, nea.render

Examples

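# Load the example gene sets and import them as a list of functional sets
data(can.sig.go)
fpath <- can.sig.go
gs.list <- import.gs(fpath, Lowercase = 1, col.gene = 2, col.set = 3)

# Load the example network and import it
data(net.kegg)
netpath <- net.kegg
net <- import.net(netpath)

# Benchmark the network on a single KEGG pathway and plot the ROC curve
b0 <- benchmark(NET = net,
  GS = gs.list[c("kegg_04270_vascular_smooth_muscle_contraction")],
  echo = 1, graph = TRUE, na.replace = 0, mask = ".", minN = 0,
  coff.z = 1.965, coff.fdr = 0.1, Parallelize = 2)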

## Not run: 
## Benchmark a number of networks on GO terms and KEGG pathways separately, using masks:
b1 <- list()
ref_list <- list(net1 = netpath, net2 = netpath)
for (mask in c("kegg_", "go_")) {
  b1[[mask]] <- list()
  # a series of networks can be put here, e.g. c("net1", "net2")
  for (file.net in c("net1")) {
    net <- import.net(ref_list[[file.net]], col.1 = 1, col.2 = 2, Lowercase = 1, echo = 1)
    b1[[mask]][[file.net]] <- benchmark(NET = net, GS = gs.list, echo = 1,
      graph = FALSE, na.replace = 0, mask = mask, minN = 0, Parallelize = 1)
  }
}
# Plot the KEGG and GO curves in two stacked panels
par(mfrow = c(2, 1))
roc(b1[["kegg_"]][["net1"]], coff.z = 2.57, coff.fdr = 0.01)
roc(b1[["go_"]][["net1"]], coff.z = 2.57, coff.fdr = 0.01)

## End(Not run)