benchmark: Benchmark networks using Network Enrichment Analysis (NEA)
In NEArender: Network Enrichment Analysis


Tests how well a given network performs in network enrichment analysis. The function runs a series of individual tests: for each member gene of a pathway or other functional set, it calculates the network enrichment score against the other members of the same gene set. This procedure yields true positive and false negative test results. To complement them with false positives and true negatives, the same is done for randomly picked genes (with matching node connectivity values) against the same functional sets. The two vectors allow plotting a ROC curve, where each point corresponds to one sequential cutoff and shows the rate of true positive versus false positive predictions. This approach (first presented in Merid et al. (2012)) is an alternative to the trivial approach of counting edges shared between different networks, and is superior to the latter because: 1) the analysis can be done without knowing the "true" reference network, 2) benchmarks can be context-dependent by using domain-specific test sets (e.g. cancer, diabetes etc.), 3) one can compare more than two networks at a time, and 4) given dense global networks and the use of multi-gene sets, the presence or absence of particular links is unlikely to affect the overall result.
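The degree-matched random sampling that produces the negative tests can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not NEArender's implementation: the toy edge list, the gene set and the helper `pick_matched()` are hypothetical, and exact degree matching is assumed (the package may match node connectivity differently).

```r
# Hypothetical sketch (not NEArender code): pick random non-member genes whose
# network degree exactly matches that of each gene-set member.
set.seed(1)

# Toy undirected network given as a two-column edge list (invented data)
edges <- data.frame(
  gene1 = c("a", "a", "b", "c", "d", "e", "f", "g"),
  gene2 = c("b", "c", "c", "d", "e", "f", "g", "h"),
  stringsAsFactors = FALSE
)

# Node degree = number of edges in which each node appears
degree <- table(c(edges$gene1, edges$gene2))

# Toy functional set, e.g. one pathway
gs_members <- c("a", "b", "c")

# For one member, sample a non-member with the same degree; NA if none exists
pick_matched <- function(member, degree, exclude) {
  same_degree <- degree == degree[[member]]
  candidates <- names(degree)[same_degree & !(names(degree) %in% exclude)]
  if (length(candidates) == 0) return(NA_character_)
  candidates[sample.int(length(candidates), 1)]
}

# One degree-matched random gene per member (named by member)
random_genes <- vapply(gs_members, pick_matched, character(1),
                       degree = degree, exclude = gs_members)
print(random_genes)
```

Here members `a` and `b` (degree 2) each receive a random partner among `d`, `e`, `f` and `g`, while `c` (degree 3) has no non-member of equal degree and yields `NA`; a real implementation has to decide how to handle such cases.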

benchmark(NET, GS, gs.gene.col = 2, gs.group.col = 3, net.gene1.col = 1,
  net.gene2.col = 2, echo = 1, graph = FALSE, na.replace = 0,
  mask = ".", minN = 0, coff.z = 1.965, coff.fdr = 0.1,
  Parallelize = 1)

`NET`	A network to benchmark. See Details in `nea.render`.
`GS`	a test set, typically a set of pathways with known members.
`gs.gene.col`	number of the column containing GS genes (only needed if GS is submitted as a text file)
`gs.group.col`	number of the column containing group IDs (only needed if GS is submitted as a text file)
`net.gene1.col`	number of the column containing first nodes of each network edge (only needed if NET is submitted as a text file)
`net.gene2.col`	number of the column containing second nodes of each network edge (only needed if NET is submitted as a text file)
`echo`	whether messages about execution progress should be printed
`graph`	Plot the ROC curve immediately. Alternatively, the returned list can be plotted afterwards with `roc`. In that case, results for multiple test sets and networks can be combined into a list of lists, which are then plotted as separate curves (see Examples).
`na.replace`	replace NA values. Default=0, i.e. do not replace.
`mask`	when the test set contains various GSs, they can be used selectively by applying a mask. The mask follows regular expression syntax, since `fixed=FALSE` in `grep`.
`minN`	the minimal number of network edges that must connect a tested member with the GS genes for the test to be considered positive. (Default:0).
`coff.z`	a parameter to `roc`.
`coff.fdr`	to make significance levels comparable between different curves, the point where FDR=`coff.fdr` will be labeled with a circle (think of TP/FP ratio at this level).
`Parallelize`	The number of CPU cores to be used for the step "Counting actual links" (the other steps are fast enough without parallelization). The option is not supported on Windows.
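The `mask` argument is documented as a regular expression applied with `grep(fixed = FALSE)`. The base R snippet below illustrates how such a pattern selects gene sets by name; it assumes the mask is matched against gene-set IDs, and the set names and the helper `select_sets()` are hypothetical.

```r
# Hypothetical gene-set IDs, e.g. as they might appear in a combined test set
gs_names <- c("kegg_00010_glycolysis", "kegg_04110_cell_cycle",
              "go_0006915_apoptosis", "go_0007049_cell_cycle")

# Keep only the set IDs matching a regular-expression mask (fixed = FALSE)
select_sets <- function(set_names, mask) {
  set_names[grep(mask, set_names, fixed = FALSE)]
}

all_sets  <- select_sets(gs_names, ".")      # "." matches any non-empty ID
kegg_sets <- select_sets(gs_names, "kegg_")
go_sets   <- select_sets(gs_names, "^go_")   # anchored at the start of the ID
```

Because the pattern is unanchored by default, `"go_"` would also match an ID such as `cargo_transport`; anchoring it as `"^go_"` restricts the match to IDs that start with the prefix.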

The function either plots a ROC curve for the analyzed network or returns an object with the following slots, based on the output of function prediction (package ROCR):
tp, vector of true positives;
fp, vector of false positives;
tn, vector of true negatives;
fn, vector of false negatives;
cutoffs, z-score cutoffs from nea.render;
cross.z, a z-score value which corresponds to FDR=coff.fdr (denoted with a special marker on the curve).
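To make the slots above concrete, the following base R sketch derives `tp`, `fp`, `tn`, `fn` and a `cross.z`-like value from two hypothetical vectors of z-scores. It is an illustration under assumptions, not the package's code: the package builds these vectors with ROCR's `prediction`, the scores are invented, and the rule used for `cross.z` (the lowest cutoff whose FDR = FP/(TP+FP) does not exceed `coff.fdr`) is a guess at the intended meaning.

```r
# Invented NEA z-scores:
# z_pos: gene-set members tested against their own set (expected to be high)
# z_neg: degree-matched random genes tested against the same sets
z_pos <- c(5.1, 4.2, 3.3, 2.8, 2.0, 1.1, 0.4)
z_neg <- c(3.0, 1.5, 0.9, 0.2, -0.3, -0.8, -1.2)

scores <- c(z_pos, z_neg)
labels <- c(rep(TRUE, length(z_pos)), rep(FALSE, length(z_neg)))

# Sequential cutoffs: every observed score, from highest to lowest
cutoffs <- sort(unique(scores), decreasing = TRUE)

# A test is called positive when its z-score is >= the cutoff
tp <- sapply(cutoffs, function(k) sum(scores >= k & labels))
fp <- sapply(cutoffs, function(k) sum(scores >= k & !labels))
fn <- sum(labels) - tp
tn <- sum(!labels) - fp

# Coordinates of the ROC curve
tpr <- tp / (tp + fn)
fpr <- fp / (fp + tn)

# Assumed rule for cross.z: lowest cutoff at which FDR <= coff.fdr
coff.fdr <- 0.1
fdr <- fp / (tp + fp)
cross.z <- min(cutoffs[fdr <= coff.fdr])
print(cross.z)
```

With these numbers `cross.z` is 3.3: below that cutoff the first random gene (z = 3.0) is called positive, and FDR stays above 0.1 for all lower cutoffs.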

An object, i.e. a list of three equal-length vectors from a prediction object of the ROCR package (prediction@cutoffs, prediction@fp, prediction@tp), plus the point that matches coff.fdr. These are needed to plot a ROC curve for the given network and test set using roc.

http://www.biomedcentral.com/1471-2105/15/308

roc, nea.render

library(NEArender)

data(can.sig.go)
fpath <- can.sig.go
gs.list <- import.gs(fpath, Lowercase = 1, col.gene = 2, col.set = 3)
data(net.kegg)
netpath <- net.kegg
net <- import.net(netpath)

# Benchmark the KEGG network against all gene sets (mask = "." matches every set ID).
# Parallelize > 1 is not supported on Windows.
b0 <- benchmark(NET = net,
  GS = gs.list,
  echo = 1, graph = TRUE, na.replace = 0, mask = ".", minN = 0,
  coff.z = 1.965, coff.fdr = 0.1, Parallelize = 2)

## Not run: 
## Benchmark a number of networks on GO terms and KEGG pathways separately, using masks:
b1 <- list()
# A series of networks can be listed here, e.g. list(net1 = netpath1, net2 = netpath2)
nets <- list(netpath = netpath)
for (mask in c("kegg_", "go_")) {
  b1[[mask]] <- list()
  for (file.net in names(nets)) {
    net <- import.net(nets[[file.net]], col.1 = 1, col.2 = 2, Lowercase = 1, echo = 1)
    b1[[mask]][[file.net]] <- benchmark(NET = net, GS = gs.list, echo = 1,
      graph = FALSE, na.replace = 0, mask = mask, minN = 0, Parallelize = 1)
  }
}
par(mfrow = c(2, 1))
roc(b1[["kegg_"]], coff.z = 2.57, main = "kegg_")
roc(b1[["go_"]], coff.z = 2.57, main = "go_")

## End(Not run)