example-data: Example data



Example gene coexpression networks inferred from two independent datasets to demonstrate the usage of package functions.





The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:

The example data provide concrete illustrations of how these arguments are used in each package function.

Simulation details

The discovery gene expression dataset ("discovery_data") containing 30 samples and 150 genes was simulated to contain four distinct modules of sizes 20, 25, 30, and 35 genes. Data for each module were simulated as:

G^{(w)}_{simulated} = E^{(w)} r_i + \sqrt{1 - r_i^2} \, \epsilon

where E^{(w)} is the simulated module's summary vector, r is the vector of the simulated module's node contributions (one value, r_i, per gene), and ε is the error term drawn from a standard normal distribution. E^{(w)} and r were simulated by bootstrapping (sampling with replacement) samples and genes from the corresponding vectors in modules 63, 51, 57, and 50 discovered in the liver tissue gene expression data from a publicly available mouse dataset (see reference (1) for details on the dataset and network discovery).

The remaining 40 genes, which were not part of any module, were simulated by randomly selecting 40 liver genes, bootstrapping 30 samples, and adding the noise term, ε.

A vector of module assignments ("module_labels") was created in which each gene was labelled with a number from 1 to 4 corresponding to the module it was simulated to be coexpressed with, or with a label of 0 for the 40 "background" genes not participating in any module.

The correlation structure ("discovery_correlation") was calculated as the Pearson correlation coefficient between genes (cor(discovery_data)). Edge weights in the interaction network ("discovery_network") were calculated as the absolute value of the correlation coefficient raised to the power 5 (abs(discovery_correlation)^5).
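The simulation and network construction described above can be sketched in base R. This is a minimal illustration, not the code used to build the packaged data. It makes several assumptions: the summary vectors E and node contributions r are drawn from simple random distributions instead of being bootstrapped from the mouse liver modules, the background genes are pure noise, and the names simulate_module, sim_data, sim_labels, sim_correlation, and sim_network are hypothetical. Setting the seed to 37 here does not reproduce the packaged objects.

```r
set.seed(37)  # same seed value as reported, but this sketch will not reproduce the packaged data

n_samples    <- 30
module_sizes <- c(20, 25, 30, 35)
n_background <- 40

# Hypothetical helper: simulate one module as G = E * r_i + sqrt(1 - r_i^2) * eps
simulate_module <- function(E, r) {
  eps <- matrix(rnorm(length(E) * length(r)), nrow = length(E))
  # outer(E, r) puts E * r_i in column i; sweep() scales column i of eps
  outer(E, r) + sweep(eps, 2, sqrt(1 - r^2), `*`)
}

modules <- lapply(module_sizes, function(size) {
  E <- as.numeric(scale(rnorm(n_samples)))  # assumption: stand-in for a bootstrapped summary vector
  r <- runif(size, 0.5, 0.9)                # assumption: stand-in for bootstrapped node contributions
  simulate_module(E, r)
})

# Assumption: background genes are noise only (the source bootstraps real liver genes first)
background <- matrix(rnorm(n_samples * n_background), nrow = n_samples)

# Samples in rows, genes in columns, so that cor() correlates genes
sim_data <- cbind(do.call(cbind, modules), background)
colnames(sim_data) <- paste0("Gene_", seq_len(ncol(sim_data)))  # hypothetical gene names

# Labels 1-4 for module genes, 0 for background genes
sim_labels <- setNames(
  c(rep(1:4, times = module_sizes), rep(0, n_background)),
  colnames(sim_data)
)

# Pearson correlation between genes, then edge weights as |correlation|^5
sim_correlation <- cor(sim_data)
sim_network     <- abs(sim_correlation)^5
```

The sketch yields a 30 by 150 data matrix and 150 by 150 correlation and network matrices, matching the dimensions stated for the example data. Raising the absolute correlation to the power 5 pushes weak correlations towards zero while keeping all edge weights between 0 and 1.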

An independent test dataset ("test_data") containing the same 150 genes as the discovery dataset but 30 different samples was simulated as above. Modules 1 and 4 (containing 20 and 35 genes respectively) were simulated to be preserved using the same equation as above, where the summary vector E^{(w)} was bootstrapped from the same liver modules (modules 63 and 50) as in the discovery dataset, with node contributions r identical to those in the discovery dataset. Genes in modules 2 and 3 were instead simulated as "background" genes in the manner described above, so these two modules are not preserved in the test dataset. The correlation structure between genes in the test dataset ("test_correlation") and the interaction network ("test_network") were calculated in the same way as for the discovery dataset.
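The difference between a preserved and a non-preserved module in the test dataset can be sketched as follows. This is an illustrative base R example under simplifying assumptions, not the original simulation code. The node contributions r are drawn from a uniform distribution, the summary vectors are random normal draws instead of bootstrapped liver module vectors, and the non-preserved genes are pure noise. The names simulate_module, mean_abs_cor, and coexpression are hypothetical.

```r
set.seed(37)  # illustrative only; does not reproduce the packaged test data

n_samples <- 30
n_genes   <- 20

# Hypothetical helper: G = E * r_i + sqrt(1 - r_i^2) * eps
simulate_module <- function(E, r) {
  eps <- matrix(rnorm(length(E) * length(r)), nrow = length(E))
  outer(E, r) + sweep(eps, 2, sqrt(1 - r^2), `*`)
}

# Assumption: node contributions shared by the discovery and test datasets
r <- runif(n_genes, 0.7, 0.9)

# Different samples in each dataset, hence a separate summary vector for each
E_discovery <- as.numeric(scale(rnorm(n_samples)))
E_test      <- as.numeric(scale(rnorm(n_samples)))

discovery_module <- simulate_module(E_discovery, r)
test_preserved   <- simulate_module(E_test, r)  # same r: coexpression is retained
test_background  <- matrix(rnorm(n_samples * n_genes), nrow = n_samples)  # noise only

# Mean absolute pairwise correlation between genes as a crude coexpression measure
mean_abs_cor <- function(x) {
  cm <- cor(x)
  mean(abs(cm[upper.tri(cm)]))
}

coexpression <- c(
  discovery     = mean_abs_cor(discovery_module),
  preserved     = mean_abs_cor(test_preserved),
  not_preserved = mean_abs_cor(test_background)
)
print(round(coexpression, 2))
```

Because the preserved module reuses the same node contributions, its genes stay strongly correlated in the new samples, whereas genes simulated as background show only the weak correlations expected by chance with 30 samples.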

The random seed used for the simulations was 37.


References

  1. Ritchie, S.C., et al. A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell Systems 3, 71-82 (2016).

See Also

modulePreservation, plotModule, and networkProperties.

NetRep documentation built on June 12, 2018, 5:04 p.m.