netCompare: Group Comparison of Network Properties
In stefpeschel/NetCoMi: Network Construction and Comparison for Microbiome Data


Description

Calculate and compare network properties for microbial networks using Jaccard's index, the Rand index, the Graphlet Correlation Distance, and permutation tests.

Usage

netCompare(
  x,
  permTest = FALSE,
  jaccQuant = 0.75,
  lnormFit = NULL,
  testRand = TRUE,
  nPermRand = 1000L,
  gcd = TRUE,
  gcdOrb = c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1),
  verbose = TRUE,
  nPerm = 1000L,
  adjust = "adaptBH",
  trueNullMethod = "convest",
  cores = 1L,
  logFile = NULL,
  seed = NULL,
  fileLoadAssoPerm = NULL,
  fileLoadCountsPerm = NULL,
  storeAssoPerm = FALSE,
  fileStoreAssoPerm = "assoPerm",
  storeCountsPerm = FALSE,
  fileStoreCountsPerm = c("countsPerm1", "countsPerm2"),
  returnPermProps = FALSE,
  returnPermCentr = FALSE,
  assoPerm = NULL,
  dissPerm = NULL
)

Arguments

`x`	object of class `microNetProps` (returned by `netAnalyze`).
`permTest`	logical. If `TRUE`, permutation tests are conducted to test centrality measures and global network properties for group differences. Defaults to `FALSE`. Note that this may considerably increase execution time.
`jaccQuant`	numeric value between 0 and 1 specifying the quantile used as threshold to identify the most central nodes for each centrality measure. The resulting sets of nodes are used to calculate Jaccard's index (see details). Default is 0.75.
`lnormFit`	logical indicating whether a log-normal distribution should be fitted to the calculated centrality values for determining Jaccard's index (see details). If `NULL` (default), the value is taken from `x`, i.e., it equals the setting used for identifying hub nodes.
`testRand`	logical. If `TRUE`, a permutation test is conducted for the adjusted Rand index (with H0: ARI = 0). Execution time may be increased for large networks.
`nPermRand`	integer giving the number of permutations used for testing the adjusted Rand index for being significantly different from zero. Ignored if `testRand = FALSE`. Defaults to 1000L.
`gcd`	logical. If `TRUE` (default), the Graphlet Correlation Distance (GCD) is computed.
`gcdOrb`	numeric vector with integers from 0 to 14 defining the orbits used for calculating the GCD. Minimum length is 2. By default, the orbits 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, and 11 are used (see Usage for the exact default vector), thus excluding redundant orbits such as orbit o3.
`verbose`	logical. If `TRUE` (default), status messages are shown.
`nPerm`	integer giving the number of permutations if `permTest = TRUE`. Default is 1000L.
`adjust`	character indicating the method used for multiple testing adjustment of the permutation p-values. Possible values are `"lfdr"` for local false discovery rate correction (via `fdrtool`), `"adaptBH"` (default) for the adaptive Benjamini-Hochberg method (Benjamini and Hochberg, 2000), or one of the methods provided by `p.adjust` (see `p.adjust.methods`).
`trueNullMethod`	character indicating the method used for estimating the proportion of true null hypotheses from a vector of p-values. Used for the adaptive Benjamini-Hochberg method (chosen by `adjust = "adaptBH"`). Accepts the options of the `method` argument of `propTrueNull` (limma package): `"convest"` (default), `"lfdr"`, `"mean"`, and `"hist"`. Alternatively, `"farco"` selects the "iterative plug-in method" proposed by Farcomeni (2007).
`cores`	integer giving the number of CPU cores used for permutation tests. If `cores > 1`, the tests are performed in parallel. The value is limited to the number of available CPU cores as determined by `detectCores`. Defaults to 1L (no parallelization).
`logFile`	character string naming the log file to which the current iteration number is written (if permutation tests are performed). Defaults to `NULL` so that no log file is generated.
`seed`	integer giving a seed for reproducibility of the results.
`fileLoadAssoPerm`	character giving the name or path (without file extension) of the file containing the "permuted" association/dissimilarity matrices that was generated by setting `storeAssoPerm` to `TRUE`. Only used for permutation tests. If `NULL`, no existing associations are used.
`fileLoadCountsPerm`	character vector with two elements giving the names or paths (without file extension) of the files containing the "permuted" count matrices generated by setting `storeCountsPerm` to `TRUE`. Only used for permutation tests, and only if `fileLoadAssoPerm = NULL`. If `NULL`, no existing count matrices are used.
`storeAssoPerm`	logical indicating whether the association/dissimilarity matrices for the permuted data should be saved to a file. The file name is given via `fileStoreAssoPerm`. If `TRUE`, the computed "permutation" association/dissimilarity matrices can be reused via `fileLoadAssoPerm` to save runtime. Defaults to `FALSE`. Ignored if `fileLoadAssoPerm` is not `NULL`.
`fileStoreAssoPerm`	character giving the name of a file to which the matrix with associations/dissimilarities of the permuted data is saved. Can also be a path.
`storeCountsPerm`	logical indicating whether the permuted count matrices should be saved to an external file. Defaults to `FALSE`. Ignored if `fileLoadCountsPerm` is not `NULL`.
`fileStoreCountsPerm`	character vector with two elements giving the names of two files storing the permuted count matrices belonging to the two groups.
`returnPermProps`	logical. If `TRUE`, the global properties and their absolute differences for the permuted data are returned.
`returnPermCentr`	logical. If `TRUE`, the centralities and their absolute differences for the permuted data are returned.
`assoPerm`	only needed for output generated with NetCoMi v1.0.1! A list with two elements used for the permutation procedure. Each entry must contain association matrices for `nPerm` permutations. This can be the `assoPerm` element of the output returned by either `diffnet` or `netCompare`.
`dissPerm`	only needed for output generated with NetCoMi v1.0.1! Usage analog to `assoPerm` if a dissimilarity measure has been used for network construction.
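
As a rough illustration of the default `adjust = "adaptBH"`, the sketch below scales Benjamini-Hochberg adjusted p-values by an estimate of the proportion of true null hypotheses (pi0). This is an assumption about one common formulation, not NetCoMi's code: the helpers `estimate_pi0` and `adapt_bh` are hypothetical, and a simple Storey-type pi0 estimator with a fixed `lambda` stands in for the `trueNullMethod` options such as `"convest"`.

```r
# Hypothetical Storey-type estimate of the proportion of true nulls (pi0);
# a stand-in for the trueNullMethod options, capped at 1
estimate_pi0 <- function(p, lambda = 0.5) {
  min(1, mean(p > lambda) / (1 - lambda))
}

# Hypothetical adaptive BH: BH-adjusted p-values scaled by pi0, capped at 1
adapt_bh <- function(p, lambda = 0.5) {
  pi0 <- estimate_pi0(p, lambda)
  pmin(1, pi0 * p.adjust(p, method = "BH"))
}

set.seed(1)
p <- c(runif(40), rbeta(10, 0.5, 20))  # mostly nulls plus a few small p-values
p_bh <- p.adjust(p, method = "BH")
p_abh <- adapt_bh(p)
summary(p_abh - p_bh)  # adaptive values are never larger than plain BH
p.adjust.methods       # character vector of methods accepted by p.adjust()
```

Because pi0 is at most 1, the adaptive values can only be smaller than or equal to the plain BH values, which is what makes the adaptive procedure more powerful when many hypotheses are false.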

Details

Permutation procedure:
Used for testing centrality measures and global network properties for group differences.
For centrality measures, the null hypothesis of the tests is defined as

H_0: c1_i - c2_i = 0,

where c1_i and c2_i denote the centrality values of taxon i in group 1 and 2, respectively.
To generate a sampling distribution of the differences under H_0, the group labels are randomly reassigned to the samples while the group sizes are kept fixed, and the associations are re-estimated for each permuted data set. The p-value is calculated as the proportion of absolute "permutation differences" that are larger than or equal to the observed absolute difference. For non-exact tests, a pseudo-count is added to the numerator and denominator to avoid p-values of zero. Several methods for adjusting the p-values for multiple testing are available (see `adjust`).
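
The p-value computation described above can be sketched in base R. NetCoMi re-estimates the associations and recomputes the network properties for every permuted data set; to keep the sketch short, that step is replaced here by a simple summary statistic applied directly to toy data. The function `perm_test_diff` and its argument `stat_fun` are hypothetical.

```r
# Minimal sketch of a two-group permutation test with a pseudo-count.
# stat_fun is a hypothetical stand-in for "compute a network property".
perm_test_diff <- function(x, group, stat_fun, n_perm = 1000L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  lev <- unique(group)
  stopifnot(length(lev) == 2L)
  # Absolute difference of the statistic between the two groups
  abs_diff <- function(g) {
    abs(stat_fun(x[g == lev[1]]) - stat_fun(x[g == lev[2]]))
  }
  observed <- abs_diff(group)
  # sample(group) reassigns the labels at random and keeps the group sizes
  perm_diffs <- vapply(seq_len(n_perm),
                       function(i) abs_diff(sample(group)),
                       numeric(1))
  # Non-exact test: add 1 to numerator and denominator to avoid p = 0
  p_value <- (sum(perm_diffs >= observed) + 1) / (n_perm + 1)
  list(observed = observed, perm_diffs = perm_diffs, p_value = p_value)
}

set.seed(42)
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 1))
group <- rep(c("g1", "g2"), each = 30)
res <- perm_test_diff(x, group, stat_fun = mean, n_perm = 999L, seed = 1)
res$p_value
```

With the pseudo-count, the smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations bounds how small the (adjusted) p-values can get.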

Jaccard's index:
Jaccard's index expresses for each centrality measure how similar the sets of most central nodes are in the two networks.
These sets are defined as nodes with a centrality value above a defined quantile (via `jaccQuant`) either of the empirical distribution of the centrality values (`lnormFit = FALSE`) or of a fitted log-normal distribution (`lnormFit = TRUE`).
The index ranges from 0 to 1, where 1 means that the sets of most central nodes are exactly equal in both networks and 0 indicates that the most central nodes are completely different.
The index is calculated as suggested by Real and Vargas (1996).
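
A minimal sketch of the index itself (intersection size divided by union size) for the sets of most central nodes, using toy centrality values. The helpers `most_central` and `jaccard` are hypothetical; the log-normal branch assumes the parameters are estimated from the log of the positive centrality values, which may differ from NetCoMi's fitting procedure. The sketch computes only the index, not any test statistic.

```r
# Hypothetical helper: names of nodes with centrality above the quant-quantile
most_central <- function(cent, quant = 0.75, lnorm_fit = FALSE) {
  if (lnorm_fit) {
    # Assumption: log-normal parameters from the log of positive values
    lx <- log(cent[cent > 0])
    thresh <- qlnorm(quant, meanlog = mean(lx), sdlog = sd(lx))
  } else {
    thresh <- quantile(cent, probs = quant, names = FALSE)
  }
  names(cent)[cent > thresh]
}

# Jaccard's index: size of the intersection divided by size of the union
jaccard <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0L) return(NA_real_)  # undefined if both sets are empty
  length(intersect(a, b)) / length(u)
}

# Toy centralities for the same eight nodes in two networks
cent1 <- c(A = 0.9, B = 0.8, C = 0.1, D = 0.2, E = 0.7, F = 0.05, G = 0.3, H = 0.4)
cent2 <- c(A = 0.85, B = 0.1, C = 0.95, D = 0.2, E = 0.3, F = 0.05, G = 0.4, H = 0.6)

top1 <- most_central(cent1)  # "A" "B"
top2 <- most_central(cent2)  # "A" "C"
jaccard(top1, top2)          # 1/3: one shared node out of three in the union
```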

Rand index:
The Rand index is used to express whether the determined clusterings are equal in both groups. The adjusted Rand index (ARI) ranges from -1 to 1, where 1 indicates that the two clusterings are exactly equal; its expected value for two random clusterings is 0. The implemented test procedure follows Qannari et al. (2014): a p-value below the chosen significance level alpha means that the ARI is significantly higher than expected for two random clusterings.
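
The ARI can be computed from the contingency table of two clusterings, and a permutation test of H0: ARI = 0 can be sketched by shuffling the labels of one clustering. This is an illustration of the idea, not NetCoMi's implementation: `ari` and `ari_perm_test` are hypothetical helpers, and the +1 pseudo-count in the p-value is an assumption carried over from the other permutation tests.

```r
# Adjusted Rand index from the contingency table of two clusterings
ari <- function(c1, c2) {
  tab <- table(c1, c2)
  choose2 <- function(n) n * (n - 1) / 2        # number of node pairs
  sum_ij <- sum(choose2(tab))                   # pairs together in both
  sum_a <- sum(choose2(rowSums(tab)))           # pairs together in c1
  sum_b <- sum(choose2(colSums(tab)))           # pairs together in c2
  expected <- sum_a * sum_b / choose2(length(c1))
  max_index <- (sum_a + sum_b) / 2
  # NaN in degenerate cases, e.g. both clusterings have a single cluster
  (sum_ij - expected) / (max_index - expected)
}

# Permutation test of H0: ARI = 0 by shuffling the labels of one clustering
ari_perm_test <- function(c1, c2, n_perm = 1000L) {
  observed <- ari(c1, c2)
  perm <- vapply(seq_len(n_perm), function(i) ari(c1, sample(c2)), numeric(1))
  list(ari = observed, p_value = (sum(perm >= observed) + 1) / (n_perm + 1))
}

cl1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
cl2 <- c(2, 2, 2, 3, 3, 3, 1, 1, 1)  # same partition, different labels
ari(cl1, cl2)                         # 1
ari(c(1, 1, 2, 2), c(1, 2, 1, 2))     # -0.5

set.seed(1)
ari_perm_test(cl1, cl2, n_perm = 999L)$p_value
```

Since the ARI only depends on which nodes are grouped together, relabeling the clusters (as in `cl2`) leaves the index at 1.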

Graphlet Correlation Distance:
A graphlet-based distance measure, defined as the Euclidean distance between the upper-triangle values of the Graphlet Correlation Matrices (GCMs) of two networks (Yaveroglu et al., 2014). The GCM of a network is the matrix of Spearman's correlations between the network's node orbit counts (Hocevar and Demsar, 2016). See `calcGCD` for details.
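
The GCD computation described above can be sketched in base R, assuming the per-node orbit counts of both networks (one column per orbit in `gcdOrb`) are already available. Counting the orbits is not shown; random Poisson counts stand in for real orbit counts, and `gcm` and `gcd_from_orbits` are hypothetical helper names.

```r
# Graphlet Correlation Matrix: Spearman correlations between orbit columns
gcm <- function(orbit_counts) {
  cor(orbit_counts, method = "spearman")
}

# GCD: Euclidean distance between the upper triangles of the two GCMs
gcd_from_orbits <- function(orb1, orb2) {
  g1 <- gcm(orb1)
  g2 <- gcm(orb2)
  ut <- upper.tri(g1)  # upper triangle without the diagonal
  sqrt(sum((g1[ut] - g2[ut])^2))
}

set.seed(7)
orbits <- c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1)  # default gcdOrb
# Toy orbit counts: 30 nodes x 11 orbits per network (assumed input)
orb1 <- matrix(rpois(30 * length(orbits), lambda = 5), nrow = 30,
               dimnames = list(NULL, paste0("O", orbits)))
orb2 <- matrix(rpois(30 * length(orbits), lambda = 5), nrow = 30,
               dimnames = list(NULL, paste0("O", orbits)))
gcd_from_orbits(orb1, orb2)
```

A GCD of 0 means the two networks have identical graphlet correlation structure; larger values indicate more dissimilar structure.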

Value

Object of class `microNetComp` with the following elements:

`jaccDeg,jaccBetw,jaccClose,jaccEigen`	Values of Jaccard's index for the centrality measures
`jaccHub`	Jaccard's index for the sets of hub nodes
`randInd`	Adjusted Rand index
`randIndLCC`	Adjusted Rand index for the largest connected component (LCC)
`gcd`	Graphlet Correlation Distance (object of class `gcd` returned by `calcGCD`)
`gcdLCC`	Graphlet Correlation Distance for the LCC
`properties`	List with calculated network properties
`propertiesLCC`	List with calculated network properties of the LCC
`diffGlobal`	Vectors with differences of global properties
`diffGlobalLCC`	Vectors with differences of global properties for the LCC
`diffCent`	Vectors with differences of the centrality values
`countMatrices`	The two count matrices returned by `netConstruct`
`assoMatrices`	The two association matrices returned by `netConstruct`
`dissMatrices`	The two dissimilarity matrices returned by `netConstruct`
`adjaMatrices`	The two adjacency matrices returned by `netConstruct`
`groups`	Group names returned by `netConstruct`
`paramsProperties`	Parameters used for network analysis

Additional output if permutation tests are conducted:

`pvalDiffGlobal`	P-values of the tests for differential global properties
`pvalDiffGlobalLCC`	P-values of the tests for differential global properties in the LCC
`pvalDiffCentr`	P-values of the tests for differential centrality values
`pvalDiffCentrAdjust`	Adjusted p-values of the tests for differential centrality values
`permDiffGlobal`	`nPerm` x 10 matrix containing the absolute differences of the ten global network properties (computed for the whole network) for all `nPerm` permutations
`permDiffGlobalLCC`	`nPerm` x 11 matrix containing the absolute differences of the eleven global network properties (computed for the LCC) for all `nPerm` permutations
`permDiffCentr`	List with the absolute differences of the four centrality measures for all `nPerm` permutations. Each list element is an `nPerm` x `nNodes` matrix.

References

Benjamini Y, Hochberg Y (2000). “On the adaptive control of the false discovery rate in multiple testing with independent statistics.” Journal of Educational and Behavioral Statistics, 25(1), 60–83.

Farcomeni A (2007). “Some results on the control of the false discovery rate under dependence.” Scandinavian Journal of Statistics, 34(2), 275–297.

Gill R, Datta S, Datta S (2010). “A statistical framework for differential network analysis from microarray data.” BMC Bioinformatics, 11, 95.

Hocevar T, Demsar J (2016). “Computation of graphlet orbits for nodes and edges in sparse graphs.” Journal of Statistical Software, 71, 1–24.

Qannari EM, Courcoux P, Faye P (2014). “Significance test of the adjusted Rand index. Application to the free sorting task.” Food Quality and Preference, 32, 93–97.

Real R, Vargas JM (1996). “The Probabilistic Basis of Jaccard's Index of Similarity.” Systematic Biology, 45, 380–385.

Yaveroglu ON, Malod-Dognin N, Davis D, Levnajic Z, Janjic V, Karapandza R, Stojmirovic A, Przulj N (2014). “Revealing the hidden language of complex networks.” Scientific Reports, 4(1), 1–9.

Examples

knitr::opts_chunk$set(fig.width = 16, fig.height = 8)

library(NetCoMi)

# Load a data set from the American Gut Project (from the SpiecEasi package)
data("amgut2.filt.phy")

# Split data into two groups: with and without seasonal allergies
amgut_season_yes <- phyloseq::subset_samples(amgut2.filt.phy,
                                      SEASONAL_ALLERGIES == "yes")
amgut_season_no <- phyloseq::subset_samples(amgut2.filt.phy,
                                     SEASONAL_ALLERGIES == "no")

amgut_season_yes
amgut_season_no

# Keep the 121 samples (sample size of the smaller group) with the highest
# frequency so that the sample sizes are equal and thus comparable.
n_yes <- phyloseq::nsamples(amgut_season_yes)

# Network construction
amgut_net <- netConstruct(data = amgut_season_yes,
                          data2 = amgut_season_no,
                          measure = "pearson",
                          filtSamp = "highestFreq",
                          filtSampPar = list(highestFreq = n_yes),
                          filtTax = "highestVar",
                          filtTaxPar = list(highestVar = 30),
                          zeroMethod = "pseudoZO", normMethod = "clr")

# Network analysis
# Note: Please zoom into the GCM plot or open a new window using:
# x11(width = 10, height = 10)
amgut_props <- netAnalyze(amgut_net, clustMethod = "cluster_fast_greedy")

# Network plot
plot(amgut_props,
     sameLayout = TRUE,
     title1 = "Seasonal allergies",
     title2 = "No seasonal allergies")

#--------------------------
# Network comparison

# Without permutation tests
amgut_comp1 <- netCompare(amgut_props, permTest = FALSE)
summary(amgut_comp1)

# With permutation tests (with only 100 permutations to decrease runtime)
amgut_comp2 <- netCompare(amgut_props,
                          permTest = TRUE,
                          nPerm = 100L,
                          cores = 1L,
                          storeCountsPerm = TRUE,
                          fileStoreCountsPerm = c("countsPerm1",
                                                  "countsPerm2"),
                          storeAssoPerm = TRUE,
                          fileStoreAssoPerm = "assoPerm",
                          seed = 123456)

# Rerun with a different adjustment method ...
# ... using the stored permutation count matrices
amgut_comp3 <- netCompare(amgut_props, adjust = "BH",
                          permTest = TRUE, nPerm = 100L,
                          fileLoadCountsPerm = c("countsPerm1",
                                                 "countsPerm2"),
                          seed = 123456)

# ... using the stored permutation association matrices
amgut_comp4 <- netCompare(amgut_props, adjust = "BH",
                          permTest = TRUE, nPerm = 100L,
                          fileLoadAssoPerm = "assoPerm",
                          seed = 123456)

# amgut_comp3 and amgut_comp4 should be equal
all.equal(amgut_comp3$adjaMatrices, amgut_comp4$adjaMatrices)
all.equal(amgut_comp3$properties, amgut_comp4$properties)

summary(amgut_comp2)
summary(amgut_comp3)
summary(amgut_comp4)

#--------------------------
# Use 'createAssoPerm' to create "permuted" count and association matrices
createAssoPerm(amgut_props, nPerm = 100,
               computeAsso = TRUE,
               fileStoreAssoPerm = "assoPerm",
               storeCountsPerm = TRUE,
               fileStoreCountsPerm = c("countsPerm1", "countsPerm2"),
               append = FALSE, seed = 123456)

amgut_comp5 <- netCompare(amgut_props, permTest = TRUE, nPerm = 100L,
                          fileLoadAssoPerm = "assoPerm")

all.equal(amgut_comp3$properties, amgut_comp5$properties)

summary(amgut_comp5)

stefpeschel/NetCoMi documentation built on June 14, 2025, 1:15 p.m.