netCompare: Group Comparison of Network Properties


Description

Calculate and compare network properties for microbial networks using Jaccard's index, the Rand index, the Graphlet Correlation Distance, and permutation tests.

Usage

netCompare(
  x,
  permTest = FALSE,
  jaccQuant = 0.75,
  lnormFit = NULL,
  testRand = TRUE,
  nPermRand = 1000L,
  gcd = TRUE,
  gcdOrb = c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1),
  verbose = TRUE,
  nPerm = 1000L,
  adjust = "adaptBH",
  trueNullMethod = "convest",
  cores = 1L,
  logFile = NULL,
  seed = NULL,
  fileLoadAssoPerm = NULL,
  fileLoadCountsPerm = NULL,
  storeAssoPerm = FALSE,
  fileStoreAssoPerm = "assoPerm",
  storeCountsPerm = FALSE,
  fileStoreCountsPerm = c("countsPerm1", "countsPerm2"),
  returnPermProps = FALSE,
  returnPermCentr = FALSE,
  assoPerm = NULL,
  dissPerm = NULL
)

Arguments

x

object of class microNetProps (returned by netAnalyze).

permTest

logical. If TRUE, a permutation test is conducted to test centrality measures and global network properties for group differences. Defaults to FALSE. Note that this may considerably increase execution time.

jaccQuant

numeric value between 0 and 1 specifying the quantile used as threshold to identify the most central nodes for each centrality measure. The resulting sets of nodes are used to calculate Jaccard's index (see details). Default is 0.75.

lnormFit

logical indicating whether a log-normal distribution should be fitted to the calculated centrality values for determining Jaccard's index (see details). If NULL (default), the value is taken from the input object x, i.e., it matches the setting used for identifying hub nodes in netAnalyze.

testRand

logical. If TRUE (default), a permutation test is conducted for the adjusted Rand index (with H0: ARI = 0). This may increase execution time for large networks.

nPermRand

integer giving the number of permutations used for testing whether the adjusted Rand index differs significantly from zero. Ignored if testRand = FALSE. Defaults to 1000L.

gcd

logical. If TRUE (default), the Graphlet Correlation Distance (GCD) is computed.

gcdOrb

numeric vector with integers from 0 to 14 defining the orbits used for calculating the GCD. Minimum length is 2. Defaults to c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1), i.e., the eleven orbits 0 to 11 without o3, thus excluding the redundant orbits o3, o12, o13, and o14.

verbose

logical. If TRUE (default), status messages are shown.

nPerm

integer giving the number of permutations if permTest = TRUE. Default is 1000L.

adjust

character indicating the method used for multiple testing adjustment of the permutation p-values. Possible values are "adaptBH" (default) for the adaptive Benjamini-Hochberg method (Benjamini and Hochberg, 2000), "lfdr" for local false discovery rate correction (via fdrtool), or one of the methods provided by p.adjust (see p.adjust.methods).

trueNullMethod

character indicating the method used for estimating the proportion of true null hypotheses from a vector of p-values. Used for the adaptive Benjamini-Hochberg method for multiple testing adjustment (chosen by adjust = "adaptBH"). Accepts the options of the method argument of propTrueNull: "convest" (default), "lfdr", "mean", and "hist". Can alternatively be "farco" for the "iterative plug-in method" proposed by Farcomeni (2007).

cores

integer indicating the number of CPU cores used for permutation tests. If cores > 1, the tests are performed in parallel. It is limited to the number of available CPU cores determined by detectCores. Defaults to 1L (no parallelization).

logFile

character string naming the log file to which the current iteration number is written (if permutation tests are performed). Defaults to NULL so that no log file is generated.

seed

integer giving a seed for reproducibility of the results.

fileLoadAssoPerm

character giving the name or path (without file extension) of the file containing the "permuted" association/dissimilarity matrices that was generated by setting storeAssoPerm to TRUE. Only used for permutation tests. If NULL, no existing associations are used.

fileLoadCountsPerm

character giving the name or path (without file extension) of the file containing the "permuted" count matrices that was generated by setting storeCountsPerm to TRUE. Only used for permutation tests, and if fileLoadAssoPerm = NULL. If NULL, no existing count matrices are used.

storeAssoPerm

logical indicating whether the association/dissimilarity matrices for the permuted data should be saved to a file. The file name is given via fileStoreAssoPerm. If TRUE, the computed "permutation" association/dissimilarity matrices can be reused via fileLoadAssoPerm to save runtime. Defaults to FALSE. Ignored if fileLoadAssoPerm is not NULL.

fileStoreAssoPerm

character giving the name of a file to which the matrix with associations/dissimilarities of the permuted data is saved. Can also be a path.

storeCountsPerm

logical indicating whether the permuted count matrices should be saved to an external file. Defaults to FALSE. Ignored if fileLoadCountsPerm is not NULL.

fileStoreCountsPerm

character vector with two elements giving the names of two files storing the permuted count matrices belonging to the two groups.

returnPermProps

logical. If TRUE, the global properties and their absolute differences for the permuted data are returned.

returnPermCentr

logical. If TRUE, the centralities and their absolute differences for the permuted data are returned.

assoPerm

only needed for output generated with NetCoMi v1.0.1! A list with two elements used for the permutation procedure. Each entry must contain association matrices for "nPerm" permutations. This can be the "assoPerm" element of the output returned by either diffnet or netCompare.

dissPerm

only needed for output generated with NetCoMi v1.0.1! Usage is analogous to assoPerm if a dissimilarity measure was used for network construction.
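
The adjust and trueNullMethod arguments can be illustrated with base R's p.adjust, which implements every method listed in p.adjust.methods. For the adaptive Benjamini-Hochberg option, the following is a minimal sketch that assumes one common formulation: BH-adjusted p-values are multiplied by pi0, an estimate of the proportion of true null hypotheses. The helper adapt_bh is hypothetical, and this is not necessarily how netCompare computes its adjusted p-values internally.

```r
# Minimal sketch (not NetCoMi's internal code): adaptive Benjamini-Hochberg
# adjustment, assuming the formulation in which BH-adjusted p-values are
# scaled by pi0, an estimate of the proportion of true null hypotheses.
# 'adapt_bh' is a hypothetical helper name.
adapt_bh <- function(p, pi0) {
  stopifnot(is.numeric(pi0), length(pi0) == 1, pi0 > 0, pi0 <= 1)
  # Replacing m by m0 = pi0 * m in the BH step-up procedure is equivalent
  # to multiplying the BH-adjusted p-values by pi0 (capped at 1)
  pmin(1, pi0 * p.adjust(p, method = "BH"))
}

p <- c(0.001, 0.01, 0.02, 0.04, 0.5)
p.adjust(p, method = "BH")   # standard BH, as with adjust = "BH"
adapt_bh(p, pi0 = 0.6)       # adaptive variant with an assumed pi0

# pi0 could instead be estimated from the p-values, e.g. with
# limma::propTrueNull(p, method = "convest")
```

Because pi0 is at most 1, the adaptive variant never yields larger adjusted p-values than plain BH, which is the gain in power the adaptive method aims for.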

Details

Permutation procedure:
Used for testing centrality measures and global network properties for group differences.
The null hypothesis of the tests is defined as

H_0: c1_i - c2_i = 0,

where c1_i and c2_i denote the centrality values of taxon i in groups 1 and 2, respectively. The hypotheses for the global network properties are defined analogously.
To generate a sampling distribution of the differences under H_0, the group labels are randomly reassigned to the samples while the group sizes are kept. For each permuted data set, the associations are re-estimated and the network properties are recomputed. The p-values are calculated as the proportion of absolute "permutation-differences" that are larger than or equal to the observed absolute difference. In non-exact tests, a pseudo-count is added to the numerator and denominator to avoid p-values of zero. Several methods for adjusting the p-values for multiplicity are available (see the adjust argument).
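
The label-permutation scheme can be sketched in base R for a single scalar property. This is a minimal illustration under simplifying assumptions, not NetCoMi's implementation: the hypothetical helpers perm_test_diff and mean_abs_cor stand in for the full pipeline, and the "network property" is simply the mean absolute Pearson correlation between variables, computed directly from each group's data instead of from a constructed network.

```r
# Minimal sketch of the label-permutation idea (not NetCoMi's implementation).
# 'perm_test_diff' and 'mean_abs_cor' are hypothetical helper names.
mean_abs_cor <- function(m) {
  r <- cor(m)
  mean(abs(r[upper.tri(r)]))
}

perm_test_diff <- function(data, groups, stat_fun, nPerm = 1000L) {
  lev <- unique(groups)
  stopifnot(length(lev) == 2)
  # Absolute difference of the statistic between the two groups
  abs_diff <- function(g) {
    abs(stat_fun(data[g == lev[1], , drop = FALSE]) -
          stat_fun(data[g == lev[2], , drop = FALSE]))
  }
  obs <- abs_diff(groups)
  # sample() reshuffles the labels, so the group sizes are kept
  perm <- replicate(nPerm, abs_diff(sample(groups)))
  # Pseudo-count in numerator and denominator avoids p-values of zero
  (sum(perm >= obs) + 1) / (nPerm + 1)
}

# Toy data: strongly correlated variables in group A, independent ones in B
set.seed(123456)
x <- rnorm(30)
grpA <- sapply(1:3, function(j) x + rnorm(30, sd = 0.1))
grpB <- matrix(rnorm(90), nrow = 30)
dat <- rbind(grpA, grpB)
grp <- rep(c("A", "B"), each = 30)

p_val <- perm_test_diff(dat, grp, mean_abs_cor, nPerm = 199L)
p_val
```

Since sample() only reorders the existing labels, every permuted split has the same group sizes as the original data, matching the procedure described above.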

Jaccard's index:
Jaccard's index expresses, for each centrality measure, how similar the sets of most central nodes are in the two networks.
These sets are defined as nodes with a centrality value above a defined quantile (via jaccQuant) either of the empirical distribution of the centrality values (lnormFit = FALSE) or of a fitted log-normal distribution (lnormFit = TRUE).
The index ranges from 0 to 1, where 1 means the sets of most central nodes are exactly equal in both networks and 0 indicates that the most central nodes are completely different.
The index is calculated as suggested by Real and Vargas (1996).
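
A minimal sketch of how the sets of most central nodes and Jaccard's index could be computed for one centrality measure. The helpers top_nodes and jaccard are hypothetical; the lnormFit = TRUE branch assumes the log-normal parameters are estimated from the mean and standard deviation of the logged positive centrality values, which may differ from how NetCoMi fits the distribution. The probabilistic treatment of Real and Vargas (1996) is not covered by this sketch.

```r
# Minimal sketch (hypothetical helpers, not NetCoMi's code): select the nodes
# whose centrality exceeds the jaccQuant quantile and compare the two sets.
top_nodes <- function(cent, jaccQuant = 0.75, lnormFit = FALSE) {
  if (lnormFit) {
    # Assumption: log-normal parameters estimated from the log of the
    # positive centrality values
    lc <- log(cent[cent > 0])
    thresh <- qlnorm(jaccQuant, meanlog = mean(lc), sdlog = sd(lc))
  } else {
    # Quantile of the empirical distribution of the centrality values
    thresh <- quantile(cent, probs = jaccQuant, names = FALSE)
  }
  names(cent)[cent > thresh]
}

jaccard <- function(set1, set2) {
  u <- union(set1, set2)
  if (length(u) == 0) return(NA_real_)  # undefined for two empty sets
  length(intersect(set1, set2)) / length(u)
}

# Toy degree values for the same eight taxa in two networks
deg1 <- c(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8)
deg2 <- c(a = 8, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 1)
jaccard(top_nodes(deg1), top_nodes(deg2))  # 1/3: {g, h} and {a, g} share only g
```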

Rand index:
The Rand index is used to express whether the determined clusterings are equal in both groups. The adjusted Rand index (ARI) ranges from -1 to 1, where 1 indicates that the two clusterings are exactly equal. The expected index value for two random clusterings is 0. The implemented test procedure follows the explanations in Qannari et al. (2014): a p-value below the significance level alpha means that the ARI is significantly higher than expected for two random clusterings.
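
The adjusted Rand index and a permutation test of H0: ARI = 0 can be sketched in base R as follows. This is a minimal illustration assuming the Hubert-Arabie form of the ARI and a test that shuffles one membership vector; the helpers adj_rand and ari_perm_test are hypothetical, and the details may differ from NetCoMi's implementation of the Qannari et al. (2014) procedure.

```r
# Minimal sketch (hypothetical helpers): adjusted Rand index (Hubert-Arabie
# form) and a label-permutation test of H0: ARI = 0.
adj_rand <- function(cl1, cl2) {
  tab <- table(cl1, cl2)
  s_ij <- sum(choose(tab, 2))            # node pairs together in both
  s_a <- sum(choose(rowSums(tab), 2))    # node pairs together in clustering 1
  s_b <- sum(choose(colSums(tab), 2))    # node pairs together in clustering 2
  expected <- s_a * s_b / choose(length(cl1), 2)
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

ari_perm_test <- function(cl1, cl2, nPermRand = 1000L) {
  obs <- adj_rand(cl1, cl2)
  # Shuffling one membership vector yields random clusterings with the same
  # cluster sizes; a small p-value means the ARI exceeds chance level
  perm <- replicate(nPermRand, adj_rand(cl1, sample(cl2)))
  list(ari = obs, pval = (sum(perm >= obs) + 1) / (nPermRand + 1))
}

# Toy memberships: same clusters with permuted labels, two nodes moved
clust1 <- rep(1:4, each = 10)
clust2 <- c(rep(2, 10), rep(3, 10), rep(4, 10), rep(1, 8), 2, 3)
set.seed(1)
res <- ari_perm_test(clust1, clust2, nPermRand = 999L)
res
```

Cluster labels are arbitrary, so only co-membership of node pairs enters the index; renaming the clusters in one group does not change the ARI.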

Graphlet Correlation Distance:
The GCD is a graphlet-based distance measure, defined as the Euclidean distance between the upper triangle values of the Graphlet Correlation Matrices (GCMs) of two networks (Yaveroglu et al., 2014). The GCM of a network is a matrix of Spearman's correlations between the network's node orbits (Hocevar and Demsar, 2016). See calcGCD for details.
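
Given node-by-orbit count matrices for the two networks, the GCD reduces to a few lines of base R. In this minimal sketch, the helper gcd_from_orbits is hypothetical and the orbit counts are simulated; in practice they would come from an orbit-counting tool such as the orca package (Hocevar and Demsar, 2016). It is not calcGCD itself and omits any preprocessing calcGCD may apply.

```r
# Minimal sketch (hypothetical helper, not calcGCD): GCD computed from two
# node-by-orbit count matrices with 15 columns for orbits 0 to 14.
gcd_from_orbits <- function(orb1, orb2,
                            gcdOrb = c(0, 2, 5, 7, 8, 10, 11, 6, 9, 4, 1)) {
  cols <- gcdOrb + 1                                # orbit k sits in column k + 1
  gcm1 <- cor(orb1[, cols], method = "spearman")    # Graphlet Correlation Matrix 1
  gcm2 <- cor(orb2[, cols], method = "spearman")    # Graphlet Correlation Matrix 2
  ut <- upper.tri(gcm1)
  sqrt(sum((gcm1[ut] - gcm2[ut])^2))                # Euclidean distance
}

# Simulated orbit counts for 30 nodes per network (illustration only)
set.seed(42)
orbits1 <- matrix(rpois(30 * 15, lambda = 5), nrow = 30)
orbits2 <- matrix(rpois(30 * 15, lambda = 5), nrow = 30)

gcd_from_orbits(orbits1, orbits2)
gcd_from_orbits(orbits1, orbits1)   # identical orbit counts give a GCD of 0
```

Reordering gcdOrb permutes the rows and columns of both GCMs in the same way, so the set of upper-triangle pairs and hence the distance are unchanged; only the selection of orbits matters.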

Value

Object of class microNetComp with the following elements:

jaccDeg, jaccBetw, jaccClose, jaccEigen: Values of Jaccard's index for the centrality measures
jaccHub: Jaccard's index for the sets of hub nodes
randInd: Adjusted Rand index
randIndLCC: Adjusted Rand index for the largest connected component (LCC)
gcd: Graphlet Correlation Distance (object of class gcd returned by calcGCD)
gcdLCC: Graphlet Correlation Distance for the LCC
properties: List with calculated network properties
propertiesLCC: List with calculated network properties of the LCC
diffGlobal: Vectors with differences of global properties
diffGlobalLCC: Vectors with differences of global properties for the LCC
diffCent: Vectors with differences of the centrality values
countMatrices: The two count matrices returned by netConstruct
assoMatrices: The two association matrices returned by netConstruct
dissMatrices: The two dissimilarity matrices returned by netConstruct
adjaMatrices: The two adjacency matrices returned by netConstruct
groups: Group names returned by netConstruct
paramsProperties: Parameters used for network analysis

Additional output if permutation tests are conducted:

pvalDiffGlobal: P-values of the tests for differential global properties
pvalDiffGlobalLCC: P-values of the tests for differential global properties in the LCC
pvalDiffCentr: P-values of the tests for differential centrality values
pvalDiffCentrAdjust: Adjusted p-values of the tests for differential centrality values
permDiffGlobal: nPerm x 10 matrix containing the absolute differences of the ten global network properties (computed for the whole network) for all nPerm permutations
permDiffGlobalLCC: nPerm x 11 matrix containing the absolute differences of the eleven global network properties (computed for the LCC) for all nPerm permutations
permDiffCentr: List with the absolute differences of the four centrality measures for all nPerm permutations. Each list element is an nPerm x nNodes matrix.

References

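Benjamini Y, Hochberg Y (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics.

Farcomeni A (2007). Some results on the control of the false discovery rate under dependence. Scandinavian Journal of Statistics.

Gill R, Datta S, Datta S (2010). A statistical framework for differential network analysis from microarray data. BMC Bioinformatics.

Hocevar T, Demsar J (2016). Computation of graphlet orbits for nodes and edges in sparse graphs. Journal of Statistical Software.

Qannari EM, Courcoux P, Faye P (2014). Significance test of the adjusted Rand index. Application to the free sorting task. Food Quality and Preference.

Real R, Vargas JM (1996). The probabilistic basis of Jaccard's index of similarity. Systematic Biology.

Yaveroglu ON, et al. (2014). Revealing the hidden language of complex networks. Scientific Reports.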

See Also

summary.microNetComp, netConstruct, netAnalyze

Examples


# Load NetCoMi and the American Gut Project data set (originally from the
# SpiecEasi package)
library(NetCoMi)
data("amgut2.filt.phy")

# Split data into two groups: with and without seasonal allergies
amgut_season_yes <- phyloseq::subset_samples(amgut2.filt.phy, 
                                      SEASONAL_ALLERGIES == "yes")
amgut_season_no <- phyloseq::subset_samples(amgut2.filt.phy, 
                                     SEASONAL_ALLERGIES == "no")

amgut_season_yes
amgut_season_no

# Keep the 121 samples (sample size of the smaller group) with the highest
# frequency in each group so that both groups have equal sample sizes and
# are thus comparable.
n_yes <- phyloseq::nsamples(amgut_season_yes)

# Network construction
amgut_net <- netConstruct(data = amgut_season_yes,
                          data2 = amgut_season_no,
                          measure = "pearson",
                          filtSamp = "highestFreq",
                          filtSampPar = list(highestFreq = n_yes),
                          filtTax = "highestVar",
                          filtTaxPar = list(highestVar = 30),
                          zeroMethod = "pseudoZO", normMethod = "clr")

# Network analysis
# Note: Please zoom into the GCM plot or open a new window using:
# x11(width = 10, height = 10)
amgut_props <- netAnalyze(amgut_net, clustMethod = "cluster_fast_greedy")

# Network plot
plot(amgut_props,
     sameLayout = TRUE,
     title1 = "Seasonal allergies",
     title2 = "No seasonal allergies")

#--------------------------
# Network comparison

# Without permutation tests
amgut_comp1 <- netCompare(amgut_props, permTest = FALSE)
summary(amgut_comp1)


# With permutation tests (with only 100 permutations to decrease runtime)
amgut_comp2 <- netCompare(amgut_props,
                          permTest = TRUE,
                          nPerm = 100L,
                          cores = 1L,
                          storeCountsPerm = TRUE,
                          fileStoreCountsPerm = c("countsPerm1",
                                                  "countsPerm2"),
                          storeAssoPerm = TRUE,
                          fileStoreAssoPerm = "assoPerm",
                          seed = 123456)

# Rerun with a different adjustment method ...
# ... using the stored permutation count matrices
amgut_comp3 <- netCompare(amgut_props, adjust = "BH",
                          permTest = TRUE, nPerm = 100L,
                          fileLoadCountsPerm = c("countsPerm1",
                                                 "countsPerm2"),
                          seed = 123456)

# ... using the stored permutation association matrices
amgut_comp4 <- netCompare(amgut_props, adjust = "BH",
                          permTest = TRUE, nPerm = 100L, 
                          fileLoadAssoPerm = "assoPerm",
                          seed = 123456)
  
# amgut_comp3 and amgut_comp4 should be equal
all.equal(amgut_comp3$adjaMatrices, amgut_comp4$adjaMatrices)
all.equal(amgut_comp3$properties, amgut_comp4$properties)

summary(amgut_comp2)
summary(amgut_comp3)
summary(amgut_comp4)

#--------------------------
# Use 'createAssoPerm' to create "permuted" count and association matrices
createAssoPerm(amgut_props, nPerm = 100, 
               computeAsso = TRUE,
               fileStoreAssoPerm = "assoPerm",
               storeCountsPerm = TRUE, 
               fileStoreCountsPerm = c("countsPerm1", "countsPerm2"),
               append = FALSE, seed = 123456)

amgut_comp5 <- netCompare(amgut_props, permTest = TRUE, nPerm = 100L, 
                          fileLoadAssoPerm = "assoPerm")

all.equal(amgut_comp3$properties, amgut_comp5$properties)

summary(amgut_comp5)



stefpeschel/NetCoMi documentation built on Feb. 4, 2024, 8:20 a.m.