netConstruct | R Documentation |
Constructing microbial association networks and dissimilarity based networks (where nodes are subjects) from compositional count data.
netConstruct(data,
data2 = NULL,
dataType = "counts",
group = NULL,
matchDesign = NULL,
taxRank = NULL,
# Association/dissimilarity measure:
measure = "spieceasi",
measurePar = NULL,
# Preprocessing:
jointPrepro = NULL,
filtTax = "none",
filtTaxPar = NULL,
filtSamp = "none",
filtSampPar = NULL,
zeroMethod = "none",
zeroPar = NULL,
normMethod = "none",
normPar = NULL,
# Sparsification:
sparsMethod = "t-test",
thresh = 0.3,
alpha = 0.05,
adjust = "adaptBH",
trueNullMethod = "convest",
lfdrThresh = 0.2,
nboot = 1000L,
assoBoot = NULL,
cores = 1L,
logFile = "log.txt",
softThreshType = "signed",
softThreshPower = NULL,
softThreshCut = 0.8,
kNeighbor = 3L,
knnMutual = FALSE,
# Transformation:
dissFunc = "signed",
dissFuncPar = NULL,
simFunc = NULL,
simFuncPar = NULL,
scaleDiss = TRUE,
weighted = TRUE,
# Further arguments:
sampleSize = NULL,
verbose = 2,
seed = NULL
)
data |
numeric matrix. Can be a count matrix (rows are samples, columns
are OTUs/taxa), a phyloseq object, or an association/dissimilarity matrix
( |
data2 |
optional numeric matrix used for constructing a second network (belonging to group 2). Can be either a second count matrix/phyloseq object or a second association/dissimilarity matrix. |
dataType |
character indicating the data type. Defaults to "counts",
which means that |
group |
optional binary vector used for splitting the data into two
groups. If |
matchDesign |
Numeric vector with two elements specifying an optional
matched-group (i.e. matched-pair) design, which is used for the permutation
tests in |
taxRank |
character indicating the taxonomic rank at which the network
should be constructed. Only used if data (and data 2) is a phyloseq object.
The given rank must match one of the column names of the taxonomic table
(the |
measure |
character specifying the method used for either computing the
associations between taxa or dissimilarities between subjects.
Ignored if |
measurePar |
list with parameters passed to the function for computing
associations/dissimilarities. See 'Details' for the respective functions.
For SpiecEasi or SPRING as association measure, an additional list element
"symBetaMode" is accepted to define the "mode" argument of
|
jointPrepro |
logical indicating whether data preprocessing (filtering,
zero treatment, normalization) should be done for the combined data sets,
or each data set separately. Ignored if a single network is constructed.
Defaults to |
filtTax |
character indicating how taxa shall be filtered. Possible options are:
Except for "highestVar" and "highestFreq", different filter methods can be
combined. The values x are set via |
filtTaxPar |
list with parameters for the filter methods given by
|
filtSamp |
character indicating how samples shall be filtered. Possible options are:
Except for "highestFreq", different filter methods can be
combined. The values x are set via |
filtSampPar |
list with parameters for the filter methods given by
|
zeroMethod |
character indicating the method used for zero replacement.
Possible values are: |
zeroPar |
list with parameters passed to the function for zero
replacement ( |
normMethod |
character indicating the normalization method (to
make counts of different samples comparable). Possible options are:
|
normPar |
list with parameters passed to the function for normalization
(defined by |
sparsMethod |
character indicating the method used for sparsification (selected edges that are connected in the network). Available methods are:
|
thresh |
numeric vector with one or two elements defining the threshold
used for sparsification if |
alpha |
numeric vector with one or two elements indicating the significance level. Only used if Student's t-test or bootstrap procedure is used as sparsification method. If two networks are constructed and one value is given, it is used for both groups. Defaults to 0.05. |
adjust |
character indicating the method used for multiple testing
adjustment (if Student's t-test or bootstrap procedure is used for edge
selection). Possible values are |
trueNullMethod |
character indicating the method used for estimating the
proportion of true null hypotheses from a vector of p-values. Used for the
adaptive Benjamini-Hochberg method for multiple testing adjustment (chosen
by |
lfdrThresh |
numeric vector with one or two elements defining the
threshold(s) for local FDR correction (if |
nboot |
integer indicating the number of bootstrap samples, if bootstrapping is used as sparsification method. |
assoBoot |
logical or list. Only relevant for bootstrapping.
Set to |
cores |
integer indicating the number of CPU cores used for
bootstrapping. If cores > 1, bootstrapping is performed parallel.
|
logFile |
character defining a log file to which the iteration numbers
are stored if bootstrapping is used for sparsification. The file is written
to the current working directory. Defaults to |
softThreshType |
character indicating the method used for transforming
correlations into similarities if soft thresholding is used as sparsification
method ( |
softThreshPower |
numeric vector with one or two elements defining the
power for soft thresholding. Only used if
|
softThreshCut |
numeric vector with one or two elements (each between 0
and 1) indicating the desired minimum scale free topology fitting index
(corresponds to the argument "RsquaredCut" in
|
kNeighbor |
integer specifying the number of neighbors if the k-nearest neighbor method is used for sparsification. Defaults to 3L. |
knnMutual |
logical used for k-nearest neighbor sparsification. If
|
dissFunc |
method used for transforming associations into
dissimilarities. Can be a character with one of the following values:
|
dissFuncPar |
optional list with parameters if a function is passed to
|
simFunc |
function for transforming dissimilarities into similarities. Defaults to f(x)=1-x for dissimilarities in [0,1], and f(x)=1/(1 + x) otherwise. |
simFuncPar |
optional list with parameters for the function passed to
|
scaleDiss |
logical. Indicates whether dissimilarity values should be
scaled to [0,1] by (x - min(dissEst)) / (max(dissEst) - min(dissEst)),
where dissEst is the matrix with estimated dissimilarities.
Defaults to |
weighted |
logical. If |
sampleSize |
numeric vector with one or two elements giving the number of samples that have been used for computing the association matrix. Only needed if an association matrix is given instead of a count matrix and if, in addition, Student's t-test is used for edge selection. If two networks are constructed and one value is given, it is used for both groups. |
verbose |
integer indicating the level of verbosity. Possible values:
|
seed |
integer giving a seed for reproducibility of the results. |
The object returned by netConstruct
can either be passed to
netAnalyze
for network analysis, or to
diffnet
to construct a differential network from the
estimated associations.
The function enables the construction of either a single network
or two networks. The latter can be compared using the function
netCompare
.
The network(s) can either be based on associations (correlation,
partial correlation / conditional dependence, proportionality) or
dissimilarities. Several measures are available, respectively,
to estimate associations or dissimilarities using netConstruct
.
Alternatively, a pre-generated association or dissimilarity matrix is
accepted as input to start the workflow (argument dataType
must be
set appropriately).
Depending on the measure, network nodes are either taxa or subjects:
In association-based networks nodes are taxa, whereas in
dissimilarity-based networks nodes are subjects.
In order to perform a network comparison, the following options
for constructing two networks are available:
Passing the combined count matrix to data
and a group
vector to group
(of length nrow(data)
for association
networks and of length ncol(data)
for dissimilarity-based
networks).
Passing the count data for group 1 to data
(matrix or
phyloseq object) and the count data for group 2 to data2
(matrix
or phyloseq object). For association networks, the column names must
match, and for dissimilarity networks the row names.
Passing an association/dissimilarity matrix for group 1 to
data
and an association/dissimilarity matrix for group 2 to
data2
.
Group labeling:
If two networks are generated, the network belonging to data
is always denoted by "group 1" and the network belonging to data2
by "group 2".
If a group vector is used for splitting the data into two groups, the group
names are assigned according to the order of group levels. If group
contains the levels 0 and 1, for instance, "group 1" is assigned to level 0
and "group 2" is assigned to level 1.
In the network plot, group 1 is shown on the left and group 2 on the
right if not defined otherwise (see plot.microNetProps
).
Association measures
Argument | Function |
"pearson" | cor |
"spearman" | cor |
"bicor" | bicor |
"sparcc" | sparcc |
"cclasso" | cclasso |
"ccrepe" | ccrepe |
"spieceasi" | spiec.easi |
"spring" | SPRING |
"gcoda" | gcoda |
"propr" | propr |
Dissimilarity measures
Argument | Function | Measure |
"euclidean" | vegdist | Euclidean distance |
"bray" | vegdist | Bray-Curtis dissimilarity |
"kld" | KLD | Kullback-Leibler divergence |
"jeffrey" | KLD | Jeffrey divergence |
"jsd" | KLD | Jensen-Shannon divergence |
"ckld" | log | Compositional Kullback-Leibler divergence |
"aitchison" | vegdist ,
cenLR | Aitchison distance |
Definitions:
Since KLD is not symmetric, 0.5 * (KLD(p(x)||p(y)) + KLD(p(y)||p(x))) is returned.
Jeff = KLD(p(x)||p(y)) + KLD(p(y)||p(x))
JSD = 0.5 KLD(P||M) + 0.5 KLD(Q||M), where P=p(x), Q=p(y), and M=0.5(P+Q).
cKLD(x,y) = p/2 * log(A(x/y) * A(y/x)), where A(x/y) is the arithmetic mean of the vector of ratios x/y.
Euclidean distance of the clr-transformed data.
Methods for zero replacement
Argument | Method | Function |
"none" | No zero replacement (only available if no zero replacement is needed for the chosen normalization method and association/dissimilarity measure). | - |
"pseudo" | A pseudo count (defined by pseudocount as
optional element of zeroPar ) is added to all counts. A unit zero
count is used by default. | - |
"pseudoZO" | A pseudo count (defined by pseudocount as
optional element of zeroPar ) is added to zero counts only.
A unit zero count is used by default. | - |
"multRepl" | Multiplicative simple replacement |
multRepl |
"alrEM" | Modified EM alr-algorithm | lrEM |
"bayesMult" | Bayesian-multiplicative replacement |
cmultRepl |
Normalization methods
Argument | Method | Function |
"TSS" | Total sum scaling | t(apply(countMat, 1, function(x) x/sum(x))) |
"CSS" | Cumulative sum scaling |
cumNormMat |
"COM" | Common sum scaling | t(apply(countMat, 1, function(x) x * min(rowSums(countMat)) / sum(x))) |
"rarefy" | Rarefying | rrarefy |
"VST" | Variance stabilizing transformation |
varianceStabilizingTransformation |
"clr" | Centered log-ratio transformation |
clr |
"mclr" | Modified central log ratio transformation |
mclr
|
These methods (except for rarefying) are described in
Badri et al.(2020).
Transformation methods
Functions used for transforming associations into dissimilarities:
Argument | Function |
"signed" | sqrt(0.5 * (1 - x)) |
"unsigned" | sqrt(1 - x^2) |
"signedPos" | diss <- sqrt(0.5 * (1-x)) |
diss[x < 0] <- 0 | |
"TOMdiss" | TOMdist
|
An object of class microNet
containing the following elements:
edgelist1, edgelist2 | Edge list with the following columns:
|
assoMat1, assoMat2 | Sparsified associations (NULL for
dissimilarity based networks) |
dissMat1, dissMat2 | Sparsified dissimilarities (for association networks, these are the sparsified associations transformed into dissimilarities) |
simMat1, simMat2 | Sparsified similarities |
adjaMat1, adjaMat2 | Adjacency matrices |
assoEst1, assoEst2 | Estimated associations (NULL for
dissimilarity based networks) |
dissEst1, dissEst2 | Estimated dissimilarities (NULL for
association networks) |
dissScale1, dissScale2 | Scaled dissimilarities (NULL for
association networks) |
assoBoot1, assoBoot2 | List with association matrices for the
bootstrap samples. Returned if bootstrapping is used for
sparsification and assoBoot is TRUE . |
countMat1, countMat2 | Count matrices after filtering but before
zero replacement and normalization. Only returned if jointPrepro
is FALSE or for a single network. |
countsJoint | Joint count matrix after filtering but before
zero replacement and normalization. Only returned if jointPrepro
is TRUE . |
normCounts1, normCounts2 | Count matrices after zero handling and normalization |
groups | Names of the factor levels according to which the groups have been built |
softThreshPower | Determined (or given) power for soft-thresholding. |
assoType | Data type (either given by dataType or
determined from measure ) |
twoNets | Indicates whether two networks have been constructed |
parameters | Parameters used for network construction |
badri2020normalizationNetCoMi
\insertRefbenjamini2000adaptiveNetCoMi
\insertReffarcomeni2007someNetCoMi
\insertReffriedman2012inferringNetCoMi
\insertRefWGCNApackageNetCoMi
\insertRefzhang2005generalNetCoMi
netAnalyze
for analyzing the constructed
network(s), netCompare
for network comparison,
diffnet
for constructing differential networks.
# Load data sets from American Gut Project (from SpiecEasi package)
data("amgut1.filt")
data("amgut2.filt.phy")
# Single network with the following specifications:
# - Association measure: SpiecEasi
# - SpiecEasi parameters are defined via 'measurePar'
# (check ?SpiecEasi::spiec.easi for available options)
# - Note: 'rep.num' should be higher for real data sets
# - Taxa filtering: Keep the 50 taxa with highest variance
# - Sample filtering: Keep samples with a total number of reads
# of at least 1000
net1 <- netConstruct(amgut2.filt.phy,
measure = "spieceasi",
measurePar = list(method = "mb",
pulsar.params = list(rep.num = 10),
symBetaMode = "ave"),
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 1000),
sparsMethod = "none",
normMethod = "none",
verbose = 3)
# Network analysis (see ?netAnalyze for details)
props1 <- netAnalyze(net1, clustMethod = "cluster_fast_greedy")
# Network plot (see ?plot.microNetProps for details)
plot(props1)
#----------------------------------------------------------------------------
# Same network as before but on genus level and without taxa filtering
amgut.genus.phy <- phyloseq::tax_glom(amgut2.filt.phy, taxrank = "Rank6")
dim(phyloseq::otu_table(amgut.genus.phy))
# Rename taxonomic table and make Rank6 (genus) unique
amgut.genus.renamed <- renameTaxa(amgut.genus.phy, pat = "<name>",
substPat = "<name>_<subst_name>(<subst_R>)",
numDupli = "Rank6")
net_genus <- netConstruct(amgut.genus.renamed,
taxRank = "Rank6",
measure = "spieceasi",
measurePar = list(method = "mb",
pulsar.params = list(rep.num = 10),
symBetaMode = "ave"),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 1000),
sparsMethod = "none",
normMethod = "none",
verbose = 3)
# Network analysis
props_genus <- netAnalyze(net_genus, clustMethod = "cluster_fast_greedy")
# Network plot (with some modifications)
plot(props_genus,
shortenLabels = "none",
labelScale = FALSE,
cexLabels = 0.8)
#----------------------------------------------------------------------------
# Single network with the following specifications:
# - Association measure: Pearson correlation
# - Taxa filtering: Keep the 50 taxa with highest frequency
# - Sample filtering: Keep samples with a total number of reads of at least
# 1000 and with at least 10 taxa with a non-zero count
# - Zero replacement: A pseudo count of 0.5 is added to all counts
# - Normalization: clr transformation
# - Sparsification: Threshold = 0.3
# (an edge exists between taxa with an estimated association >= 0.3)
net2 <- netConstruct(amgut2.filt.phy,
measure = "pearson",
filtTax = "highestFreq",
filtTaxPar = list(highestFreq = 50),
filtSamp = c("numbTaxa", "totalReads"),
filtSampPar = list(totalReads = 1000, numbTaxa = 10),
zeroMethod = "pseudo",
zeroPar = list(pseudocount = 0.5),
normMethod = "clr",
sparsMethod = "threshold",
thresh = 0.3,
verbose = 3)
# Network analysis
props2 <- netAnalyze(net2, clustMethod = "cluster_fast_greedy")
plot(props2)
#----------------------------------------------------------------------------
# Constructing and analyzing two networks
# - A random group variable is used for splitting the data into two groups
set.seed(123456)
group <- sample(1:2, nrow(amgut1.filt), replace = TRUE)
# Option 1: Use the count matrix and group vector as input:
net3 <- netConstruct(amgut1.filt,
group = group,
measure = "pearson",
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 1000),
zeroMethod = "multRepl",
normMethod = "clr",
sparsMethod = "t-test")
# Option 2: Pass the count matrix of group 1 to 'data'
# and that of group 2 to 'data2'
# Note: Argument 'jointPrepro' is set to FALSE by default (the data sets
# are filtered separately and the intersect of filtered taxa is kept,
# which leads to less than 50 taxa in this example).
amgut1 <- amgut1.filt[group == 1, ]
amgut2 <- amgut1.filt[group == 2, ]
net3 <- netConstruct(data = amgut1,
data2 = amgut2,
measure = "pearson",
filtTax = "highestVar",
filtTaxPar = list(highestVar = 50),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 1000),
zeroMethod = "multRepl",
normMethod = "clr",
sparsMethod = "t-test")
# Network analysis
# Note: Please zoom into the GCM plot or open a new window using:
# x11(width = 10, height = 10)
props3 <- netAnalyze(net3, clustMethod = "cluster_fast_greedy")
# Network plot (same layout is used in both groups)
plot(props3, sameLayout = TRUE)
# The two networks can be compared with NetCoMi's function netCompare().
#----------------------------------------------------------------------------
# Example of using the argument "assoBoot"
# This functionality is useful for splitting up a large number of bootstrap
# replicates and run the bootstrapping procedure iteratively.
niter <- 5
nboot <- 1000
# Overall number of bootstrap replicates: 5000
# Use a different seed for each iteration
seeds <- sample.int(1e8, size = niter)
# List where all bootstrap association matrices are stored
assoList <- list()
for (i in 1:niter) {
# assoBoot is set to TRUE to return the bootstrap association matrices
net <- netConstruct(amgut1.filt,
filtTax = "highestFreq",
filtTaxPar = list(highestFreq = 50),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 0),
measure = "pearson",
normMethod = "clr",
zeroMethod = "pseudoZO",
sparsMethod = "bootstrap",
cores = 1,
nboot = nboot,
assoBoot = TRUE,
verbose = 3,
seed = seeds[i])
assoList[(1:nboot) + (i - 1) * nboot] <- net$assoBoot1
}
# Construct the actual network with all 5000 bootstrap association matrices
net_final <- netConstruct(amgut1.filt,
filtTax = "highestFreq",
filtTaxPar = list(highestFreq = 50),
filtSamp = "totalReads",
filtSampPar = list(totalReads = 0),
measure = "pearson",
normMethod = "clr",
zeroMethod = "pseudoZO",
sparsMethod = "bootstrap",
cores = 1,
nboot = nboot * niter,
assoBoot = assoList,
verbose = 3)
# Network analysis
props <- netAnalyze(net_final, clustMethod = "cluster_fast_greedy")
# Network plot
plot(props)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.