consensusTOM  R Documentation 
Calculation of a consensus network (topological overlap).
consensusTOM(
      # Supply either ...
      # ... information needed to calculate individual TOMs
      multiExpr,

      # Data checking options
      checkMissingData = TRUE,

      # Blocking options
      blocks = NULL,
      maxBlockSize = 5000,
      blockSizePenaltyPower = 5,
      nPreclusteringCenters = NULL,
      randomSeed = 54321,

      # Network construction arguments: correlation options
      corType = "pearson",
      maxPOutliers = 1,
      quickCor = 0,
      pearsonFallback = "individual",
      cosineCorrelation = FALSE,
      replaceMissingAdjacencies = FALSE,

      # Adjacency function options
      power = 6,
      networkType = "unsigned",
      checkPower = TRUE,

      # Topological overlap options
      TOMType = "unsigned",
      TOMDenom = "min",
      suppressNegativeTOM = FALSE,

      # Save individual TOMs?
      saveIndividualTOMs = TRUE,
      individualTOMFileNames = "individualTOMSet%sBlock%b.RData",

      # ... or individual TOM information
      individualTOMInfo = NULL,
      useIndivTOMSubset = NULL,

      ##### Consensus calculation options
      useBlocks = NULL,
      networkCalibration = c("single quantile", "full quantile", "none"),

      # Save calibrated TOMs?
      saveCalibratedIndividualTOMs = FALSE,
      calibratedIndividualTOMFilePattern = "calibratedIndividualTOMSet%sBlock%b.RData",

      # Simple quantile calibration options
      calibrationQuantile = 0.95,
      sampleForCalibration = TRUE,
      sampleForCalibrationFactor = 1000,
      getNetworkCalibrationSamples = FALSE,

      # Consensus definition
      consensusQuantile = 0,
      useMean = FALSE,
      setWeights = NULL,

      # Return options
      saveConsensusTOMs = TRUE,
      consensusTOMFilePattern = "consensusTOMBlock%b.RData",
      returnTOMs = FALSE,

      # Internal handling of TOMs
      useDiskCache = NULL,
      chunkSize = NULL,
      cacheDir = ".",
      cacheBase = ".blockConsModsCache",
      nThreads = 1,

      # Diagnostic messages
      verbose = 1,
      indent = 0)
multiExpr 
expression data in the multiset format (see 
checkMissingData 
logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details. 
blocks 
optional specification of blocks in which hierarchical clustering and module detection
should be performed. If given, must be a numeric vector with one entry per gene
of 
maxBlockSize 
integer giving maximum block size for module detection. Ignored if 
blockSizePenaltyPower 
number specifying how strongly blocks should be penalized for exceeding the
maximum size. Set to a large number or 
nPreclusteringCenters 
number of centers for preclustering. Larger numbers typically result in better
but slower preclustering. The default is 
randomSeed 
integer to be used as seed for the random number generator before the function
starts. If a current seed exists, it is saved and restored upon exit. If 
corType 
character string specifying the correlation to be used. Allowed values are (unique
abbreviations of) 
maxPOutliers 
only used for 
quickCor 
real number between 0 and 1 that controls the handling of missing data in the calculation of correlations. See details. 
pearsonFallback 
Specifies whether the bicor calculation, if used, should revert to Pearson when
median absolute deviation (mad) is zero. Recognized values are (abbreviations of)

cosineCorrelation 
logical: should the cosine version of the correlation calculation be used? The cosine calculation differs from the standard one in that it does not subtract the mean. 
power 
soft-thresholding power for network construction. 
networkType 
network type. Allowed values are (unique abbreviations of) 
checkPower 
logical: should a basic sanity check be performed on the supplied 
replaceMissingAdjacencies 
logical: should missing values in the calculation of adjacency be replaced by 0? 
TOMType 
one of 
TOMDenom 
a character string specifying the TOM variant to be used. Recognized values are

suppressNegativeTOM 
Logical: should the result be set to zero when negative? Negative TOM values can occur when

saveIndividualTOMs 
logical: should individual TOMs be saved to disk for later use? 
individualTOMFileNames 
character string giving the file names to save individual TOMs into. The
following tags should be used to make the file names unique for each set and block: 
individualTOMInfo 
Optional data for TOM matrices in individual data sets. This object is returned by
the function 
useIndivTOMSubset 
If 
useBlocks 
optional specification of blocks that should be used for the calculations. The default is to use all blocks. 
networkCalibration 
network calibration method. One of "single quantile", "full quantile", "none" (or a unique abbreviation of one of them). 
saveCalibratedIndividualTOMs 
logical: should the calibrated individual TOMs be saved? 
calibratedIndividualTOMFilePattern 
pattern of file names for saving calibrated individual TOMs. 
calibrationQuantile 
if 
sampleForCalibration 
if 
sampleForCalibrationFactor 
determines the number of samples for calibration: the number is

getNetworkCalibrationSamples 
logical: should the sampled values used for network calibration be returned? 
consensusQuantile 
quantile at which consensus is to be defined. See details. 
useMean 
logical: should the consensus be determined from a (possibly weighted) mean across the data sets rather than a quantile? 
setWeights 
Optional vector (one component per input set) of weights to be used for weighted mean
consensus. Only used when 
saveConsensusTOMs 
logical: should the consensus topological overlap matrices for each block be saved and returned? 
consensusTOMFilePattern 
character string giving the names of the files containing the
consensus topological overlaps. The tag 
returnTOMs 
logical: should calculated consensus TOM(s) be returned? 
useDiskCache 
should calculated network similarities in individual sets be temporarily saved
to disk? Saving to disk is somewhat slower than keeping all data in memory, but for large blocks and/or
many sets the memory footprint may be too big. If not given (the default), the function will determine
the need of caching based on the size of the data. See 
chunkSize 
network similarities are saved in smaller chunks of size 
cacheDir 
character string containing the directory into which cache files should be written. The user should make sure that the filesystem has enough free space to hold the cache files which can get quite large. 
cacheBase 
character string containing the desired name for the cache files. The actual file
names will consist of 
nThreads 
nonnegative integer specifying the number of parallel threads to be used by certain parts of correlation calculations. This option only has an effect on systems on which a POSIX thread library is available (which currently includes Linux and Mac OSX, but excludes Windows). If zero, the number of online processors will be used if it can be determined dynamically, otherwise correlation calculations will use 2 threads. 
verbose 
integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose. 
indent 
indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces. 
The function starts by optionally filtering out samples that have too many missing entries and genes that have either too many missing entries or zero variance in at least one set. Genes that are filtered out are left unassigned by the module detection. Returned eigengenes will contain NA in entries corresponding to filtered-out samples.
If blocks is not given and the number of genes exceeds maxBlockSize, genes are preclustered into blocks using the function consensusProjectiveKMeans; otherwise all genes are treated in a single block.
For each block of genes, the network is constructed and (if requested) topological overlap is calculated in each set. To minimize memory usage, calculated topological overlaps are optionally saved to disk in chunks until they are needed again for the calculation of the consensus network topological overlap.
Before calculation of the consensus Topological Overlap, individual TOMs are optionally calibrated. Calibration methods include single quantile scaling and full quantile normalization.
Single quantile scaling raises the individual TOMs in sets 2, 3, ... to a power such that the quantile given by calibrationQuantile agrees with the corresponding quantile in set 1. Since the high TOM values are usually the most important for module identification, the value of calibrationQuantile is typically close to (but not equal to) 1. To speed up quantile calculation, the quantiles can be determined on a randomly chosen subset of the components of the TOM matrices.
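The single quantile scaling described above can be illustrated in base R. This is a minimal sketch of the idea only, not the package's internal code: the helper names calibrateSingleQuantile and makeToyTOM are hypothetical, the toy TOMs are simulated, and no random subsampling of entries is performed.

```r
# Hypothetical helper (not part of WGCNA): raise each TOM to a power so that
# its calibrationQuantile-th quantile matches that of the first (reference) set.
calibrateSingleQuantile <- function(tomList, calibrationQuantile = 0.95) {
  # Use only the lower triangle; the diagonal carries no information.
  lowerEntries <- function(m) m[lower.tri(m)]
  refQuantile <- quantile(lowerEntries(tomList[[1]]),
                          probs = calibrationQuantile, names = FALSE)
  lapply(tomList, function(tom) {
    q <- quantile(lowerEntries(tom), probs = calibrationQuantile, names = FALSE)
    # Solve q^p = refQuantile for p; assumes both quantiles lie strictly in (0, 1).
    p <- log(refQuantile) / log(q)
    tom^p
  })
}

# Simulated symmetric toy "TOM" matrices with entries in (0, 1).
set.seed(1)
makeToyTOM <- function(n, shape) {
  m <- matrix(0, n, n)
  m[lower.tri(m)] <- rbeta(n * (n - 1) / 2, shape, 5)
  m <- m + t(m)
  diag(m) <- 1
  m
}
toms <- list(makeToyTOM(50, 1), makeToyTOM(50, 2))
calibrated <- calibrateSingleQuantile(toms)

# After calibration the 0.95 quantiles agree (up to quantile interpolation).
calibratedQuantiles <- sapply(calibrated, function(m)
  quantile(m[lower.tri(m)], probs = 0.95, names = FALSE))
print(calibratedQuantiles)
```

The reference set is left unchanged because its own power evaluates to 1. Since raising to a positive power is monotone, the ranking of gene pairs within each set is preserved.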
Full quantile normalization, implemented in normalize.quantiles, adjusts the TOM matrices such that all quantiles are equal to each other (and equal to the quantiles of the componentwise average of the individual TOM matrices).
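For illustration, the textbook quantile normalization algorithm can be written in a few lines of base R. This sketch is an assumption-laden stand-in, not the normalize.quantiles implementation: the helper quantileNormalizeColumns is hypothetical, the target distribution is taken as the mean of the sorted values across sets, and each column holds the vectorized entries of one simulated TOM.

```r
# Hypothetical helper (not part of WGCNA or preprocessCore): quantile-normalize
# the columns of a matrix so that all columns share the same distribution.
quantileNormalizeColumns <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  sortedCols <- apply(x, 2, sort)
  # Target distribution: mean of the k-th smallest values across columns.
  target <- rowMeans(sortedCols)
  # Replace each entry by the target value at its within-column rank.
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

set.seed(2)
# Each column stands for the vectorized lower triangle of one toy TOM.
tomEntries <- cbind(set1 = rbeta(200, 1, 5),
                    set2 = rbeta(200, 2, 5),
                    set3 = rbeta(200, 1, 3))
normalized <- quantileNormalizeColumns(tomEntries)

# All columns now have identical sorted values.
print(apply(normalized, 2, quantile, probs = c(0.25, 0.5, 0.95)))
```

In contrast to single quantile scaling, this transformation matches the entire distribution across sets while still preserving the ranking of entries within each set.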
Note that network calibration is performed separately in each block, i.e., the normalizing transformation may differ between blocks. This is necessary to avoid manipulating a full TOM in memory.
The consensus TOM is calculated as the componentwise consensusQuantile quantile of the individual (set) TOMs; that is, for each gene pair (TOM entry), the consensusQuantile quantile across all input sets. Alternatively, one can also use the (weighted) componentwise mean across all input data sets.
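The componentwise consensus can be sketched in base R as follows. The helper consensusOfTOMs is hypothetical and only mirrors the roles of the arguments consensusQuantile, useMean and setWeights; the quantile definition used internally by the package may differ from the default of R's quantile function. With consensusQuantile = 0 the result is the componentwise minimum.

```r
# Hypothetical helper (not part of WGCNA): componentwise consensus of a list
# of equally sized TOM matrices.
consensusOfTOMs <- function(tomList, consensusQuantile = 0, useMean = FALSE,
                            setWeights = NULL) {
  # Stack the matrices into a genes x genes x sets array.
  stacked <- array(unlist(tomList),
                   dim = c(dim(tomList[[1]]), length(tomList)))
  if (useMean) {
    if (is.null(setWeights)) setWeights <- rep(1, length(tomList))
    return(apply(stacked, c(1, 2), weighted.mean, w = setWeights))
  }
  apply(stacked, c(1, 2), quantile, probs = consensusQuantile, names = FALSE)
}

# Two toy TOMs for three genes.
tom1 <- matrix(c(1, 0.2, 0.6,
                 0.2, 1, 0.4,
                 0.6, 0.4, 1), 3, 3)
tom2 <- matrix(c(1, 0.5, 0.3,
                 0.5, 1, 0.4,
                 0.3, 0.4, 1), 3, 3)

# Quantile 0 gives the componentwise minimum across sets.
consensusMin <- consensusOfTOMs(list(tom1, tom2), consensusQuantile = 0)
# Weighted mean consensus, with set 2 weighted three times as much as set 1.
consensusMean <- consensusOfTOMs(list(tom1, tom2), useMean = TRUE,
                                 setWeights = c(1, 3))
print(consensusMin)
print(consensusMean)
```

A low quantile is conservative: a gene pair receives a high consensus value only if its topological overlap is high in (nearly) all sets.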
If requested, the consensus topological overlaps are saved to disk for later use.
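The file name patterns in the usage above contain the tags %s and %b. As a rough illustration, and assuming (from the default patterns) that %s stands for the set number and %b for the block number, the expansion could look like the sketch below. The helper expandTOMFileName is hypothetical and is not the function the package uses internally.

```r
# Hypothetical helper (not part of WGCNA): expand the %s (set) and %b (block)
# tags in a file name pattern. The meaning of the tags is an assumption here.
expandTOMFileName <- function(pattern, set = NULL, block = NULL) {
  if (!is.null(set))
    pattern <- gsub("%s", as.character(set), pattern, fixed = TRUE)
  if (!is.null(block))
    pattern <- gsub("%b", as.character(block), pattern, fixed = TRUE)
  pattern
}

# One consensus TOM file per block.
consensusFiles <- vapply(1:3, function(b)
  expandTOMFileName("consensusTOMBlock%b.RData", block = b), character(1))
print(consensusFiles)

# One individual TOM file per set and block.
individualFile <- expandTOMFileName("individualTOMSet%sBlock%b.RData",
                                    set = 2, block = 1)
print(individualFile)
```

Keeping both tags in the individual TOM pattern ensures that files from different sets and blocks do not overwrite each other.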
List with the following components:
consensusTOM 
only present if input 
TOMFiles 
only present if input 
saveConsensusTOMs 
a copy of the input saveConsensusTOMs. 
individualTOMInfo 
information about individual set TOMs. A copy of the input 
Further components are retained for debugging and/or convenience.
useIndivTOMSubset 
a copy of the input 
goodSamplesAndGenes 
a list containing information about which samples and genes are "good" in the sense
that they do not contain more than a certain fraction of missing data and (for genes) have nonzero variance.
See 
nGGenes 
number of "good" genes in 
nSets 
number of input sets. 
saveCalibratedIndividualTOMs 
a copy of the input 
calibratedIndividualTOMFileNames 
if input 
networkCalibrationSamples 
if input 
consensusQuantile 
a copy of the input 
originCount 
A vector of length 
Peter Langfelder
WGCNA methodology has been described in
Bin Zhang and Steve Horvath (2005) "A General Framework for Weighted Gene CoExpression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17 PMID: 16646834
The original reference for the WGCNA package is
Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559 PMID: 19114008
For consensus modules, see
Langfelder P, Horvath S (2007) "Eigengene networks for studying the relationships between coexpression modules", BMC Systems Biology 2007, 1:54
This function uses quantile normalization described, for example, in
Bolstad BM, Irizarry RA, Astrand M, Speed TP (2003) "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias", Bioinformatics. 2003 Jan 22;19(2):1
blockwiseIndividualTOMs for calculation of topological overlaps across multiple sets.