modulePreservation | R Documentation |
Calculations of module preservation statistics between independent data sets.
modulePreservation(
multiData,
multiColor,
multiWeights = NULL,
dataIsExpr = TRUE,
networkType = "unsigned",
corFnc = "cor",
corOptions = "use = 'p'",
referenceNetworks = 1,
testNetworks = NULL,
nPermutations = 100,
includekMEallInSummary = FALSE,
restrictSummaryForGeneralNetworks = TRUE,
calculateQvalue = FALSE,
randomSeed = 12345,
maxGoldModuleSize = 1000,
maxModuleSize = 1000,
quickCor = 1,
ccTupletSize = 2,
calculateCor.kIMall = FALSE,
calculateClusterCoeff = FALSE,
useInterpolation = FALSE,
checkData = TRUE,
greyName = NULL,
goldName = NULL,
savePermutedStatistics = TRUE,
loadPermutedStatistics = FALSE,
permutedStatisticsFile = if (useInterpolation) "permutedStats-intrModules.RData"
else "permutedStats-actualModules.RData",
plotInterpolation = TRUE,
interpolationPlotFile = "modulePreservationInterpolationPlots.pdf",
discardInvalidOutput = TRUE,
parallelCalculation = FALSE,
verbose = 1, indent = 0)
multiData |
expression data or adjacency data
in multi-set format (see |
multiColor |
a list in which every component is a vector giving the module labels of genes in
|
multiWeights |
optional weights, only when |
dataIsExpr |
logical: if |
networkType |
network type. Allowed values are (unique abbreviations of) |
corFnc |
character string specifying the function to be used to calculate co-expression
similarity. Defaults to Pearson correlation. Another useful choice is |
corOptions |
character string specifying additional arguments to be passed to the function given
by |
referenceNetworks |
a vector giving the indices of expression data to be used as reference networks.
Reference networks must have their module labels given in |
testNetworks |
a list with one component per each entry in |
nPermutations |
specifies the number of permutations that will be calculated in the permutation test. |
includekMEallInSummary |
logical: should cor.kMEall be included in the calculated summary statistics?
Because kMEall takes into account all genes in the network, this statistic measures preservation of the full
network with respect to the eigengene of the module. This may be undesirable, hence the default is
|
restrictSummaryForGeneralNetworks |
logical: should the summary statistics for general (not
correlation) networks be restricted (density to meanAdj, connectivity to cor.kIM and cor.Adj)? The default
|
calculateQvalue |
logical: should q-values (local FDR estimates) be calculated? Package qvalue must be installed for this calculation. Note that q-values may not be meaningful when the number of modules is small and/or most modules are preserved. |
randomSeed |
seed for the random number generator. If |
maxGoldModuleSize |
maximum size of the "gold" module, i.e., the random sample of all network genes. |
maxModuleSize |
maximum module size used for calculations. Modules larger than |
quickCor |
number between 0 and 1 specifying the handling of missing data in calculation of
correlation. Zero means exact but potentially slower calculations; one means potentially faster
calculations, but with potentially inaccurate results if the proportion of missing data is large. See
|
ccTupletSize |
tuplet size for co-clustering calculations. |
calculateCor.kIMall |
logical: should cor.kIMall be calculated? This option is only valid for
adjacency input. If |
calculateClusterCoeff |
logical: should statistics based on the clustering coefficient be calculated? While these statistics may be interesting, the calculations are also computationally expensive. |
checkData |
logical: should data be checked for excessive number of missing entries? See
|
greyName |
label used for unassigned genes. Traditionally such genes are labeled by grey color or
numeric label 0. These values are the default when |
goldName |
label used for the "module" representing a random sample of the whole network.
Traditionally such genes are labeled by gold color or
numeric label 0.1. These values are the default when |
savePermutedStatistics |
logical: should calculated permutation statistics be saved? Saved statistics may be re-used if the calculation needs to be repeated. |
permutedStatisticsFile |
file name to save the permutation statistics into. |
loadPermutedStatistics |
logical: should permutation statistics be loaded? If a previously executed calculation needs to be repeated, loading permutation study results can cut the calculation time many-fold. |
useInterpolation |
logical: should permutation statistics be calculated by interpolating an artificial set of evenly spaced modules? This option may potentially speed up the calculations, but it restricts calculations to density measures. |
plotInterpolation |
logical: should interpolation plots be saved? If interpolation is used (see
|
interpolationPlotFile |
file name to save the interpolation plots into. |
discardInvalidOutput |
logical: should output columns containing no valid data be discarded? This
option may be useful when input |
parallelCalculation |
logical: should calculations be done in parallel? Note that parallel
calculations are turned off by default and will lead to somewhat DIFFERENT results than serial calculations
because the random seed is set differently. For the calculation to actually run in parallel mode, a call to
|
verbose |
integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose. |
indent |
indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces. |
This function calculates module preservation statistics pairwise between the given reference sets and all other sets in multiData. Reference sets must have their corresponding module assignment specified in multiColor; module assignment is optional for test sets. Individual expression sets and their module labels are matched using the names of the corresponding components in multiData and multiColor.
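A minimal sketch of how these inputs might be assembled, assuming simulated data. The set names Set1 and Set2, the gene names, the helper simulateSet, and the module labels are all hypothetical; each set is a list whose data component holds samples in rows and genes in columns. The call is guarded because it needs the WGCNA package, and the small nPermutations is chosen only to keep the sketch quick.

```r
set.seed(1)
nGenes <- 200
geneNames <- paste0("gene", seq_len(nGenes))

# Hypothetical helper: simulate one expression set
# (samples in rows, genes in columns, with column names set).
simulateSet <- function(nSamples) {
  x <- matrix(rnorm(nSamples * nGenes), nrow = nSamples, ncol = nGenes)
  colnames(x) <- geneNames
  x
}

# Multi-set format: one list per set, each with a `data` component.
multiData <- list(Set1 = list(data = simulateSet(30)),
                  Set2 = list(data = simulateSet(25)))

# Module labels are given only for the reference set; the component
# name must match the name of the corresponding data set.
multiColor <- list(
  Set1 = sample(c("grey", "blue", "brown"), nGenes, replace = TRUE)
)

# Run only when WGCNA is available; arguments are those listed in the usage.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  library(WGCNA)
  mp <- modulePreservation(multiData, multiColor,
                           referenceNetworks = 1,
                           nPermutations = 10,
                           savePermutedStatistics = FALSE,
                           verbose = 0)
}
```

With random labels on random data no module should look preserved, so this is only a check that the inputs are shaped correctly, not a meaningful analysis.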
For each reference-test pair, the function calculates module preservation statistics that measure how well the modules of the reference set are preserved in the test set. If multiColor also contains module assignment for the test set, the calculated statistics also include cross-tabulation statistics that make use of the test module assignment.
For each reference-test pair, the function only uses genes (columns of the data component of each component of multiData) that are in common between the reference and test set. Columns are matched by column names, so column names must be valid.
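The matching by column name can be illustrated in base R. This is a sketch of the idea only, not the package's internal code; the matrices refData and testData and the gene names are hypothetical.

```r
set.seed(2)

# Hypothetical reference and test sets with partially overlapping genes.
refData <- matrix(rnorm(5 * 4), nrow = 5,
                  dimnames = list(NULL, c("g1", "g2", "g3", "g4")))
testData <- matrix(rnorm(6 * 4), nrow = 6,
                   dimnames = list(NULL, c("g3", "g4", "g5", "g1")))

# Genes present in both sets, in the order of the reference set.
commonGenes <- intersect(colnames(refData), colnames(testData))

# Restrict both sets to the common genes so their columns line up.
refCommon <- refData[, commonGenes, drop = FALSE]
testCommon <- testData[, commonGenes, drop = FALSE]
```

Checking the size of this overlap before a long run shows how many genes each reference-test comparison will actually use.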
In addition to preservation statistics, the function also calculates several statistics of module quality, that is, measures of how well-defined modules are in the reference set. The quality statistics are calculated with respect to genes in common with a test set; thus the function calculates a set of quality statistics for each reference-test pair. This may be somewhat counter-intuitive, but it allows a direct comparison of corresponding quality and preservation statistics.
The calculated p-values are determined from the Z scores of individual measures under the assumption of normality. No p-value is calculated for the Zsummary measures. Bonferroni correction is applied with respect to the number of tested modules. Because the p-values for strongly preserved modules are often extremely low, the function reports base 10 logarithms of the p-values, consistent with the log.p and log.pBonf components described below. However, q-values are reported untransformed since they are calculated that way in package qvalue.
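The relationship between Z scores, log p-values, and the Bonferroni correction can be sketched in base R. This is an illustration under stated assumptions rather than the package's own code: the Z scores are made up, the upper tail of the normal distribution is assumed, and the correction factor is taken to be the number of modules. Note that pnorm with log.p = TRUE returns a natural logarithm, which is divided by log(10) to obtain a base 10 value.

```r
# Hypothetical Z scores for four modules.
Z <- c(blue = 12.5, brown = 3.1, turquoise = 0.4, yellow = 25)
nModules <- length(Z)

# Upper-tail normal p-values on the natural-log scale; working on the
# log scale avoids underflow for very large Z (assumption: one-sided test).
lnP <- pnorm(Z, lower.tail = FALSE, log.p = TRUE)

# Convert natural logarithms to base 10 logarithms.
log10P <- lnP / log(10)

# Bonferroni: multiplying p by nModules adds log10(nModules) on the
# log scale; cap at 0 so the corrected p-value never exceeds 1.
log10PBonf <- pmin(log10P + log10(nModules), 0)
```

Working with logarithms throughout keeps modules with extreme Z scores distinguishable where the raw p-values would all underflow to zero.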
Missing data are removed (but see quickCor above).
The function returns a nested list of preservation statistics. At the top level, the list components are:
quality |
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of quality statistics. All logarithms are in base 10. |
preservation |
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of density and connectivity preservation statistics. All logarithms are in base 10. |
accuracy |
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of cross-tabulation statistics. All logarithms are in base 10. |
referenceSeparability |
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of module separability in the reference network. All logarithms are in base 10. |
testSeparability |
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of module separability in the test network. All logarithms are in base 10. |
permutationDetails |
results of individual permutations, useful for diagnostics |
All of the above are lists. The lists quality, preservation, referenceSeparability, and testSeparability each contain 4 or 5 components: observed contains observed values, Z contains the corresponding Z scores, log.p contains base 10 logarithms of the p-values, log.pBonf contains base 10 logarithms of the Bonferroni-corrected p-values, and optionally q contains the associated q-values. The list accuracy contains observed, Z, log.p, log.pBonf, optionally q, and additional components observedOverlapCounts and observedFisherPvalues that contain the observed matrices of overlap counts and Fisher test p-values.
Each of the lists observed, Z, log.p, log.pBonf, optionally q, observedOverlapCounts and observedFisherPvalues is structured as a 2-level list where the outer components correspond to reference sets and the inner components to test sets. As an example, preservation$observed[[1]][[2]] contains the density and connectivity preservation statistics for the preservation of set 1 modules in set 2, that is, set 1 is the reference set and set 2 is the test set. preservation$observed[[1]][[2]] is a data frame in which each row corresponds to a module in the reference network 1, plus one row for the unassigned objects and one row for a "module" that contains randomly sampled objects and that represents a whole-network average. Each column corresponds to a statistic as indicated by the column name.
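The two-level indexing can be tried out on a mock object. The sketch below does not call the function; it builds a stand-in list with the same nesting, and the column names moduleSize and statA, the values, and the use of module labels as row names are assumptions made purely for illustration.

```r
# Mock data frame standing in for one reference-test result: one row per
# module, plus rows for unassigned ("grey") and randomly sampled ("gold")
# objects. Column names and values are hypothetical.
mockStats <- data.frame(moduleSize = c(120, 80, 50, 100),
                        statA = c(0.61, 0.42, 0.05, 0.10),
                        row.names = c("blue", "brown", "grey", "gold"))

# Outer level indexes reference sets, inner level indexes test sets.
mockPreservation <- list(observed = list(list(NULL, mockStats)))

# Statistics for reference set 1 evaluated in test set 2.
obs <- mockPreservation$observed[[1]][[2]]

# Keep only proper modules, dropping the grey and gold rows.
realModules <- obs[!rownames(obs) %in% c("grey", "gold"), , drop = FALSE]
```

The same indexing pattern applies to the Z, log.p and log.pBonf components of a real result.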
For large data sets, the permutation study may take a while (typically on the order of several hours). Use verbose = 3 to get a detailed progress report as the calculations advance.
Rui Luo and Peter Langfelder
Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath, to appear
Network construction and module detection functions in the WGCNA package such as adjacency, blockwiseModules; rudimentary cleaning in goodSamplesGenesMS; the WGCNA implementation of correlation in cor.