Calculate module eigengenes.
Description
Calculates module eigengenes (first principal components) of modules in a single dataset.
Usage
Arguments
expr 
Expression data for a single set in the form of a data frame where rows are samples and columns are genes (probes). 
colors 
A vector of the same length as the number of probes in 
impute 
If 
nPC 
Number of principal components and variance explained entries to be calculated. Note
that only the first principal component is returned; the rest are used only for the calculation of
proportion of variance explained. The number of returned variance explained entries is
currently 
align 
Controls whether eigengenes, whose orientation is undetermined, should be aligned with
average expression ( 
excludeGrey 
Should the improper module consisting of 'grey' genes be excluded from the eigengenes? 
grey 
Value of 
subHubs 
Controls whether hub genes should be substituted for missing eigengenes. If

trapErrors 
Controls handling of errors that may arise when there are too many

returnValidOnly 
logical; controls whether the returned data frame of module eigengenes contains columns corresponding only to modules whose eigengenes or hub genes could be calculated correctly
( 
softPower 
The power used in soft-thresholding the adjacency matrix. Only used when the hub gene approximation is necessary because the principal component calculation failed. It must be non-negative. The default value should only be changed if there is a clear indication that it leads to incorrect results. 
scale 
logical; can be used to turn off scaling of the expression data before calculating the singular value decomposition. Scaling should only be turned off if the data have already been scaled, in which case the function can run a bit faster. Note, however, that the function first imputes, then scales the expression data in each module. If the expression data contain missing values, scaling outside of the function and letting the function impute the missing values may lead to slightly different results than scaling within the function. 
verbose 
Controls verbosity of printed progress messages. 0 means silent; verbosity increases gradually up to about 5. 
indent 
A single nonnegative integer controlling indentation of printed messages. 0 means no indentation, each unit above that adds two spaces. 
Details
A module eigengene is defined as the first principal component of the expression matrix of the corresponding module. The calculation may fail if the expression data have too many missing entries. Handling of such errors is controlled by the arguments subHubs and trapErrors.
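The definition above can be illustrated with a minimal base R sketch. It is not the package's own implementation: the toy data, the variable names, and the variance-explained formula (squared singular values) are assumptions made for illustration, and the function itself may compute these quantities differently.

```r
# Toy data (assumed): 10 samples (rows) x 4 genes (columns) forming one module.
set.seed(1)
signal <- rnorm(10)
moduleExpr <- sapply(1:4, function(i) signal + rnorm(10, sd = 0.3))

# Scale each gene, then take the SVD of the genes x samples matrix.
scaled <- scale(moduleExpr)
sv <- svd(t(scaled), nu = 1, nv = 1)

# The first right singular vector has one entry per sample: the eigengene.
eigengene <- as.numeric(sv$v[, 1])

# The sign of a principal component is arbitrary, so align it with the
# average (scaled) expression of the module.
averageExpr <- rowMeans(scaled)
if (cor(averageExpr, eigengene) < 0) eigengene <- -eigengene

# Proportion of variance captured by the first component (assumed formula).
varExplained <- sv$d[1]^2 / sum(sv$d^2)
```

The sign flip mirrors the purpose of the align argument: without it, two runs on equivalent data could return eigengenes of opposite orientation.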
If subHubs==TRUE, errors in principal component calculation will be trapped and a substitute calculation of hub genes will be attempted. If this fails as well, behaviour depends on trapErrors: if TRUE, the offending module will be ignored and the return value will allow the user to remove the module from further analysis; if FALSE, the function will stop.
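The fallback logic described above can be sketched in base R with tryCatch. The helpers firstPC, hubAverage and moduleSummary are hypothetical names, not functions of the package, and the hub gene step is simplified here to a plain row mean; the real approximation involves the soft-thresholded adjacency and, by default, imputation of missing data.

```r
# Hypothetical stand-ins for the two calculations; not package internals.
firstPC <- function(x) {
  if (anyNA(x)) stop("missing data in principal component calculation")
  as.numeric(svd(t(scale(x)), nu = 1, nv = 1)$v[, 1])
}
hubAverage <- function(x) {
  # Simplification: plain average over genes instead of a hub gene average.
  avg <- rowMeans(x, na.rm = TRUE)
  if (anyNA(avg)) stop("hub gene approximation failed")
  avg
}

moduleSummary <- function(x, subHubs = TRUE, trapErrors = FALSE) {
  # Step 1: try the principal component.
  pc <- tryCatch(firstPC(x), error = function(e) NULL)
  if (!is.null(pc)) return(list(value = pc, isPC = TRUE, valid = TRUE))
  # Step 2: optionally fall back to the substitute calculation.
  if (subHubs) {
    hub <- tryCatch(hubAverage(x), error = function(e) NULL)
    if (!is.null(hub)) return(list(value = hub, isPC = FALSE, valid = TRUE))
  }
  # Step 3: both failed; either flag the module as invalid or stop.
  if (trapErrors) return(list(value = NULL, isPC = FALSE, valid = FALSE))
  stop("eigengene calculation failed")
}

# Usage: a single NA defeats the (simplified) PC step but not the fallback.
x <- matrix(rnorm(40), nrow = 10)
x[1, 1] <- NA
res <- moduleSummary(x)
```

Returning a flag rather than stopping is what makes trapErrors=TRUE convenient in batch runs, at the cost of requiring the caller to inspect the result.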
From the user's point of view, setting trapErrors=FALSE ensures that if the function returns normally, there will be a valid eigengene (principal component or hub gene) for each of the input colors. If the user sets trapErrors=TRUE, all computational (but not input) errors will be trapped, but the user should check the output (see below) to make sure all modules have a valid returned eigengene.
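A check of this kind might look like the following base R sketch. The result list is a hand-built stand-in that only mimics the eigengenes, validMEs and allOK components described under Value; the column names module1 and module2 are placeholders, and the assumption that validMEs lines up with the columns of eigengenes should be verified against the actual return value.

```r
# Hypothetical stand-in for a result obtained with trapErrors=TRUE;
# in practice this list would be returned by the function itself.
result <- list(
  eigengenes = data.frame(module1 = c(0.1, -0.2, 0.1),
                          module2 = c(NA, NA, NA)),
  validMEs = c(TRUE, FALSE),
  allOK = FALSE
)

# Keep only eigengenes flagged as valid before any further analysis.
if (result$allOK) {
  usable <- result$eigengenes
} else {
  usable <- result$eigengenes[, result$validMEs, drop = FALSE]
}
```

Using drop = FALSE keeps the result a data frame even when only one valid module remains.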
While the principal component calculation can fail even on relatively sound data (it does not take all that many "well-placed" NA entries to torpedo the calculation), it takes many more irregularities in the data for the hub gene calculation to fail. In fact, such a failure signals that there is likely something seriously wrong with the data.
Value
A list with the following components:
eigengenes 
Module eigengenes in a dataframe, with each column corresponding to one eigengene.
The columns are named by the corresponding color with an 
averageExpr 
If 
varExplained 
A dataframe in which each column corresponds to a module, with the component

nPC 
A copy of the input 
validMEs 
A boolean vector. Each component (corresponding to the columns in 
validColors 
A copy of the input colors with entries corresponding to invalid modules set to

allOK 
Boolean flag signalling whether all eigengenes have been calculated correctly, either as principal components or as the hubgene average approximation. 
allPC 
Boolean flag signalling whether all returned eigengenes are principal components. 
isPC 
Boolean vector. Each component (corresponding to the columns in 
isHub 
Boolean vector. Each component (corresponding to the columns in 
validAEs 
Boolean vector. Each component (corresponding to the columns in 
allAEOK 
Boolean flag signalling whether all returned module average expressions contain
valid data. Note that 
Author(s)
Steve Horvath SHorvath@mednet.ucla.edu, Peter Langfelder Peter.Langfelder@gmail.com
References
Zhang, B. and Horvath, S. (2005), "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17
See Also
svd, impute.knn