View source: R/SYB_wrapWGCNA.R
This function uses the WGCNA package by Steve Horvath and Peter Langfelder to construct
weighted gene co-expression networks and to correlate the detected gene modules with phenotypes.
wrapWGCNA(
data,
projectfolder = "GEX/WGCNA",
softThresholdPower = "auto",
corType = "bicor",
networkType = "signed",
TOMType = "signed",
maxBlockSize = 45000,
TOMplot = FALSE,
MDSplot = FALSE,
phDendro = NULL,
phModule = NULL,
sampleColumn = "Sample_Name",
groupColumn = "Sample_Group",
groupsets = NULL,
symbolColumn = NULL,
flashClustMethod = "average",
moduleBoxplotsPerFigure = 16,
figure.res = 300,
dendroRowText = F,
autoColorHeight = FALSE,
colorHeight = 0.1,
cex.labels = 0.6,
...
)
data |
ExpressionSet, SummarizedExperiment, DESeqDataSet or MethylSet. If |
projectfolder |
character with directory for output files (will be generated if not existing). |
softThresholdPower |
soft-thresholding power for network construction. If "auto", the function selects the soft-thresholding power automatically. If NULL, network construction is omitted. |
corType |
character string specifying the correlation to be used. Allowed values are "pearson" and "bicor", corresponding to Pearson and biweight midcorrelation, respectively. Missing values are handled using the pairwise.complete.obs option. |
networkType |
character with network type. Allowed values are "unsigned", "signed" and "signed hybrid". In an "unsigned" network, negative correlations between genes are treated the same as positive correlations. In a "signed" network, negatively correlated genes are not put into one module but are treated as uncorrelated. |
TOMType |
character with one of "none", "unsigned", "signed". If "none", adjacency will be used for clustering. If "unsigned", the standard TOM will be used (more generally, TOM function will receive the adjacency as input). If "signed", TOM will keep track of the sign of correlations between neighbours. |
maxBlockSize |
integer giving maximum block size for module detection. If the number of genes in |
TOMplot |
boolean. If TRUE, make a Topological Overlap Matrix (TOM) plot (also known as connectivity plot) of the network connections. Light color represents low topological overlap and progressively darker red color represents higher overlap. Modules correspond to red squares along the diagonal. |
MDSplot |
boolean. If TRUE, make a multidimensional scaling (MDS) plot to visualize the pairwise relationships specified by a dissimilarity matrix. Each row of the dissimilarity matrix is visualized by a point in a Euclidean space. Each dot (gene) is colored by its module assignment. |
phDendro |
character vector with phenotypes of |
phModule |
character vector with phenotypes to correlate module eigengenes with in heatmap. |
sampleColumn |
character with column name of Sample names in pheno data of |
groupColumn |
character with column name of group names in pheno data of |
groupsets |
character vector with names of group sets in format "groupA-groupB". Groups summarized in parentheses
"(groupA-groupB)" are coded as ONE group. They are used for correlation of module eigengenes with
corresponding samples of selected groupsets. Mind that eigengenes are calculated using all samples,
while correlation is calculated for samples of denoted groupsets only.
Group names must match names in |
symbolColumn |
character with name of feature identifier in feature data of |
flashClustMethod |
character with agglomeration method used for hierarchical clustering in |
moduleBoxplotsPerFigure |
numeric. Number of module boxplots to be included in a single figure. |
figure.res |
numeric resolution for png. |
dendroRowText |
boolean. If TRUE, phenotype names are plotted beneath the sample dendrogram. |
autoColorHeight |
boolean. If TRUE, the height of the color area below the dendrogram is adjusted automatically for the number of phenotypes. |
colorHeight |
numeric specifying the height of the color area under dendrogram as a fraction of the height of the dendrogram area. Only effective when autoColorHeight above is FALSE. |
cex.labels |
numeric with character expansion factor for dendrogram and heatmap labels. |
... |
further arguments to be passed to the |
Before network construction starts, an appropriate softThresholdPower must be selected for the correlation coefficients.
If softThresholdPower is NULL, the function analyses scale-free topology for multiple soft-thresholding powers
to help choose a value that yields an approximately scale-free network topology.
For each power, the scale-free topology fit index is calculated and returned along with other information on connectivity.
If softThresholdPower is set to "auto", the function determines an appropriate value itself and directly starts network construction.
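As a minimal sketch of what a soft-thresholding power does, the following base R code raises a correlation matrix to a power under the three networkType conventions, using the adjacency definitions documented for WGCNA. The helper name soft_adjacency and the toy correlation matrix are hypothetical and not part of wrapWGCNA; the wrapper's own internals are not shown in this page.

```r
# Hypothetical helper (not part of wrapWGCNA): weighted adjacency from a
# correlation matrix for the three network types described above.
soft_adjacency <- function(cor_mat, power,
                           type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
    "unsigned"      = abs(cor_mat)^power,                  # sign is ignored
    "signed"        = ((1 + cor_mat) / 2)^power,           # negative cor -> near 0
    "signed hybrid" = (cor_mat * (cor_mat > 0))^power      # negative cor -> exactly 0
  )
}

# Toy correlation matrix for three genes (illustrative values only)
genes <- c("g1", "g2", "g3")
cor_mat <- matrix(c( 1.0,  0.8, -0.8,
                     0.8,  1.0, -0.5,
                    -0.8, -0.5,  1.0),
                  nrow = 3, dimnames = list(genes, genes))

adj_unsigned <- soft_adjacency(cor_mat, power = 2, type = "unsigned")
adj_signed   <- soft_adjacency(cor_mat, power = 2, type = "signed")
adj_hybrid   <- soft_adjacency(cor_mat, power = 2, type = "signed hybrid")
```

In the unsigned case the strongly anti-correlated pair g1/g3 keeps a high adjacency, whereas in the signed variants it ends up close to or exactly zero, which is why negatively correlated genes do not share a module there.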
Network construction is performed block-wise with respect to maxBlockSize. Genes are clustered
using average linkage hierarchical clustering, and co-expressed gene modules are identified in the resulting dendrogram by the
Dynamic Hybrid tree cut. Modules whose module eigengenes (MEs) are highly correlated are merged.
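The clustering step can be illustrated with base R alone. The sketch below clusters six simulated genes with average linkage on an adjacency-based dissimilarity and then cuts the tree into a fixed number of groups. Note the simplifications, which are assumptions of this example: it clusters on 1 - adjacency (the situation described for TOMType = "none") rather than on topological overlap, and it uses a plain cutree() in place of the Dynamic Hybrid tree cut. All data and names are made up.

```r
set.seed(42)
n <- 20                                   # number of samples (toy data)
pattern_a <- sin(seq_len(n))              # expression pattern of module A
pattern_b <- cos(seq_len(n))              # expression pattern of module B

# Samples in rows, genes in columns
expr <- cbind(
  g1 = pattern_a + rnorm(n, sd = 0.1),
  g2 = pattern_a + rnorm(n, sd = 0.1),
  g3 = pattern_a + rnorm(n, sd = 0.1),
  g4 = pattern_b + rnorm(n, sd = 0.1),
  g5 = pattern_b + rnorm(n, sd = 0.1),
  g6 = pattern_b + rnorm(n, sd = 0.1)
)

# Signed adjacency with soft-thresholding power 6 (illustrative choice)
adj <- ((1 + cor(expr)) / 2)^6

# Average linkage hierarchical clustering on the dissimilarity 1 - adjacency
gene_tree <- hclust(as.dist(1 - adj), method = "average")

# Simplified module assignment: fixed number of clusters instead of a dynamic cut
modules <- cutree(gene_tree, k = 2)
```

Here g1 to g3 and g4 to g6 fall into two separate groups, mirroring the two simulated expression patterns.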
The function calculates the following parameters:
kME: INTRAmodular connectivity for finding intramodular hubs. Also known as module membership measure (MM). Correlation of the gene with the corresponding module eigengene. kME close to 1 means that the gene is a hub gene.
GS: gene significance: correlation of the gene with a phenotype.
Module-trait relationship: correlation of a module eigengene with a phenotype.
Phenotypes are taken from the phenotype data of data as specified in phModule. Furthermore, membership of samples in the groups
defined in groupsets is also used as a phenotype (e.g. two groups from a differential gene expression experiment).
When correlation with group membership is calculated, only samples belonging to the denoted groupset are included
(mind that gene modules were calculated using expression data from all samples).
All correlation coefficients are calculated using Pearson correlation. Categorical variables with only two levels are
coded numerically.
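The three quantities above can be sketched for a single toy module in base R. The example assumes the common definition of a module eigengene as the first principal component of the scaled module expression; the simulated data, the variable names and the orientation step are illustrative and not taken from the wrapWGCNA source.

```r
set.seed(7)
n <- 12
trait  <- seq_len(n)                       # hypothetical numeric phenotype
signal <- as.numeric(scale(trait))         # shared pattern driving the module

# Samples in rows, genes of one module in columns (simulated)
module_expr <- cbind(
  geneA = signal + rnorm(n, sd = 0.2),
  geneB = signal + rnorm(n, sd = 0.2),
  geneC = signal + rnorm(n, sd = 0.2)
)

# Module eigengene: first left singular vector of the scaled expression matrix
ME <- svd(scale(module_expr))$u[, 1]

# Singular vectors have arbitrary sign; orient ME along the mean expression
if (cor(ME, rowMeans(module_expr)) < 0) ME <- -ME

# kME (module membership): Pearson correlation of each gene with the eigengene
kME <- cor(module_expr, ME)[, 1]

# GS (gene significance): Pearson correlation of each gene with the phenotype
GS <- cor(module_expr, trait)[, 1]

# Module-trait relationship: correlation of the eigengene with the phenotype
moduleTraitCor <- cor(ME, trait)
```

Genes with kME close to 1 follow the eigengene most tightly and are the candidates for intramodular hubs.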
The returned value depends on the parameter softThresholdPower. If a softThresholdPower is given or
could be chosen automatically, the value is a list with the following components:
colors: a vector of color or numeric module labels for all genes
unmergedColors: a vector of color or numeric module labels for all genes before module merging
MEs: a data frame containing module eigengenes of the found modules (given by colors).
goodSamples: numeric vector giving indices of good samples, that is samples that do not have too many missing entries.
goodGenes: numeric vector giving indices of good genes, that is genes that do not have too many missing entries.
dendrograms: a list whose components contain hierarchical clustering dendrograms of genes in each block.
TOMFiles: character vector (one string per block), giving the file names in which blockwise topological overlaps were saved.
blockGenes: a list whose components give the indices of genes in each block.
blocks: a vector of length equal to the number of genes giving the block label for each gene. Note that block labels are not necessarily sorted in the order in which the blocks were processed.
MEsOK: logical indicating whether the module eigengenes were calculated without errors.
If softThresholdPower is NULL or could not be chosen automatically, the value is a list with the following components:
powerEstimate: estimate of an appropriate soft-thresholding power: the lowest power for which the scale free topology fit R^2 exceeds RsquaredCut. If R^2 is below RsquaredCut for all powers, NA is returned.
fitIndices: data frame containing the fit indices for scale free topology. The columns contain the soft-thresholding power, adjusted R^2 for the linear fit, the linear coefficient, adjusted R^2 for a more complicated fit model, mean connectivity, median connectivity and maximum connectivity.
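The selection rule described for powerEstimate can be written down in a few lines. The following is a sketch only: the function pick_power, the column names power and r_squared, the cutoff of 0.85 and the fit values are all hypothetical and chosen for illustration; they are not the names or values produced by the function.

```r
# Hypothetical helper: lowest power whose fit R^2 exceeds the cutoff, else NA
pick_power <- function(fit, r_squared_cut = 0.85) {
  passing <- fit$power[!is.na(fit$r_squared) & fit$r_squared > r_squared_cut]
  if (length(passing) == 0) NA_real_ else min(passing)
}

# Made-up fit table for five candidate powers
fit <- data.frame(
  power     = c(4, 6, 8, 10, 12),
  r_squared = c(0.41, 0.72, 0.88, 0.91, 0.93)
)

power_estimate <- pick_power(fit)                        # lowest power above 0.85
no_estimate    <- pick_power(fit, r_squared_cut = 0.95)  # no power passes
```

With these toy values the first call selects power 8, and the second call returns NA because no power reaches the stricter cutoff.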
Side-effects:
Diagrams for the soft-thresholding power, gene and sample dendrograms generated by hierarchical clustering (with the phenotypes
given in phDendro or phModule printed underneath) and correlation heatmaps are plotted into the projectfolder.
Additionally, tables with module eigengenes and correlation results of eigengenes with phenotypes and groupsets are generated.
Scatterplots are generated with module membership and gene significance for each phenotype/groupset and the 8 top associated modules.
The procedure is divided in several steps:
Selection of an appropriate softThresholdPower for network construction
Automatic network construction and module detection
Plot sample dendrogram and gene dendrogram with phenotype information.
Calculate gene significance for traits: GS.datTraits[i] = |cor(gene, Trait)|
and GSPvalue[i] = corPvalueStudent(GS.datTraits[i], nSamples).
Correlation of modules with phenotypes (traits)
moduleTraitCor = cor(MEs, Trait)
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)
moduleGroupsetCor = cor(MEs, groupsetMat), where groupsetMat is a phenotype matrix
built from the group memberships for the groups denoted in groupsets.
moduleGroupsetPvalue = corPvalueStudent(moduleGroupsetCor, nSamplesInGroups)
Heatmaps are generated for correlation results of phenotypes and groupsets.
Intramodular analysis - Find hub genes in modules
datKME: INTRAmodular connectivity for finding intramodular hubs. Also known as module membership measure (MM).
MMPvalue = corPvalueStudent(datKME, nSamples)
Calculate gene significance for group memberships: geneGroupsetCor = cor(gene, groupsetMat)
and geneGroupsetPvalue = corPvalueStudent(geneGroupsetCor, nSamplesInGroups)
Scatterplots are generated with module membership and gene significance for each phenotype/groupset and the 8 top associated modules.
Generate output tables
networkDatOutput = data.frame(featuredata, moduleColors, GS.datTraits, GSPvalue, geneGroupsetCor, geneGroupsetPvalue)
networkDatOutput_incl_MM = data.frame(networkDatOutput, datKME[,modOrder], MMPvalue[,modOrder])
Visualization of networks within R (TOMplot, MDSplot).
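The correlation steps for groupsets can be sketched in base R as follows. Two things are assumptions of this example and not confirmed by this page: the 0/1 coding of the two groups (with samples outside the groupset set to NA), and the helper cor_pvalue_student, which reimplements the Student t-test p-value for a correlation that WGCNA provides as corPvalueStudent. Group names and eigengene values are made up.

```r
# Student asymptotic p-value for a correlation r based on n_samples observations
# (base R stand-in for WGCNA's corPvalueStudent)
cor_pvalue_student <- function(r, n_samples) {
  t_stat <- sqrt(n_samples - 2) * r / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n_samples - 2, lower.tail = FALSE)
}

# Toy data: one module eigengene over eight samples from three groups
groups <- c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat", "other", "other")
ME     <- c(-0.40, -0.30, -0.35, 0.30, 0.40, 0.35, 0.00, 0.10)

# Assumed coding for groupset "ctrl-treat": ctrl = 0, treat = 1, others = NA
membership <- ifelse(groups == "ctrl", 0, ifelse(groups == "treat", 1, NA))

# Only samples of the groupset enter the correlation,
# although the eigengene was computed from all samples
in_set <- !is.na(membership)
n_samples_in_groups <- sum(in_set)

module_groupset_cor    <- cor(ME[in_set], membership[in_set])
module_groupset_pvalue <- cor_pvalue_student(module_groupset_cor, n_samples_in_groups)
```

The p-value obtained this way agrees with the one reported by cor.test() for the same six samples, since both rest on the same t statistic with n - 2 degrees of freedom.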
Frank Ruehle
https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/