wrapWGCNA: Wrapper for weighted gene co-expression network analysis.


View source: R/SYB_wrapWGCNA.R

Description

This function makes use of the WGCNA-package from Steve Horvath and Peter Langfelder to construct weighted gene co-expression networks and correlates detected gene modules with phenotypes.

Usage

wrapWGCNA(
  data,
  projectfolder = "GEX/WGCNA",
  softThresholdPower = "auto",
  corType = "bicor",
  networkType = "signed",
  TOMType = "signed",
  maxBlockSize = 45000,
  TOMplot = FALSE,
  MDSplot = FALSE,
  phDendro = NULL,
  phModule = NULL,
  sampleColumn = "Sample_Name",
  groupColumn = "Sample_Group",
  groupsets = NULL,
  symbolColumn = NULL,
  flashClustMethod = "average",
  moduleBoxplotsPerFigure = 16,
  figure.res = 300,
  dendroRowText = F,
  autoColorHeight = FALSE,
  colorHeight = 0.1,
  cex.labels = 0.6,
  ...
)
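A call might look like the following sketch. It assumes that the systemsbio package (which provides wrapWGCNA) and Biobase are installed; the simulated expression matrix, the phenotype column Age, the group names and the output folder are made up for illustration. The call is guarded so that the block only prepares the toy data where the packages are missing.

```r
set.seed(42)

# Simulated data (hypothetical): 500 genes x 12 samples
expr <- matrix(rnorm(500 * 12), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("S", 1:12)))

# Phenotype table using the default column names of sampleColumn and groupColumn
pheno <- data.frame(
  Sample_Name  = colnames(expr),
  Sample_Group = rep(c("ctrl", "treat"), each = 6),
  Age          = round(runif(12, 20, 60)),   # hypothetical numeric phenotype
  row.names    = colnames(expr)
)

if (requireNamespace("Biobase", quietly = TRUE) &&
    requireNamespace("systemsbio", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(
    assayData = expr,
    phenoData = Biobase::AnnotatedDataFrame(pheno)
  )
  # Only arguments documented on this page are used
  res <- systemsbio::wrapWGCNA(
    data = eset,
    projectfolder = file.path(tempdir(), "WGCNA"),
    softThresholdPower = "auto",
    phDendro = c("Sample_Group", "Age"),
    phModule = "Age",
    groupsets = "treat-ctrl"
  )
}
```

Random data such as this contains no real module structure, so the sketch only illustrates the shape of the input and of the call, not a meaningful result.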

Arguments

data

ExpressionSet, SummarizedExperiment, DESeqDataSet or MethylSet. If data is a character string containing a file path, the function assumes a previously stored network object and loads it from this path. If data is "load_default", the network object is loaded from the default location file.path(projectfolder, "TOM", "networkConstruction-auto.RData").

projectfolder

character with directory for output files (created if it does not exist).

softThresholdPower

soft-thresholding power for network construction. If "auto", the function selects the soft-thresholding power automatically. If NULL, network construction is omitted.

corType

character string specifying the correlation to be used. Allowed values are "pearson" and "bicor", corresponding to Pearson and biweight midcorrelation, respectively. Missing values are handled using the pairwise.complete.obs option.

networkType

character with network type. Allowed values are "unsigned", "signed" and "signed hybrid". In an "unsigned" network, negative correlations between genes are treated the same as positive correlations. In a "signed" network, negatively correlated genes are not put into the same module but are treated as uncorrelated.
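The difference between the network types can be seen from the adjacency each one assigns to a correlation r at soft-thresholding power beta: |r|^beta (unsigned), ((1 + r)/2)^beta (signed) and r^beta for r > 0, otherwise 0 (signed hybrid). The sketch below evaluates these standard WGCNA definitions in base R; the helper name adjacency_from_cor and the power of 6 are chosen for illustration only.

```r
# Adjacency of a correlation value under the three network types
adjacency_from_cor <- function(r, power,
                               type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
    "unsigned"      = abs(r)^power,
    "signed"        = ((1 + r) / 2)^power,
    "signed hybrid" = ifelse(r > 0, r^power, 0)
  )
}

r <- c(-0.9, -0.5, 0, 0.5, 0.9)
adj <- sapply(c("unsigned", "signed", "signed hybrid"),
              function(t) adjacency_from_cor(r, power = 6, type = t))
rownames(adj) <- paste0("r=", r)
round(adj, 4)
```

In the unsigned column, r = -0.9 and r = 0.9 receive the same adjacency, whereas in the signed and signed hybrid columns a strong negative correlation ends up at or near zero.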

TOMType

character with one of "none", "unsigned", "signed". If "none", adjacency will be used for clustering. If "unsigned", the standard TOM will be used (more generally, TOM function will receive the adjacency as input). If "signed", TOM will keep track of the sign of correlations between neighbours.

maxBlockSize

integer giving maximum block size for module detection. If the number of genes in data exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize (maxBlockSize must not exceed 46340). Block sizes should be as large as possible, but note that large blocks heavily increase memory usage.
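The memory cost of a block can be estimated from the size of one dense n x n matrix of doubles. The upper bound of 46340 coincides with floor(sqrt(2^31 - 1)), the largest n for which such a matrix has no more than .Machine$integer.max entries; that this is the reason for the limit is an assumption here, and the helper name matrix_gib is hypothetical.

```r
max_int <- .Machine$integer.max     # 2^31 - 1
limit   <- floor(sqrt(max_int))     # largest n with n^2 <= max_int

# Approximate size in GiB of one dense n x n matrix of doubles (8 bytes each)
matrix_gib <- function(n) n^2 * 8 / 2^30

limit
sapply(c(5000, 20000, 45000), matrix_gib)
```

Network construction typically holds more than one such matrix at a time, so the actual requirement is a multiple of this estimate.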

TOMplot

boolean. If TRUE, a Topological Overlap Matrix (TOM) plot (also known as connectivity plot) of the network connections is made. Light color represents low topological overlap and progressively darker red color represents higher overlap. Modules correspond to red squares along the diagonal.

MDSplot

boolean. If TRUE, a multidimensional scaling (MDS) plot is made to visualize pairwise relationships specified by a dissimilarity matrix. Each row of the dissimilarity matrix is visualized by a point in a Euclidean space. Each dot (gene) is colored by its module assignment.

phDendro

character vector with phenotypes of data object to be displayed in sample dendrogram.

phModule

character vector with phenotypes to correlate module eigengenes with in heatmap.

sampleColumn

character with column name of sample names in pheno data of data.

groupColumn

character with column name of group names in pheno data of data.

groupsets

character vector with names of group sets in format "groupA-groupB". Groups summarized in parentheses "(groupA-groupB)" are coded as ONE group. They are used for correlation of module eigengenes with corresponding samples of selected groupsets. Mind that eigengenes are calculated using all samples, while correlation is calculated for samples of denoted groupsets only. Group names must match names in groupColumn. Omitted if NULL.
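How a groupset could be turned into a numeric membership vector and correlated with a module eigengene is sketched below. The parsing helper groupset_membership, the 1/0 coding and the toy data are assumptions for illustration; the package's internal coding may differ, and the simple parser does not support group names that themselves contain a hyphen.

```r
# Hypothetical helper: code samples of a groupset as 1 (first group) or
# 0 (second group); samples outside the groupset become NA.
# Groups in parentheses, e.g. "(A-B)", are pooled into ONE group.
groupset_membership <- function(groups, groupset) {
  # split at "-" outside of parentheses
  parts <- regmatches(groupset,
                      gregexpr("\\([^)]*\\)|[^-()]+", groupset))[[1]]
  members <- lapply(parts, function(p)
    strsplit(gsub("[()]", "", p), "-", fixed = TRUE)[[1]])
  stopifnot(length(members) == 2)
  out <- rep(NA_real_, length(groups))
  out[groups %in% members[[1]]] <- 1
  out[groups %in% members[[2]]] <- 0
  out
}

set.seed(7)
groups <- rep(c("A", "B", "C"), each = 4)
ME <- rnorm(length(groups))     # stand-in for one eigengene over ALL samples

memb_AC <- groupset_membership(groups, "A-C")       # B samples are NA
use     <- !is.na(memb_AC)
r_AC    <- cor(ME[use], memb_AC[use])   # samples of the groupset only
n_AC    <- sum(use)                     # plays the role of nSamplesInGroups

memb_pooled <- groupset_membership(groups, "(A-B)-C")  # A and B as one group
```

The eigengene is taken from all samples, while the correlation uses only the samples belonging to the groupset, as described above.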

symbolColumn

character with name of feature identifier in feature data of data.

flashClustMethod

character with agglomeration method used for hierarchical clustering in flashClust-package. Either "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".

moduleBoxplotsPerFigure

numeric. Number of module boxplots to be included in a single figure.

figure.res

numeric resolution for png.

dendroRowText

boolean. If TRUE, phenotype names are plotted beneath the sample dendrogram.

autoColorHeight

boolean. If TRUE, the height of the color area below the dendrogram is adjusted automatically for the number of phenotypes.

colorHeight

numeric specifying the height of the color area under dendrogram as a fraction of the height of the dendrogram area. Only effective when autoColorHeight above is FALSE.

cex.labels

numeric with character expansion factor for dendrogram and heatmap labels.

...

further arguments to be passed to the blockwiseModules-function of the WGCNA-package.

Details

Before network construction starts, an appropriate softThresholdPower must be selected for the correlation coefficients. If no value is given in softThresholdPower (NULL), the function analyses scale-free topology for multiple soft-thresholding powers to help choose a value that yields an approximately scale-free network topology. For each power, the scale-free topology fit index is calculated and returned along with other information on connectivity. If softThresholdPower is set to "auto", the function determines an appropriate value itself and directly starts network construction. Network construction is performed block-wise with respect to maxBlockSize. Genes are clustered using average linkage hierarchical clustering, and co-expressed gene modules are identified in the resulting dendrogram by the Dynamic Hybrid tree cut. Modules whose module eigengenes (MEs) are highly correlated are merged. The parameters calculated by the function are listed step by step under Note.

Phenotypes are taken from phenotype data of data as specified in phModule. Furthermore, membership of samples in groups which are defined in groupsets are also used as phenotypes (e.g. two groups from a differential gene expression experiment). When correlation with group membership is calculated, only those samples are included which belong to the denoted groupset (mind that gene modules were calculated using expression data from all samples). All correlation coefficients are calculated using Pearson correlation. Categorical variables with only two levels are coded numerically.
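The scale-free topology analysis used for choosing the soft-thresholding power can be illustrated with a simplified base-R sketch. It is not the code used by wrapWGCNA or WGCNA (WGCNA offers pickSoftThreshold for this purpose); the simulated data, the helper name scale_free_fit, the signed adjacency and the binning of connectivity into 10 classes are assumptions made for illustration.

```r
set.seed(1)
n_samples <- 30
n_genes   <- 200
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples)  # samples x genes

# Fit index for one power: regress log10 p(k) on log10 k over binned connectivity
scale_free_fit <- function(expr, power, n_breaks = 10) {
  adj  <- ((1 + cor(expr)) / 2)^power   # signed adjacency
  k    <- rowSums(adj) - 1              # connectivity without self-connection
  bins <- cut(k, n_breaks)
  mean_k <- tapply(k, bins, mean)                 # mean connectivity per bin
  p_k    <- tapply(k, bins, length) / length(k)   # fraction of genes per bin
  keep <- !is.na(mean_k) & !is.na(p_k)            # drop empty bins
  x <- log10(mean_k[keep])
  y <- log10(p_k[keep])
  fit   <- lm(y ~ x)
  slope <- unname(coef(fit)[2])
  # signed R^2: positive when the slope is negative, as expected for
  # a scale-free degree distribution
  c(power = power,
    signed_r2 = -sign(slope) * summary(fit)$r.squared,
    slope = slope)
}

powers    <- c(1, 2, 4, 6, 8, 10, 12)
fit_table <- t(sapply(powers, scale_free_fit, expr = expr))
fit_table
```

A common rule of thumb is to pick the lowest power at which the signed R^2 levels off at a high value; with unstructured random data as simulated here, no power need reach such a plateau.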

Value

The returned value depends on parameter softThresholdPower. If a softThresholdPower is given or could be chosen automatically, value is a list with the following components:

If softThresholdPower is NULL or could not be chosen automatically, value is a list with the following components:

Side-effects: Diagrams for SoftThreshold power, gene and sample dendrograms generated by hierarchical clustering with phenotypes given in phDendro or phModule printed underneath as well as correlation heatmaps are plotted into the projectfolder. Additionally, tables with module eigengenes and correlation results of eigengenes with phenotypes and groupsets are generated. Scatterplots are generated with module membership and gene significance for each phenotype/groupset and the 8 top associated modules.

Note

The procedure is divided in several steps:

  1. Selection of an appropriate softThresholdPower for network construction

  2. Automatic network construction and module detection

  3. Plot sample dendrogram and gene dendrogram with phenotype information. Calculate gene significance for traits: GS.datTraits[i] = |cor(gene, Trait)| and GSPvalue[i] = corPvalueStudent(GS.datTraits[i], nSamples).

  4. Correlation of modules with phenotypes (traits)

    • moduleTraitCor = cor(MEs, Trait)

    • moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)

    • moduleGroupsetCor = cor(MEs, groupsetMat), where groupsetMat is a phenotype matrix built from the group memberships of the groups denoted in groupsets.

    • moduleGroupsetPvalue = corPvalueStudent(moduleGroupsetCor, nSamplesInGroups)

    • Heatmaps are generated for correlation results of phenotypes and groupsets.

  5. Intramodular analysis - Find hub genes in modules

    • datKME: INTRAmodular connectivity for finding intramodular hubs. Also known as module membership measure (MM).

    • MMPvalue = corPvalueStudent(datKME, nSamples)

    • Calculate gene significance for group memberships: geneGroupsetCor = cor(gene,groupsetMat) and geneGroupsetPvalue = corPvalueStudent(geneGroupsetCor, nSamplesInGroups)

    • Scatterplots are generated with module membership and gene significance for each phenotype/groupset and the 8 top associated modules.

  6. Generate output tables

    • networkDatOutput = data.frame(featuredata, moduleColors, GS.datTraits, GSPvalue, geneGroupsetCor, geneGroupsetPvalue)

    • networkDatOutput_incl_MM = data.frame(networkDatOutput, datKME[,modOrder], MMPvalue[,modOrder])

  7. Visualization of networks within R (TOMplot, MDSplot).
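The quantities of steps 3 to 5 can be reproduced for a single toy module in base R. This is a sketch under stated assumptions: the data are simulated, cor_pvalue_student is a hypothetical re-implementation of the Student p-value that WGCNA::corPvalueStudent computes, and the eigengene is taken as the first principal component of the scaled module expression.

```r
set.seed(3)
n_samples <- 20
trait <- rnorm(n_samples)
# Toy module: 15 genes sharing a trait-related signal (samples in rows)
module_expr <- sapply(1:15, function(i) trait + rnorm(n_samples, sd = 0.8))

# Two-sided Student p-value for a correlation r based on n samples
cor_pvalue_student <- function(r, n) {
  t_stat <- sqrt(n - 2) * r / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

# Module eigengene: first principal component, sign aligned with mean expression
scaled <- scale(module_expr)
ME <- prcomp(scaled)$x[, 1]
if (cor(ME, rowMeans(scaled)) < 0) ME <- -ME

# Step 4: correlation of the module with the trait
moduleTraitCor    <- cor(ME, trait)
moduleTraitPvalue <- cor_pvalue_student(moduleTraitCor, n_samples)

# Step 5: module membership (kME) per gene
datKME   <- as.vector(cor(module_expr, ME))
MMPvalue <- cor_pvalue_student(datKME, n_samples)

# Step 3: gene significance for the trait
GS.datTraits <- abs(as.vector(cor(module_expr, trait)))
GSPvalue     <- cor_pvalue_student(GS.datTraits, n_samples)
```

Genes with both high module membership and high gene significance are the candidates that the scatterplots of step 5 are meant to highlight.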

Author(s)

Frank Ruehle

References

https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/


frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.