WGCNA_onestep: WGCNA onestep


View source: R/WGCNA.R

Description

WGCNA onestep

Usage

WGCNA_onestep(
  exprMat,
  traitData = NULL,
  categoricalTrait = NULL,
  prefix = "ehbio",
  corType = "bicor",
  networkType = "signed",
  maxPower = NULL,
  maxBlockSize = NULL,
  top_mad_n = 0.75,
  rmVarZero = T,
  minimal_mad = NULL,
  thresholdZ.k = -2.5,
  TOM_plot = NULL,
  top_hub_n = 20,
  removeOutlier = F,
  RsquaredCut = 0.85,
  minModuleSize = NULL,
  mergeCutHeight = 0.2,
  numericLabels = TRUE,
  pamRespectsDendro = FALSE,
  saveTOMs = TRUE,
  maxPOutliers = NULL,
  loadTOM = TRUE,
  TOMDenom = "min",
  deepSplit = 1,
  stabilityCriterion = "Individual fraction",
  verbose = 0,
  os_system = NULL,
  randomSeed = 11521,
  dynamicCutPlot = TRUE,
  power_min = NULL,
  up_color = c("red", "white", "blue"),
  down_color = c("green", "white"),
  ...
)

Arguments

exprMat

Gene expression matrix in "Genes x Samples" format. The first column (gene names) must be unique across rows and is used as row names. The first row (sample names) must be unique across columns and is used as column names. Columns should be separated by tabs.

The expression data can be log-transformed FPKM/TPM/CPM values, or vst- or rlog-transformed values.

ID Samp1 Samp2 ... SampX
Gene1  1.5 2.0 ... 10
Gene2  1.2 4.0 ... 10
.
.
.
Gene3  2.5 2.0 ... 8
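
A file in this layout can be loaded with base R as sketched below. The arguments passed to read.table here are assumptions made for illustration; the page only states that extra parameters are forwarded to read.table, not which ones the function sets itself.

```r
# Write a tiny tab-separated "Genes x Samples" file to a temporary location.
expr_file <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tSamp1\tSamp2\tSamp3",
  "Gene1\t1.5\t2.0\t10",
  "Gene2\t1.2\t4.0\t10",
  "Gene3\t2.5\t2.0\t8"
), expr_file)

# The first column becomes the row names, the first row the column names.
# read.table() stops with an error if the gene IDs are duplicated.
expr <- read.table(expr_file, header = TRUE, sep = "\t", row.names = 1,
                   check.names = FALSE, quote = "", comment.char = "")

# Sample names must be unique as well.
stopifnot(!anyDuplicated(colnames(expr)))
dim(expr)
```
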
traitData

Sample attribute data with sample names in the first column and sample attributes in the remaining columns. Each value of a categorical attribute takes one column, in which 1 means the sample belongs to that category and 0 means it does not. Alternatively, categorical attributes can be given separately through "categoricalTrait".

ID      WT      KO      OE Height Weight Diameter
samp1   1       0       0       1       2       3
samp2   1       0       0       2       4       6
samp3   0       1       0       10      20      50
samp4   0       1       0       15      30      80
samp5   0       0       1       NA      9       8
samp6   0       0       1       4       8       7

categoricalTrait

Categorical attributes file in the format shown below. The program converts it to a 0-1 matrix like the one in "traitData". One can give only traitData, only categoricalTrait, or both (the program binds them together).

ID group family
samp1 WT A
samp2 WT B
samp3 KO A
samp4 KO B
samp5 OE A
samp6 OE B
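
The conversion to a 0-1 matrix can be pictured with the base R sketch below. It is an illustration only: the column names and column order produced by WGCNA_onestep itself are not documented here and may differ.

```r
# The categorical table from above, with samples as row names.
categorical <- data.frame(
  group  = c("WT", "WT", "KO", "KO", "OE", "OE"),
  family = c("A", "B", "A", "B", "A", "B"),
  row.names = paste0("samp", 1:6),
  stringsAsFactors = TRUE
)

# One indicator column per level of each attribute (no level is dropped).
one_hot <- do.call(cbind, lapply(names(categorical), function(attr) {
  f <- categorical[[attr]]
  m <- sapply(levels(f), function(lv) as.integer(f == lv))
  rownames(m) <- rownames(categorical)
  m
}))
one_hot
```

Each sample ends up with exactly one 1 per original attribute, so every row of one_hot sums to the number of categorical columns.
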
prefix

prefix for output files.

corType

character string specifying the correlation to be used. Allowed values are (unique abbreviations of) "pearson" and "bicor", corresponding to Pearson and biweight midcorrelation, respectively. Missing values are handled using the pairwise.complete.obs option.

networkType

Default "signed". Allowed values are (unique abbreviations of) "unsigned", "signed", "signed hybrid". Correlation and distance are transformed as follows:

  1. for type = "unsigned", adjacency = |cor|^power;

  2. for type = "signed", adjacency = (0.5 * (1+cor) )^power;

  3. for type = "signed hybrid", adjacency = cor^power if cor>0 and 0 otherwise;

and for type = "distance", adjacency = (1-(dist/max(dist))^2)^power.
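
The three correlation-based transformations can be written out directly, as in the sketch below. The helper name adjacency_from_cor is hypothetical and is not part of WGCNA or of this package; it only restates the formulas listed above.

```r
# Hypothetical helper that applies the formulas above to correlation values.
adjacency_from_cor <- function(cor_mat, power,
                               type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
    "unsigned"      = abs(cor_mat)^power,
    "signed"        = (0.5 * (1 + cor_mat))^power,
    "signed hybrid" = ifelse(cor_mat > 0, cor_mat, 0)^power
  )
}

cors <- c(-0.8, 0, 0.8)
adj_unsigned <- adjacency_from_cor(cors, power = 6, type = "unsigned")
adj_signed   <- adjacency_from_cor(cors, power = 6, type = "signed")
adj_hybrid   <- adjacency_from_cor(cors, power = 6, type = "signed hybrid")
```

With a strongly negative correlation such as -0.8, the "unsigned" type gives the same adjacency as for +0.8, the "signed" type gives a value close to 0, and "signed hybrid" gives exactly 0.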

maxPower

Maximum power to check. Default is 30 for "unsigned" networks and 40 for other types. Any number less than 20 is treated as 20.

maxBlockSize

integer giving maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in datExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.

top_mad_n

Number or fraction of genes to keep, ranked by MAD (median absolute deviation). An integer larger than 1 selects the top x genes (like top 5000). A float number less than 1 selects the top x fraction of genes (like top 0.7 of all genes).
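
A minimal sketch of this kind of filter follows. The helper select_top_mad is hypothetical, and both the rounding of the fraction and the handling of a value of exactly 1 (treated here as "keep all genes") are assumptions, since the page does not specify them.

```r
# Hypothetical helper: keep the genes with the largest MAD across samples.
select_top_mad <- function(expr, top_mad_n = 0.75) {
  gene_mad <- apply(expr, 1, mad, na.rm = TRUE)
  n_keep <- if (top_mad_n <= 1) {
    ceiling(nrow(expr) * top_mad_n)   # fraction of all genes (assumed rounding)
  } else {
    min(top_mad_n, nrow(expr))        # absolute number of genes
  }
  keep <- order(gene_mad, decreasing = TRUE)[seq_len(n_keep)]
  expr[keep, , drop = FALSE]
}

set.seed(1)
expr <- matrix(rnorm(8 * 6), nrow = 8,
               dimnames = list(paste0("Gene", 1:8), paste0("Samp", 1:6)))
by_fraction <- select_top_mad(expr, 0.75)  # top 75% of 8 genes
by_count    <- select_top_mad(expr, 5)     # top 5 genes
```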

rmVarZero

Default TRUE. Remove genes with zero variance. This is normally needed for PCA or correlation analysis.

minimal_mad

Minimal allowed mad value.

thresholdZ.k

Threshold for defining outlier samples. First compute the overall correlation of each sample to the other samples, then Z-score transform these correlation values. Samples with a Z-score below the given value are treated as outliers. Default -2.5, meaning 2.5 standard deviations below the mean.
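
The procedure described above can be sketched in base R as follows. This is an illustration on simulated data under the assumption that "overall correlation" means the mean Pearson correlation of a sample with all other samples; the function itself may use a different connectivity measure.

```r
set.seed(42)
# Eleven samples sharing a common signal, plus one unrelated sample.
signal <- rnorm(200)
expr <- sapply(1:11, function(i) signal + rnorm(200, sd = 0.3))
expr <- cbind(expr, rnorm(200))
colnames(expr) <- paste0("Samp", 1:12)

thresholdZ.k <- -2.5

# Sample-by-sample correlation matrix (genes are rows, samples are columns).
sample_cor <- cor(expr, use = "pairwise.complete.obs")

# Mean correlation of each sample with the others (self-correlation excluded).
k <- (colSums(sample_cor) - 1) / (ncol(sample_cor) - 1)

# Z-score transform and flag samples below the threshold.
z_k <- setNames(as.numeric(scale(k)), names(k))
outliers <- names(z_k)[z_k < thresholdZ.k]
outliers
```

Note that with n samples a single sample can reach a Z-score of at most (n - 1) / sqrt(n) in absolute value, so with fewer than about 8 samples a threshold of -2.5 cannot flag anything.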

TOM_plot

Draw the TOM plot and save it to the file given here, such as 'tomplot.pdf'.

removeOutlier

Remove outlier samples. Normally this should be only performed if no suitable soft power can be found.

RsquaredCut

R2 for defining scale-free network (default 0.85). Any number larger than 1 would be treated as 0.99.

minModuleSize

minimum module size for module detection. See cutreeDynamic for more details.

mergeCutHeight

dendrogram cut height for module merging.

numericLabels

logical: should the returned modules be labeled by colors (FALSE), or by numbers (TRUE)?

pamRespectsDendro

Logical, only used when pamStage is TRUE. If TRUE, the PAM stage will respect the dendrogram in the sense an object can be PAM-assigned only to clusters that lie below it on the branch that the object is merged into. See cutreeDynamic for more details.

saveTOMs

logical: should the consensus topological overlap matrices for each block be saved and returned?

maxPOutliers

only used for corType=="bicor". Specifies the maximum percentile of data that can be considered outliers on either side of the median separately. For each side of the median, if higher percentile than maxPOutliers is considered an outlier by the weight function based on 9*mad(x), the width of the weight function is increased such that the percentile of outliers on that side of the median equals maxPOutliers. Using maxPOutliers=1 will effectively disable all weight function broadening; using maxPOutliers=0 will give results that are quite similar (but not equal to) Pearson correlation.

loadTOM

logical: should Topological Overlap Matrices be loaded from previously saved files (TRUE) or calculated (FALSE)? It may be useful to load previously saved TOM matrices if these have been calculated previously, since TOM calculation is often the most computationally expensive part of network construction and module identification. See saveTOMs and saveTOMFileBase below for when and how TOM files are saved, and what the file names are. If loadTOM is TRUE but the files cannot be found, or do not contain the correct TOM data, TOM will be recalculated.

TOMDenom

a character string specifying the TOM variant to be used. Recognized values are "min" giving the standard TOM described in Zhang and Horvath (2005), and "mean" in which the min function in the denominator is replaced by mean. The "mean" may produce better results but at this time should be considered experimental.

deepSplit

integer value between 0 and 4. Provides a simplified control over how sensitive module detection should be to module splitting, with 0 least and 4 most sensitive. See cutreeDynamic for more details.

stabilityCriterion

One of c("Individual fraction", "Common fraction"), indicating which method for assessing stability similarity of two branches should be used. We recommend "Individual fraction" which appears to perform better; the "Common fraction" method is provided for backward compatibility since it was the (only) method available prior to WGCNA version 1.60.

verbose

integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.

os_system

By default, the program detects the system type to choose which multi-threading function to use. enableWGCNAThreads is recommended, but it only works on some Linux systems. allowWGCNAThreads is not recommended if enableWGCNAThreads works; however, on macOS and Windows it is the only one that can be used. Even on some Linux systems, enableWGCNAThreads makes the program hang at the pickSoftThreshold step. If that happens, supply any string other than linux to use allowWGCNAThreads instead.
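
The selection logic described above might look like the sketch below. The helper choose_thread_function is hypothetical, and detecting the system through Sys.info() is an assumption; only the two WGCNA function names come from this page.

```r
# Hypothetical helper: return the name of the WGCNA threading function to call.
choose_thread_function <- function(os_system = NULL) {
  if (is.null(os_system)) {
    # Assumed detection method: "Linux", "Darwin" or "Windows", lower-cased.
    os_system <- tolower(Sys.info()[["sysname"]])
  }
  if (identical(os_system, "linux")) "enableWGCNAThreads" else "allowWGCNAThreads"
}

choose_thread_function("linux")
choose_thread_function("windows")
# With WGCNA installed, the chosen function could then be called, for example:
# WGCNA::allowWGCNAThreads()
```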

randomSeed

integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.

dynamicCutPlot

Plot the merged modules as well as the dynamically cut modules before merging.

power_min

For some data, the automatically selected power is a small number. This is mostly due to unnormalized expression values, batch effects or a small number of samples. When this happens, we may want to assign a power of 6 or another common value for downstream analysis. This is where to specify it. Do not use this parameter unless you know what you are doing.

up_color

Vector of colours to use for the upper triangle (which represents Pearson correlation values).

down_color

Vector of colours to use for the lower triangle (which represents significance p-values).

...

Other parameters given to read.table.

Value

net

Examples

exprMat <- "test.file"

traitData <- 'trait.file'

WGCNA_onestep(exprMat, traitData)

Tong-Chen/YSX documentation built on Jan. 25, 2021, 2:49 a.m.