one.step.pigengene: Runs the entire Pigengene pipeline



Description

Runs the entire Pigengene pipeline, from gene expression to compact decision trees, in a single function. It identifies the gene modules using coexpression network analysis, computes eigengenes, learns a Bayesian network, fits decision trees, and compacts them.

Usage

one.step.pigengene(Data, saveDir = "Pigengene", Labels, testD = NULL,
  testLabels = NULL, doBalance = TRUE, RsquaredCut = 0.8, costRatio = 1,
  toCompact = FALSE, bnNum = 0, bnArgs = NULL, useMod0 = FALSE, mit = "All",
  verbose = 0, doHeat = TRUE, seed = NULL, dOrderByW = TRUE,
  naTolerance = 0.05)

Arguments

Data

A matrix or data frame (or list of matrices or data frames) containing the training expression data, with genes corresponding to columns and rows corresponding to samples. Rows and columns must be named. For example, from RNA-Seq data, log(RPKM+1) can be used.
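
As a minimal sketch of the expected layout, the following base R code builds a small named matrix with samples in rows and genes in columns and applies the log(RPKM+1) transform. The RPKM values and the sample and gene names are made up for illustration.

```r
# Hypothetical RPKM values: 4 samples (rows) x 3 genes (columns).
set.seed(1)
rpkm <- matrix(rexp(12, rate = 0.1), nrow = 4, ncol = 3,
               dimnames = list(paste0("sample", 1:4),
                               c("geneA", "geneB", "geneC")))

# log(RPKM + 1) maps zero expression to zero and compresses large values.
Data <- log(rpkm + 1)

# Both rows (samples) and columns (genes) must be named.
stopifnot(!is.null(rownames(Data)), !is.null(colnames(Data)))
```

If a source table stores genes in rows instead, transposing it with t() gives the orientation described above.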

Labels

A (preferably named) vector containing the Labels (condition types) for the training Data. Alternatively, if Data is a list, a list of label vectors corresponding to the data sets in Data. Names must agree with the row names of Data.

saveDir

Directory to save the results.

testD

Test expression data in the same format as Data, possibly with different rows and columns. This argument is optional and can be set to NULL if test data are not available.

testLabels

A (preferably named) vector containing the Labels (condition types) for the test Data. This argument is optional and can be set to NULL if test data are not available.

doBalance

Boolean. Whether the data should be oversampled before identifying the modules so that each condition contributes roughly the same number of samples to clustering.
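
The idea of oversampling can be illustrated with a short base R sketch. It is not the implementation of the package's balance function; it only shows one simple way to repeat samples of the smaller condition until every condition reaches the size of the largest one. The labels and sample names are hypothetical.

```r
# Hypothetical labels: 6 samples of one condition, 2 of another.
Labels <- c(rep("AML", 6), rep("MDS", 2))
names(Labels) <- paste0("s", seq_along(Labels))

# Target size: the number of samples in the largest condition.
target <- max(table(Labels))

# For each condition, recycle its sample names until the target is reached.
balancedNames <- unlist(lapply(split(names(Labels), Labels), function(ids) {
  rep(ids, length.out = target)
}), use.names = FALSE)

# Labels of the oversampled set; rows of an expression matrix could be
# selected the same way, e.g. Data[balancedNames, ].
balancedLabels <- Labels[balancedNames]
table(balancedLabels)
```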

RsquaredCut

A threshold in the range [0,1] used to estimate the soft-thresholding power (beta). A higher value can increase the estimated power, and a larger value generally leads to more modules. For technical use only. See pickSoftThreshold for more details.
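
To illustrate how such a cut can drive the choice of power, the sketch below picks the smallest candidate power whose scale-free fit exceeds the threshold. The fit table is invented, its column names (Power, SFT.R.sq) are chosen to resemble the fit indices reported by pickSoftThreshold, and pick.power is a hypothetical helper, not a function of WGCNA or Pigengene.

```r
# Hypothetical scale-free fit table: one R^2 value per candidate power.
fit <- data.frame(Power = 1:10,
                  SFT.R.sq = c(0.10, 0.35, 0.55, 0.70, 0.78,
                               0.83, 0.88, 0.90, 0.91, 0.92))

# Return the smallest power whose fit exceeds the cut, or NA if none does.
pick.power <- function(fit, RsquaredCut) {
  ok <- fit$SFT.R.sq > RsquaredCut
  if (any(ok)) min(fit$Power[ok]) else NA
}

pick.power(fit, RsquaredCut = 0.8)  # 6
pick.power(fit, RsquaredCut = 0.9)  # 9
```

With this made-up table, raising the cut from 0.8 to 0.9 raises the selected power from 6 to 9.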

costRatio

A numeric value, the relative cost of misclassifying a sample from the first condition vs. misclassifying a sample from the second condition.

toCompact

An integer value determining which decision tree to shrink. It is the minimum number of genes per leaf imposed when fitting the tree. Set to FALSE to skip compacting, or to NULL to automatically select the maximum value.

bnNum

Desired number of bootstrapped Bayesian networks. Set to 0 to skip BN learning.

bnArgs

A list of arguments passed to the learn.bn function.

useMod0

Boolean, whether to allow module zero (the set of outliers) to be used as a predictor in the decision tree(s).

mit

The "module identification type", a character vector determining the reference conditions for clustering. If 'All' (default), clustering is performed using the entire data regardless of condition.

verbose

The integer level of verbosity. 0 means silent and higher values produce more details of computation.

doHeat

If TRUE, the heatmap of expression of genes in the modules that contribute to the tree will be plotted.

seed

Random seed to ensure reproducibility.

dOrderByW

If TRUE, the genes will be ordered in the csv file based on their absolute weight in the corresponding module.

naTolerance

Upper threshold on the fraction of entries per gene that can be missing. Genes with a larger fraction of missing entries are ignored. For genes with a smaller fraction of NA entries, the missing values are imputed from their average expression in the other samples. See check.pigengene.input.
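
The filtering and imputation rule described for naTolerance can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of check.pigengene.input; the matrix and the tolerance of 0.25 are made up so that one gene is dropped and one is imputed.

```r
# Hypothetical expression matrix: 5 samples x 3 genes, with missing entries.
Data <- matrix(c(1, 2, NA, 4, 5,
                 NA, NA, 3, 3, 3,
                 2, 2, 2, 2, 2),
               nrow = 5,
               dimnames = list(paste0("s", 1:5), c("g1", "g2", "g3")))
naTolerance <- 0.25

# Fraction of missing entries per gene (column): 0.2, 0.4, 0.
naFrac <- colMeans(is.na(Data))

# Ignore genes whose missing fraction exceeds the tolerance (here g2).
kept <- Data[, naFrac <= naTolerance, drop = FALSE]

# Impute the remaining NAs with the gene's mean over the observed samples.
for (g in colnames(kept)) {
  missing <- is.na(kept[, g])
  kept[missing, g] <- mean(kept[, g], na.rm = TRUE)
}
kept
```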

Details

This is the main function of the Pigengene package. It performs the following steps.

First, modules are identified in the training expression data according to the mit argument, i.e., based on coexpression behaviour in the corresponding conditions. Set mit to "All" to use all training data for this step regardless of the condition. If a list of data frames is provided in Data, similarity networks are computed on the individual data sets and combined into one similarity network for the union of nodes across data sets.

Then, the eigengene of each module is calculated for each sample. The expression of an eigengene of a module in a sample is the weighted average of the expression of the genes in that module in the sample. Technically, an eigengene is the first principal component of the gene expression in a module. PCA ensures that the maximum variance across all the training samples is explained by the eigengene.

Next, optionally (if bnNum is set to a value greater than 0), several bootstrapped Bayesian networks are learned and combined into a consensus network in order to detect and illustrate the probabilistic dependencies between the eigengenes and the disease subtype.

Next, decision tree(s) are built that use the module eigengenes, or a subset of them, to distinguish the classes (Labels). The accuracy of the trees is assessed on the training data and, if provided, the test data.

Finally, the number of genes required for the calculation of the relevant eigengenes is reduced (the tree is 'compacted'). The accuracy of the tree is reassessed after the removal of each gene.

Along the way, several self-explanatory directories, heatmaps, and plots are created and stored under saveDir.
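
The eigengene definition can be made concrete with a small base R sketch that uses prcomp on simulated data. It only illustrates the first-principal-component idea; the centering, scaling, and weight normalization used by compute.pigengene may differ, and the simulated module below is hypothetical.

```r
set.seed(1)
# Hypothetical module: 20 samples x 5 genes driven by one shared signal.
signal <- rnorm(20)
module <- sapply(1:5, function(i) signal + rnorm(20, sd = 0.3))
dimnames(module) <- list(paste0("s", 1:20), paste0("g", 1:5))

# PCA on the centered, scaled expression of the module's genes.
pca <- prcomp(module, center = TRUE, scale. = TRUE)

# Gene weights (loadings) and the eigengene value in each sample.
weights <- pca$rotation[, 1]
eigengene <- pca$x[, 1]

# The eigengene is a weighted sum of the standardized gene expression.
stopifnot(isTRUE(all.equal(unname(eigengene),
                           as.vector(scale(module) %*% weights))))

# Fraction of the module's variance explained by the eigengene.
summary(pca)$importance["Proportion of Variance", "PC1"]
```

Because the sign of a principal component is arbitrary, the eigengene may be positively or negatively correlated with the underlying signal.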

Value

A list with the following components:

call

The call that created the results.

wgRes

A list. The results of WGCNA clustering of the Data by wgcna.one.step.

betaRes

A list. The automatically selected beta (power) parameter that was used for the WGCNA clustering. It is the result of the call to calculate.beta using the expression data of the mit condition(s).

pigengene

The pigengene object computed for the clusters, result of compute.pigengene.

leanrtBn

A list. The results of learn.bn call for learning a Bayesian network using the eigengenes.

selectedFeatures

A vector of the names of module eigengenes that were considered during the construction of decision trees. If bnNum > 0, this corresponds to the immediate neighbors of the Disease or Effect variable in the consensus network.

c5treeRes

A list. The results of make.decision.tree call for learning decision trees that use the eigengenes as features.

Note

The individual functions are exported to facilitate running the pipeline step by step in a customized way.

Author(s)

Amir Foroushani, Habil Zare, and Rupesh Agrahari

References

Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia, Foroushani A, Agrahari R, Docking R, Karsan A, and Zare H. In preparation.

See Also

check.pigengene.input, balance, calculate.beta, wgcna.one.step, compute.pigengene, learn.bn, make.decision.tree, blockwiseModules

Examples

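library(Pigengene)
## Load the example AML and MDS expression data sets.
data(aml)
data(mds)
## Stack the two data sets (samples in rows) and label each sample.
d1 <- rbind(aml, mds)
Labels <- c(rep("AML", nrow(aml)), rep("MDS", nrow(mds)))
names(Labels) <- rownames(d1)
## Run the pipeline with 10 bootstrapped Bayesian networks, without compacting.
p1 <- one.step.pigengene(Data = d1, saveDir = ".", bnNum = 10, verbose = 1,
      seed = 1, Labels = Labels, toCompact = FALSE, doHeat = FALSE)
## Plot one of the fitted decision trees.
plot(p1$c5treeRes$c5Trees[["34"]])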

Pigengene documentation built on Nov. 8, 2020, 6:47 p.m.