Source: R/one.step.pigengene.R
Runs the entire Pigengene pipeline, from gene expression to compact decision trees, in a single function. It identifies gene modules using coexpression network analysis, computes eigengenes, learns a Bayesian network, fits decision trees, and compacts them.
Data |
A matrix or data frame (or a list of matrices or data frames) containing the training expression data, with genes in columns and samples in rows. Rows and columns must be named. For example, log(RPKM+1) values from RNA-Seq data can be used. |
Labels |
A (preferably named) vector containing the labels (condition types)
for the training Data, or, if Data is a list, a list of label vectors
corresponding to the data sets in Data.
Names must agree with rows of |
saveDir |
Directory to save the results. |
testD |
Test expression data with syntax similar to |
testLabels |
A (preferably named) vector containing the labels (condition types) for
the test data. This argument is optional and can be set to |
doBalance |
Boolean. Whether the data should be oversampled before identifying the modules so that each condition contributes roughly the same number of samples to clustering. |
RsquaredCut |
A threshold in the range [0,1] used to estimate the power. A higher value
can increase power. For technical use only. See |
costRatio |
A numeric value, the relative cost of misclassifying a sample from the first condition vs. misclassifying a sample from the second condition. |
toCompact |
An integer value determining which decision tree to shrink (compact).
It is the minimum number of samples per leaf that was imposed when fitting that tree.
Set to |
bnNum |
Desired number of bootstrapped Bayesian networks.
Set to |
bnArgs |
A list of arguments passed to |
useMod0 |
Boolean, whether to allow module zero (the set of outliers) to be used as a predictor in the decision tree(s). |
mit |
The "module identification type", a character vector determining the reference conditions for clustering. If 'All' (default), clustering is performed using the entire data regardless of condition. |
verbose |
The integer level of verbosity. 0 means silent and higher values produce more details of computation. |
doHeat |
If |
seed |
Random seed to ensure reproducibility. |
dOrderByW |
If |
naTolerance |
Upper threshold on the fraction of entries per gene that
can be missing. Genes with a larger fraction of missing
entries are ignored. For genes with a smaller fraction of NA
entries, the missing values are imputed from the gene's average
expression in the other samples.
See |
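The naTolerance rule can be illustrated with a minimal base-R sketch. The helper filter_and_impute() is hypothetical (it is not a Pigengene function); it assumes the same layout as Data (samples in rows, genes in columns) and treats a missing fraction exactly equal to the threshold as tolerated, a boundary case the description above leaves open.

```r
# Hypothetical helper (not part of Pigengene) illustrating naTolerance.
# x: numeric matrix with samples in rows and genes in columns.
filter_and_impute <- function(x, naTolerance = 0.05) {
  # Fraction of missing entries per gene (column).
  naFrac <- colMeans(is.na(x))
  # Drop genes whose missing fraction exceeds the threshold.
  kept <- x[, naFrac <= naTolerance, drop = FALSE]
  # Impute each remaining NA with the gene's mean over the observed samples.
  for (j in seq_len(ncol(kept))) {
    miss <- is.na(kept[, j])
    if (any(miss)) kept[miss, j] <- mean(kept[!miss, j])
  }
  kept
}

# Toy data: g1 is 25% missing, g2 is 75% missing, g3 is complete.
x <- matrix(c(1, 2, NA, 4,  NA, NA, NA, 8,  5, 6, 7, 8), nrow = 4,
            dimnames = list(paste0("s", 1:4), c("g1", "g2", "g3")))
filter_and_impute(x, naTolerance = 0.3)  # keeps g1 and g3; g1 in s3 becomes 7/3
```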
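The oversampling controlled by doBalance can be sketched in the same spirit. oversample_balance() is a hypothetical helper, and resampling each smaller condition with replacement up to the size of the largest one is an assumption about what "roughly the same number of samples" means; Pigengene's own balance function may do this differently.

```r
# Hypothetical helper (not Pigengene's balance()): oversample smaller
# conditions with replacement until every condition matches the largest.
oversample_balance <- function(x, labels, seed = 1) {
  set.seed(seed)  # note: this resets the global random number generator
  target <- max(table(labels))
  groups <- split(seq_along(labels), labels)  # row indices per condition
  idx <- unlist(lapply(groups, function(i) {
    # Keep every original sample, then draw the shortfall at random.
    c(i, i[sample.int(length(i), target - length(i), replace = TRUE)])
  }), use.names = FALSE)
  list(x = x[idx, , drop = FALSE], labels = labels[idx])
}

x <- matrix(rnorm(10 * 3), nrow = 10)
labels <- c(rep("A", 7), rep("B", 3))
b <- oversample_balance(x, labels)
table(b$labels)  # A: 7, B: 7
```

Keeping all original samples and only adding duplicates means no training sample is lost; the cost is that the smaller conditions contain repeated rows.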
This is the main function of the Pigengene package, and it performs several
steps. First, modules are identified in the training expression data
according to the mit argument, i.e., based on coexpression behaviour in the
corresponding conditions. Set mit to "All" to use all training data for this
step regardless of condition. Then, if a list of data frames is provided in
Data, a similarity network is computed for each data set, and these are
combined into one similarity network over the union of nodes across data
sets. Then, an eigengene is calculated for each module and each sample,
where the expression of a module's eigengene in a sample is the weighted
average of the expression of that module's genes in the sample.
Technically, an eigengene is the first principal component of the gene
expression in a module, so PCA ensures that the eigengene explains the
maximum variance across all the training samples.
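The eigengene computation described above can be sketched with base R's prcomp(). The helper module_eigengene() is hypothetical, and standardizing each gene before PCA is an assumption made for illustration. The PC1 scores are a weighted combination of the standardized gene expressions, with the PC1 loadings as weights; the sign of a principal component is arbitrary.

```r
# Hypothetical helper: eigengene of one module as its first principal
# component. expr has samples in rows and the module's genes in columns.
module_eigengene <- function(expr) {
  pca <- prcomp(expr, center = TRUE, scale. = TRUE)  # standardize genes
  list(
    eigengene = pca$x[, 1],        # PC1 score of each sample
    weights = pca$rotation[, 1],   # PC1 loading (weight) of each gene
    varExplained = pca$sdev[1]^2 / sum(pca$sdev^2)
  )
}

# Toy module: five genes that share one underlying signal.
set.seed(2)
signal <- rnorm(20)
expr <- sapply(1:5, function(k) signal + rnorm(20, sd = 0.3))
colnames(expr) <- paste0("g", 1:5)
me <- module_eigengene(expr)
# The eigengene equals the weighted sum of the standardized expressions.
all.equal(unname(me$eigengene), unname(drop(scale(expr) %*% me$weights)))  # TRUE
```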
Next, optionally (if bnNum is set to a value greater than 0),
several bootstrapped Bayesian networks are learned and combined into a
consensus network in order to detect and illustrate the
probabilistic dependencies between the eigengenes and the disease subtype.
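The bootstrap-and-consensus idea can be sketched without any Bayesian network library. In the sketch below, a thresholded absolute correlation stands in for Bayesian network structure learning, purely to keep the example in base R; the helper names, the correlation cutoff, and the 70% consensus threshold are hypothetical choices, not Pigengene's.

```r
# Stand-in structure learner (NOT a Bayesian network): connect two
# variables when their absolute correlation exceeds a cutoff.
learn_edges <- function(d, cutoff) {
  r <- abs(cor(d))
  r[is.na(r)] <- 0   # constant columns in a resample give NA correlations
  diag(r) <- 0
  r > cutoff
}

# Learn a structure on each bootstrap resample of the samples (rows) and
# keep the edges that appear in at least a fraction 'keep' of them.
bootstrap_consensus <- function(d, nBoot = 50, cutoff = 0.5, keep = 0.7, seed = 1) {
  set.seed(seed)
  votes <- matrix(0, ncol(d), ncol(d), dimnames = list(colnames(d), colnames(d)))
  for (b in seq_len(nBoot)) {
    rows <- sample.int(nrow(d), nrow(d), replace = TRUE)
    votes <- votes + learn_edges(d[rows, , drop = FALSE], cutoff)
  }
  freq <- votes / nBoot
  list(frequency = freq, consensus = freq >= keep)
}

# Toy data: eigengene ME1 drives a numeric label score, ME2 is unrelated.
set.seed(3)
me1 <- rnorm(60)
d <- cbind(ME1 = me1, ME2 = rnorm(60), Label = me1 + rnorm(60, sd = 0.3))
res <- bootstrap_consensus(d, nBoot = 30)
res$consensus
```

Voting over resamples keeps only edges that are stable under perturbation of the samples, which is the point of combining bootstrapped networks into a consensus.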
Next, decision tree(s) are built that use the module eigengenes, or
a subset of them, to distinguish the classes (Labels).
The accuracy of the trees is assessed on the training data and, if provided, on the test data.
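Assessing a tree on training and test data amounts to comparing predicted and true labels. The hypothetical helper assess() below computes a confusion matrix, the accuracy, and a cost-weighted error count. The weighting assumes that costRatio multiplies the cost of misclassifying a first-condition sample while a second-condition error costs 1; that is how the costRatio description reads, but it is not taken from Pigengene's code.

```r
# Hypothetical helper: confusion matrix, accuracy, and a cost-weighted error.
# Assumed semantics: misclassifying a sample of the first condition (lev[1])
# costs costRatio; misclassifying one of the second condition costs 1.
assess <- function(truth, predicted, costRatio = 1, lev = sort(unique(truth))) {
  stopifnot(length(lev) == 2)
  cm <- table(truth = factor(truth, levels = lev),
              predicted = factor(predicted, levels = lev))
  list(
    confusion = cm,
    accuracy = sum(diag(cm)) / sum(cm),
    # cm[1, 2]: first-condition samples predicted as the second condition;
    # cm[2, 1]: second-condition samples predicted as the first condition.
    cost = costRatio * cm[1, 2] + cm[2, 1]
  )
}

truth <- c("Case", "Case", "Case", "Control", "Control")
pred  <- c("Case", "Control", "Case", "Control", "Case")
r <- assess(truth, pred, costRatio = 2)
r$accuracy  # 0.6
r$cost      # 3
```

Calling assess() once on training predictions and once on test predictions mirrors the two assessments described above.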
Finally, the number of genes required to calculate the relevant
eigengenes is reduced (the tree is 'compacted'), and the accuracy of the tree
is reassessed after the removal of each gene. Along the way, several
self-explanatory directories, heatmaps, and plots are created and stored under
saveDir.
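Compaction can be illustrated on a single module feeding a one-split rule (a decision stump) that stands in for a fitted tree. Under the assumptions of this hypothetical sketch, genes are removed in order of increasing absolute PC1 weight, the eigengene is recomputed from the remaining genes, and the stump's threshold and direction stay fixed, so the accuracy can be reassessed after each removal. Pigengene's actual compaction procedure may choose genes differently.

```r
# Hypothetical sketch of compaction for one module and a one-split rule.
# z: standardized expression (samples x genes); w: named PC1 weights;
# the rule predicts 'above' when the eigengene exceeds 'threshold'.
compact_module <- function(z, w, labels, threshold, above) {
  other <- setdiff(unique(labels), above)
  ord <- names(sort(abs(w), decreasing = TRUE))  # strongest genes first
  # Keeping the k strongest genes = removing the weakest ones one by one.
  acc <- vapply(seq_along(ord), function(k) {
    keep <- ord[seq_len(k)]
    eig <- drop(z[, keep, drop = FALSE] %*% w[keep])  # approximate eigengene
    pred <- ifelse(eig > threshold, above, other)
    mean(pred == labels)
  }, numeric(1))
  data.frame(nGenes = seq_along(ord), accuracy = acc)
}

set.seed(4)
labels <- rep(c("Case", "Control"), each = 15)
signal <- ifelse(labels == "Case", 2, -2) + rnorm(30, sd = 0.5)
expr <- sapply(1:6, function(k) signal + rnorm(30, sd = 0.5 * k))
colnames(expr) <- paste0("g", 1:6)
z <- scale(expr)
w <- prcomp(z)$rotation[, 1]
full <- drop(z %*% w)
# Orient the rule: 'above' is the condition with the larger mean eigengene.
above <- if (mean(full[labels == "Case"]) > mean(full[labels == "Control"])) "Case" else "Control"
compact_module(z, w, labels, threshold = 0, above = above)
```

Reading the table from the bottom row (all genes) upward shows how far the gene list can shrink before accuracy drops.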
A list with the following components:
call |
The call that created the results. |
wgRes |
A list. The results of WGCNA clustering of the Data by
|
betaRes |
A list. The automatically selected beta (power) parameter
which was used for the WGCNA clustering. It is the result of the call to
|
pigengene |
The pigengene object computed for the clusters, result
of |
leanrtBn |
A list. The results of |
selectedFeatures |
A vector of the names of module eigengenes that
were considered during the construction of decision trees.
If |
c5treeRes |
A list. The results of |
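The automatic choice of the power (betaRes, RsquaredCut) can be sketched under the assumption that it follows WGCNA's soft-thresholding approach: for each candidate power beta, the adjacency is |cor|^beta, and the chosen power is the smallest one whose scale-free topology fit (a signed R^2 of log10 p(k) against log10 k) reaches RsquaredCut. The helpers below are hypothetical and simplified (for example, empty connectivity bins are simply dropped), and the default cut of 0.85 is an illustrative choice.

```r
# Simplified, hypothetical version of a scale-free topology fit index.
scale_free_r2 <- function(expr, beta, nBreaks = 10) {
  adj <- abs(cor(expr))^beta       # soft-thresholded adjacency
  k <- rowSums(adj) - 1            # connectivity, excluding self
  bins <- cut(k, nBreaks)
  dk <- tapply(k, bins, mean)      # mean connectivity per bin
  pk <- tapply(k, bins, length) / length(k)  # fraction of genes per bin
  ok <- !is.na(dk)                 # drop empty bins
  fit <- lm(log10(pk[ok]) ~ log10(dk[ok]))
  # Signed R^2: counted as negative when the fitted slope is positive.
  unname(-sign(coef(fit)[2]) * summary(fit)$r.squared)
}

# Smallest power whose signed R^2 reaches RsquaredCut (NA if none does).
pick_power <- function(expr, powers = 1:20, RsquaredCut = 0.85) {
  r2 <- vapply(powers, function(b) scale_free_r2(expr, b), numeric(1))
  hit <- which(r2 >= RsquaredCut)
  list(fit = data.frame(power = powers, signedR2 = r2),
       power = if (length(hit) > 0) powers[hit[1]] else NA)
}

set.seed(5)
expr <- matrix(rnorm(40 * 30), nrow = 40)  # 40 samples, 30 genes
res <- pick_power(expr, powers = 1:10, RsquaredCut = 0.8)
res$fit
res$power
```

This also shows why a higher RsquaredCut tends to yield a higher power: a stricter cut can only be met at the same or a later candidate power.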
The individual functions are exported to facilitate running the pipeline step by step in a customized way.
Amir Foroushani, Habil Zare, and Rupesh Agrahari
Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia, Foroushani A, Agrahari R, Docking R, Karsan A, and Zare H. In preparation.
check.pigengene.input, balance, calculate.beta, wgcna.one.step, compute.pigengene, learn.bn, make.decision.tree, blockwiseModules