make.decision.tree: Creates a decision tree to classify samples using the...


View source: R/make.decision.tree.R

Description

A decision tree in Pigengene-package uses module eigengenes to build a classifier that distinguishes the different classes. Briefly, each eigengene is a weighted average of the expression of all genes in the module, where the weight of each gene corresponds to its membership in the module.
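As a rough illustration of the eigengene idea described above, the sketch below computes one value per sample from a toy module. It assumes the gene weights are the loadings of the first principal component of the scaled expression matrix. The toy data and variable names are hypothetical, and the package's own computation may differ in detail.

```r
# Toy expression matrix: 6 samples (rows) by 4 genes (columns) in one module.
set.seed(1)
expr <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("s", 1:6), paste0("g", 1:4)))

# Assumption: the weight of each gene is its loading on the first
# principal component of the centered and scaled expression matrix.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
weights <- pca$rotation[, 1]

# Eigengene: one value per sample, a weighted combination of the
# scaled expression of all genes in the module.
eigengene <- as.vector(scale(expr) %*% weights)
names(eigengene) <- rownames(expr)
```

Genes with a larger absolute weight contribute more to the eigengene, which is what "membership in the module" refers to in this sketch.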

Usage

make.decision.tree(pigengene, Data, 
  Labels = structure(pigengene$annotation[rownames(pigengene$eigengenes),
          1], names = rownames(pigengene$eigengenes)),
  testD = NULL, testL = NULL, selectedFeatures = NULL,
  saveDir = "C5Trees", minPerLeaf = NULL, useMod0 = FALSE, 
  costRatio = 1, toCompact = NULL, noise = 0, noiseRepNum = 10, doHeat=TRUE,
  verbose = 0, naTolerance=0.05)

Arguments

pigengene

The pigengene object that is used to build the decision tree. See pigengene-class.

Data

The training expression data.

Labels

Labels (condition types) for the (training) expression data. It is a named vector of characters. Data and pigengene will be subset according to these names.
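A small sketch of what a named character vector of labels looks like, and how its names can drive the subsetting. The sample IDs, gene names, and values are made up, and the subsetting line only illustrates the idea rather than reproducing the package's internal code.

```r
# Hypothetical toy data: 4 samples (rows) by 3 genes (columns).
Data <- matrix(1:12, nrow = 4,
               dimnames = list(c("s1", "s2", "s3", "s4"),
                               c("g1", "g2", "g3")))

# Named character vector: names are sample IDs, values are condition types.
Labels <- c(s1 = "AML", s3 = "MDS", s4 = "AML")

# Keep only the labelled samples, in the order given by names(Labels).
DataSubset <- Data[names(Labels), , drop = FALSE]
```

Sample s2 has no label here, so it is dropped from DataSubset.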

testD

The test expression data, for example, from an independent dataset. Optional.

testL

Labels (condition types) for the (test) expression data. Optional.

selectedFeatures

A numeric vector determining the subset of eigengenes that should be used as potential predictors. By default (NULL), eigengenes for all modules are considered. See also useMod0.

saveDir

Where to save the plots of the tree(s).

minPerLeaf

Vector of integers. For each value, a tree will be built requiring at least that many samples on each leaf. By default (NULL), several trees are built, one for each possible value between 2 and 10 percent of the number of samples.
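To make the default range concrete, the following sketch enumerates candidate values between 2 and 10 percent of the sample count. The sample count is hypothetical, and rounding the bounds to whole numbers with ceiling and floor is an assumption; the package may round differently.

```r
# Hypothetical number of training samples.
nSamples <- 200

# Assumption: the bounds are rounded to whole numbers (ceiling for the
# lower bound, floor for the upper bound).
lower <- ceiling(2 * nSamples / 100)
upper <- floor(10 * nSamples / 100)

# One candidate minPerLeaf value per integer in the range.
minPerLeafCandidates <- seq(lower, upper)
```

With 200 samples this gives one candidate per integer from 4 to 20, so 17 trees would be attempted under these assumptions.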

useMod0

Boolean. Whether to allow the tree(s) to use the eigengene of module 0, which corresponds to the set of outliers, as a proper predictor.

costRatio

A numeric value effective only for two-group classification. The default value (1) considers the misclassification of both conditions as equally disadvantageous. Change this value to a larger or smaller value if you are more interested in the specificity of predictions for condition 1 or condition 2, respectively.

toCompact

An integer. The tree with this minPerLeaf value will be compacted (shrunk). Compacting in this context means reducing the number of genes required to calculate the relevant eigengenes and to make predictions using the tree. If NULL (default), the (presumably) most general proper tree (corresponding to the largest value in the minPerLeaf vector for which a tree could be constructed) is compacted. Set to FALSE to turn off compacting.

noise, noiseRepNum

For development purposes only. These parameters allow investigating the effect of Gaussian noise in the expression data on the accuracy of the tree for test data.

doHeat

Boolean. Set to FALSE to skip plotting the heatmaps for faster computation.

verbose

The integer level of verbosity. 0 means silent and higher values produce more details of computation.

naTolerance

Upper threshold on the fraction of entries per gene that can be missing. Genes with a larger fraction of missing entries are ignored. For genes with a smaller fraction of NA entries, the missing values are imputed from their average expression in the other samples. See check.pigengene.input.
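The filtering and imputation described for naTolerance can be sketched in base R as follows. The helper name filter_and_impute is hypothetical and is not part of the package, the layout (samples in rows, genes in columns) is an assumption, and check.pigengene.input may handle edge cases differently.

```r
# Hypothetical helper, not part of Pigengene.
# Data: numeric matrix with samples in rows and genes in columns (assumed).
filter_and_impute <- function(Data, naTolerance = 0.05) {
  # Fraction of missing entries per gene.
  naFraction <- colMeans(is.na(Data))
  # Ignore genes whose missing fraction exceeds the threshold.
  kept <- Data[, naFraction <= naTolerance, drop = FALSE]
  # Impute remaining NAs with the gene's mean over the other samples.
  for (g in seq_len(ncol(kept))) {
    missing <- is.na(kept[, g])
    kept[missing, g] <- mean(kept[, g], na.rm = TRUE)
  }
  kept
}

# Toy data: 20 samples by 3 genes.
set.seed(1)
Data <- matrix(rnorm(60), nrow = 20,
               dimnames = list(paste0("s", 1:20), c("g1", "g2", "g3")))
Data[1, "g1"] <- NA      # 5% missing: kept and imputed
Data[1:5, "g2"] <- NA    # 25% missing: dropped
cleaned <- filter_and_impute(Data, naTolerance = 0.1)
```

In this toy run, g2 is dropped because a quarter of its entries are missing, while the single missing value in g1 is replaced by the mean of the other 19 samples.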

Details

This function passes the input eigengenes and the appropriate arguments to the C5.0 function from the C50 package.
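For orientation, the sketch below calls C5.0 directly on a made-up eigengene table, fitting one tree per candidate leaf size. It assumes the C50 package is installed. Mapping each minPerLeaf value onto the minCases control option is an assumption about how the arguments are passed, and the data, module names, and labels are hypothetical.

```r
if (requireNamespace("C50", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical eigengene table: 60 samples by 3 modules.
  eigengenes <- data.frame(ME1 = rnorm(60), ME2 = rnorm(60), ME3 = rnorm(60))
  # Labels driven by ME1 so that the tree has something to learn.
  labels <- factor(ifelse(eigengenes$ME1 > 0, "AML", "MDS"))

  # Assumption: each candidate leaf size is passed as minCases.
  leafSizes <- c(5, 10)
  fits <- lapply(leafSizes, function(m) {
    C50::C5.0(x = eigengenes, y = labels,
              control = C50::C5.0Control(minCases = m))
  })
  # Name the list by the leaf sizes, as characters.
  names(fits) <- as.character(leafSizes)
}
```

Each element of fits is an object of class C5.0 that can be inspected with summary().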

Value

A list with the following elements:

call

The call that created the results

c5Trees

A list, with one element of class C5.0 for each attempted minPerLeaf value. The list is named with the corresponding values as characters.

minPerLeaf

A numeric vector enumerating all of the attempted minPerLeaf values.

compacted

The full output of the compact.tree function if toCompact is not FALSE.

heat

The output of the module.heatmap function for the full tree if doHeat is not FALSE.

heatCompact

The output of the module.heatmap function for the compacted tree if toCompact is not FALSE.

noisy

The full output of the noise.analysis function if noise is not 0. For development and evaluation purposes only.

leafLocs

A matrix reporting the leaf for each sample, with one sample per row. The columns are named according to the corresponding minPerLeaf value.

toCompact

Echoes the toCompact input argument.

costs

The cost matrix

saveDir

The directory where the plots are saved.

Note

For faster computation in an initial, exploratory run, turn off compacting, which can take a few minutes, with toCompact=FALSE.

See Also

Pigengene-package, compute.pigengene, compact.tree, C5.0

Examples

     ## Data:
     data(aml)
     data(mds)
     data(pigengene)
     d1 <- rbind(aml,mds)

     ## Fitting the trees:
     trees <- make.decision.tree(pigengene=pigengene, Data=d1,
       saveDir="trees", minPerLeaf=14:15, doHeat=FALSE,verbose=3,
       toCompact=15)

Pigengene documentation built on Nov. 8, 2020, 6:47 p.m.