compact.tree: Reduces the number of genes in a decision tree

Description

This function greedily removes the genes with the smallest weights one by one, assessing the prediction accuracy of the resulting tree after each removal.
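The greedy idea can be illustrated with a small, self-contained sketch in base R. This is not the Pigengene implementation: the toy data, the weights, and the helper `count.errors` are all hypothetical, and a simple threshold on a weighted sum stands in for the decision tree.

```r
## Toy sketch of greedy removal by weight (NOT the Pigengene code).
set.seed(1)
n <- 40
labels <- rep(c("A", "B"), each = n / 2)

## Hypothetical expression matrix: samples in rows, genes in columns.
X <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(paste0("s", 1:n), paste0("g", 1:5)))
X[labels == "B", "g1"] <- X[labels == "B", "g1"] + 3  # make g1 informative

## Hypothetical gene weights.
weights <- c(g1 = 0.9, g2 = -0.3, g3 = 0.1, g4 = 0.05, g5 = -0.2)

## Stand-in classifier: threshold the weighted sum of the kept genes
## at the midpoint between the two class means.
count.errors <- function(genes) {
  score <- drop(X[, genes, drop = FALSE] %*% weights[genes])
  cut <- mean(tapply(score, labels, mean))
  pred <- ifelse(score > cut, "B", "A")
  sum(pred != labels)
}

## Order genes so that the smallest absolute weight comes first.
queue <- weights[order(abs(weights))]

kept <- names(weights)
errors <- c("0" = count.errors(kept))  # errors before removing any gene
for (i in seq_len(length(queue) - 1)) {
  kept <- setdiff(kept, names(queue)[i])         # drop the next-lightest gene
  errors[as.character(i)] <- count.errors(kept)  # re-assess accuracy
}
print(errors)
```

The names of `errors` give the number of removed genes, so the vector can be scanned for the smallest gene set whose error count is still acceptable.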

Usage

compact.tree(c5Tree, pigengene, Data=pigengene$Data, Labels=pigengene$Labels,
  testD=NULL, testL=NULL, saveDir=".", verbose=0)

Arguments

c5Tree

A decision tree of class C50 that uses module eigengenes, or NULL. If NULL, expression plots for all modules are created.

pigengene

An object of pigengene-class, the output of compute.pigengene.

Data

A matrix or data frame containing the expression data, with columns corresponding to genes and rows corresponding to samples. Rows and columns must be named.

Labels

Labels (condition types) for the (training) expression data. It is a named vector of characters. Data will be subset according to these names.

testD

The test expression data, for example, from an independent dataset. Optional.

testL

Labels (condition types) for the (test) expression data. Optional.

saveDir

Where to save the plots of the tree(s)

verbose

Integer level of verbosity. 0 means silent and higher values produce more details of computation.
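As a rough illustration of how Data and Labels relate, the sketch below builds a named label vector and uses its names to subset the rows of a toy matrix. The sample names, gene names, and values are made up, and the exact subsetting done inside compact.tree may differ.

```r
## Hypothetical toy data: samples in rows, genes in columns, both named.
Data <- matrix(1:12, nrow = 4,
               dimnames = list(c("s1", "s2", "s3", "s4"),
                               c("g1", "g2", "g3")))

## A named character vector; the names are sample names.
Labels <- c(s1 = "AML", s3 = "MDS", s4 = "AML")

## Keep only the rows of Data that have a label, in the order of Labels.
DataSub <- Data[names(Labels), , drop = FALSE]
print(DataSub)
```

Sample s2 has no label here, so it does not appear in `DataSub`.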

Value

A list with the following elements is invisibly returned:

call

The call that created the results

predTrain

Prediction using projected data without compacting

predTrainCompact

Prediction after compacting

genes

A character vector of all genes in the full tree before compacting

genesCompacted

A character vector of all genes in the compacted tree

trainErrors

A matrix reporting errors on the train data. The rows are named according to the number of removed genes. Each column reports the number of misclassified samples in one condition (type) except the last column that reports the total.

testErrors

A matrix reporting errors on the test data similar to trainErrors

queue

A numeric vector named by all genes contributing to the full tree before compacting. The numeric values are weights increasingly ordered by absolute value.

pos

The number of removed genes

txtFile

Confusion matrices and other details on compacting are reported in this text file
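The layout described for trainErrors and testErrors (one row per number of removed genes, one column per condition, and a final total column) can be mimicked with a small base R sketch. The labels and predictions below are invented, and `count.row` is a hypothetical helper, not a Pigengene function.

```r
## True condition of five hypothetical samples.
truth <- c(s1 = "AML", s2 = "AML", s3 = "MDS", s4 = "MDS", s5 = "MDS")

## Hypothetical predictions after removing 0 genes and 1 gene.
preds <- list("0" = c("AML", "AML", "MDS", "MDS", "MDS"),
              "1" = c("AML", "MDS", "MDS", "AML", "MDS"))

## Count misclassified samples per condition, then append the total.
count.row <- function(pred) {
  wrong <- pred != truth
  perType <- tapply(wrong, truth, sum)
  c(perType, total = sum(wrong))
}

## Rows are named by the number of removed genes.
errMat <- t(sapply(preds, count.row))
print(errMat)
```

Reading down the total column shows how the error count changes as more genes are removed.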

References

Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia, Foroushani A, Agrahari R, Docking R, Karsan A, and Zare H. In preparation.

Gene shaving as a method for identifying distinct sets of genes with similar expression patterns, Hastie, Trevor, et al. Genome Biol 1.2 (2000): 1-0003.

See Also

Pigengene-package, compute.pigengene, make.decision.tree, C5.0

Examples

library(Pigengene)

## Data:
data(aml)
data(mds)
data(pigengene)
d1 <- rbind(aml, mds)

## Fitting the trees:
trees <- make.decision.tree(pigengene=pigengene, Data=d1,
  saveDir="trees", minPerLeaf=14:15, doHeat=FALSE, verbose=3,
  toCompact=FALSE)

## Compacting one of the fitted trees:
c1 <- compact.tree(c5Tree=trees$c5Trees[["15"]], pigengene=pigengene,
  saveDir="compacted", verbose=1)

Pigengene documentation built on Nov. 8, 2020, 6:47 p.m.