ORT: Create an Online Random Tree Object


Description

ORT is an R6 class that inherits from the Tree class. You can use it to create a decision tree in different ways, and it supports both incremental learning and batch learning.

Usage

ORT$new(param)

Arguments

param

A list whose elements are usually named minSamples, minGain, numClasses, x.rng, etc. See Fields for details.
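For instance, a minimal param list for a three-class problem might look as follows (the concrete values and the two-variable x.rng are illustrative only; see Fields and Examples for realistic settings):

# illustrative only: two x variables with ranges [0, 10] and [0, 5]
x.rng <- data.frame(min = c(0, 0), max = c(10, 5), row.names = c("X1", "X2"))
param <- list(minSamples = 5, minGain = 0.1, numClasses = 3, x.rng = x.rng)
ort <- ORT$new(param)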

Format

R6Class object.

Details

See the description of each field or method for details.

Value

An object of R6Class, i.e., an Online Random Tree object.

Fields

age

The number of times the loop inside the update() function has been run.

minSamples

Part of param; the minimum number of samples in a leaf node. It is usually set lower for classification and higher for regression.

minGain

Part of param; the minimum entropy gain required to split a node. It is usually set lower for classification and higher for regression.

numTests

Part of param; the number of SuffStats in tests. Defaults to 10 if not set.

maxDepth

Part of param; the maximum depth of an ORT tree. Defaults to 10 if not set.

numClasses

A nonnegative integer giving the number of classes in a classification problem. Defaults to 0, which means regression; if numClasses > 0, the tree performs classification.

classValues

All distinct possible values of y in a classification problem. Defaults to NULL if not set.

x.rng

A data frame giving the range of every x variable in the training data. It must have shape n*2, where n is the number of x variables (i.e., x.dim); the first column holds the minimum of each x variable and the second the maximum. For convenience, you can generate it via OnlineRandomForest::dataRange(); a short sketch follows this list.

...

Other fields can be seen in Tree.
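As a sketch, x.rng for the four numeric predictors of iris can be built either by hand (mirroring the Examples below) or, as mentioned above, with dataRange():

# min of each x variable in column 1, max in column 2, one row per x variable
x.rng <- data.frame(min = apply(iris[1:4], 2, min),
                    max = apply(iris[1:4], 2, max),
                    row.names = paste0("X", 1:4))
# equivalently: x.rng <- OnlineRandomForest::dataRange(iris[1:4])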

Methods

findLeaf(x, tree, depth = 0)

Find the leaf node where x falls. Returns a list containing the node and its depth (see the sketch after the argument list).

  • x - A sample of x.

  • tree - An ORT tree or node.
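A hedged sketch of a findLeaf() call, assuming ort1 and dat1 have been built as in the Examples below; the element names of the returned list are not documented here, so they are inspected with str() rather than assumed:

x1 <- as.numeric(dat1[1, 1:4])       # one sample's x variables
leaf.info <- ort1$findLeaf(x1, ort1) # search from the root, depth = 0
str(leaf.info)                       # a list holding the leaf node and its depth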

gains(elem)

Compute the entropy gain of each test in elem.

  • elem - An Elem object.

update(x, y)

When a new sample arrives at the current node, update the ORT with the sample's x variables and y value (see the sketch after the argument list).

  • x - The x variables of a sample. Note that it is a numeric vector, not a scalar.

  • y - The y value of a sample.
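A minimal sketch of the incremental path for a single new observation, assuming ort1 and dat1 were created with the classification setup of the Examples below:

new.x <- as.numeric(dat1[51, 1:4])  # x variables of one incoming sample
new.y <- dat1[51, 5]                # its integer-coded class label
ort1$update(new.x, new.y)           # refine the tree with this one sample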

generateTree(tree.mat, df.node, node.ind = 1)

Generate a Tree from a tree matrix such as the result of randomForest::getTree().

  • tree.mat - A tree matrix, which can be obtained from randomForest::getTree(). Note that it must have a column named node.ind. See Examples.

  • node.ind - The index of the current node in the Tree. Defaults to 1, the root node. For most purposes you do not need to change it.

  • df.node - The training data frame that was used to construct the random forest, i.e., the data argument of the randomForest function. Note that all columns of df.node must be numeric or integer.

predict(x)

Predict the y value corresponding to x (a one-line sketch follows the argument list).

  • x - The x variables of a sample. Note that it is a numeric vector, not a scalar.
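A one-line sketch, assuming ort1 and dat1 from the classification Examples below:

ort1$predict(as.numeric(dat1[1, 1:4]))  # the predicted y value (a class label here, since numClasses > 0)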

draw()

Draw the Tree.

...

Other methods can be seen in Tree.

Author(s)

Quan Gu

Examples

# ORT comes from the OnlineRandomForest package (GitHub repo: ZJUguquan/OnlineRandomForest)
library(OnlineRandomForest)
if(!require(randomForest)) install.packages("randomForest")
library(randomForest)

# classification example
dat1 <- iris; dat1[,5] <- as.integer(dat1[,5])
rf <- randomForest(factor(Species) ~ ., data = dat1)
treemat1 <- getTree(rf, 1, labelVar=F)
treemat1 <- cbind(treemat1, node.ind = 1:nrow(treemat1))
x.rng1 <- data.frame(min = apply(dat1[1:4], 2, min), 
                     max = apply(dat1[1:4], 2, max), 
                     row.names = paste0("X",1:4)) # or use dataRange(dat1[1:4])
param1 <- list('minSamples'= 5, 'minGain'= 0.1, 'numClasses'= 3, 'x.rng'= x.rng1)
ort1 <- ORT$new(param1)
ort1$generateTree(treemat1, df.node = dat1) # 23ms, 838KB
ort1$draw()
ort1$left$elem$tests[[1]]$statsL
sapply(1:150, function(i) ort1$predict(dat1[i,1:4]))

# first generate, then update
ind.gen <- sample(1:150,50) # for generate ORT
ind.updt <- setdiff(1:150, ind.gen) # for update ORT
rf2 <- randomForest(factor(Species) ~ ., data = dat1[ind.gen,])
treemat2 <- getTree(rf2, 22, labelVar=F)
treemat2 <- cbind(treemat2, node.ind = 1:nrow(treemat2))
ort2 <- ORT$new(param1)
ort2$draw()
ort2$generateTree(treemat2, df.node = dat1[ind.gen,])
ort2$draw()
for(i in ind.updt) {
  ort2$update(dat1[i,1:4], dat1[i,5])
}
ort2$draw()


# regression example
if(!require(ggplot2)) install.packages("ggplot2")
data("diamonds", package = "ggplot2")
dat3 <- as.data.frame(diamonds[sample(1:53000,1000), c(1:6,8:10,7)])
for (col in c("cut","color","clarity")) dat3[[col]] <- as.integer(dat3[[col]]) # don't forget: all columns must be numeric or integer
x.rng3 <- data.frame(min = apply(dat3[1:9], 2, min),
                     max = apply(dat3[1:9], 2, max),
                     row.names = paste0("X", 1:9))
param3 <- list('minSamples'= 10, 'minGain'= 1, 'maxDepth' = 10, 'x.rng'= x.rng3)
ind.gen3 <- sample(1:1000,500)
ind.updt3 <- setdiff(1:1000, ind.gen3)
rf3 <- randomForest(price ~ ., data = dat3[ind.gen3,], maxnodes = 20)
treemat3 <- getTree(rf3, 33, labelVar = F)
treemat3 <- cbind(treemat3, node.ind = 1:nrow(treemat3))

ort3 <- ORT$new(param3)
ort3$generateTree(treemat3, df.node = dat3[ind.gen3,])
ort3$size()
for (i in ind.updt3) {
  ort3$update(dat3[i,1:9], dat3[i,10])
}
ort3$size()
