ORT is an R6 class that inherits from the Tree class.
You can use it to create a decision tree in different ways. It supports incremental learning as well as batch learning.
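param
A list of parameters. It usually has names such as minSamples, minGain, numTests and maxDepth, which are described under the fields below. The examples also pass numClasses and x.rng through it.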
R6Class object.
See details in description of each field or method.
Object of R6Class, object of Online Random Tree.
age
How many times the loop inside the update() function has run.
minSamples
Part of param. The minimum number of samples in a leaf node. Use a lower value for classification and a higher value for regression.
minGain
Part of param. The minimum entropy gain required to split a node. Use a lower value for classification and a higher value for regression.
numTests
Part of param. The number of SuffStats in tests. Default 10 if not set.
maxDepth
Part of param. The maximum depth of an ORT tree. Default 10 if not set.
numClasses
A nonnegative integer giving the number of classes in a classification problem. Default 0, which means regression. If numClasses > 0, classification is performed.
classValues
All distinct possible values of y for classification. Default NULL if not set.
x.rng
A data frame giving the range of every x variable in the training data.
It must have shape n*2, where n is the number of x variables, i.e. x.dim.
The first column must hold the minimum values of x and the second column the maximum values.
You can generate it with OnlineRandomForest::dataRange() for convenience.
...
Other fields can be seen in Tree.
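As a rough illustration of how the fields above fit together, the following base R sketch builds a param list for a classification problem on the built-in iris data. The values for minSamples and minGain are taken from the Examples; deriving numClasses from the distinct values of y is an assumption about typical usage, not something the package requires.

```r
# Predictors and integer-coded response from the built-in iris data
x <- iris[1:4]
y <- as.integer(iris$Species)

# n*2 data frame: first column minimum, second column maximum of each x variable
x.rng <- data.frame(
  min = vapply(x, min, numeric(1)),
  max = vapply(x, max, numeric(1)),
  row.names = paste0("X", seq_along(x))
)

# Distinct class labels; their count is used as numClasses (assumption)
classValues <- sort(unique(y))

param <- list(
  minSamples = 5,
  minGain    = 0.1,
  numClasses = length(classValues),
  x.rng      = x.rng
)
str(param)
```

With the package loaded, a list like this would be passed to ORT$new(param), as in the Examples.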
findLeaf(x, tree, depth = 0)
Find the leaf node where x is located. Return a list, including node and its depth.
x - A sample of x.
tree - An ORT tree or node.
gains(elem)
Compute the entropy gain on all tests of the elem.
elem - An Elem object.
update(x, y)
When a sample arrives at the current node, update the ORT with the sample's x variables and y value.
x - The x variables of a sample. Note that it is a numeric vector rather than a scalar.
y - The y value of a sample.
generateTree(tree.mat, df.node, node.ind = 1)
Generate a Tree from a tree matrix like the one returned by randomForest::getTree().
tree.mat - A tree matrix, which can be obtained from randomForest::getTree(). Note that it must have a column named node.ind. See Examples.
node.ind - The index of the current node in the Tree. Default 1 for the root node. For most purposes you do not need to change it.
df.node - The training data frame that was used to construct the randomForest, i.e., the data argument of the randomForest function.
Note that all columns in df.node must be numeric or integer.
predict(x)
Predict the y value corresponding to x.
x - The x variables of a sample. Note that it is a numeric vector rather than a scalar.
draw()
Draw the Tree.
...
Other methods can be seen in Tree.
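To make the notion of entropy gain used by gains() and minGain more concrete, here is a minimal base R sketch of the information gain of a single threshold split for classification. The helpers entropy() and split_gain() are hypothetical names written for this illustration. They are not part of the package, and whether ORT uses exactly this formula internally is an assumption.

```r
# Shannon entropy (in bits) of a vector of class labels
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(p * log2(p))
}

# Entropy gain of splitting the samples by x <= threshold versus x > threshold
# (hypothetical helper, not the package's implementation)
split_gain <- function(x, y, threshold) {
  left  <- y[x <= threshold]
  right <- y[x >  threshold]
  if (length(left) == 0 || length(right) == 0) return(0)
  w_left <- length(left) / length(y)
  entropy(y) - w_left * entropy(left) - (1 - w_left) * entropy(right)
}

y <- as.integer(iris$Species)
gain <- split_gain(iris$Petal.Length, y, threshold = 2.45)
gain  # about 0.918: the split isolates one of the three classes
```

Under this reading, a split would only be accepted when its gain is at least minGain, which is why a small value such as 0.1 is used for classification in the Examples.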
Quan Gu
if(!require(randomForest)) install.packages("randomForest")
library(randomForest)
# classification example
dat1 <- iris; dat1[,5] <- as.integer(dat1[,5])
rf <- randomForest(factor(Species) ~ ., data = dat1)
treemat1 <- getTree(rf, 1, labelVar=F)
treemat1 <- cbind(treemat1, node.ind = 1:nrow(treemat1))
x.rng1 <- data.frame(min = apply(dat1[1:4], 2, min),
                     max = apply(dat1[1:4], 2, max),
                     row.names = paste0("X",1:4)) # or use dataRange(dat1[1:4])
param1 <- list('minSamples'= 5, 'minGain'= 0.1, 'numClasses'= 3, 'x.rng'= x.rng1)
ort1 <- ORT$new(param1)
ort1$generateTree(treemat1, df.node = dat1) # 23ms, 838KB
ort1$draw()
ort1$left$elem$tests[[1]]$statsL
sapply(1:150, function(i) ort1$predict(dat1[i,1:4]))
# first generate, then update
ind.gen <- sample(1:150,50) # for generate ORT
ind.updt <- setdiff(1:150, ind.gen) # for update ORT
rf2 <- randomForest(factor(Species) ~ ., data = dat1[ind.gen,])
treemat2 <- getTree(rf2, 22, labelVar=F)
treemat2 <- cbind(treemat2, node.ind = 1:nrow(treemat2))
ort2 <- ORT$new(param1)
ort2$draw()
ort2$generateTree(treemat2, df.node = dat1[ind.gen,])
ort2$draw()
for(i in ind.updt) {
  ort2$update(dat1[i,1:4], dat1[i,5])
}
ort2$draw()
# regression example
if(!require(ggplot2)) install.packages("ggplot2")
data("diamonds", package = "ggplot2")
dat3 <- as.data.frame(diamonds[sample(1:53000,1000), c(1:6,8:10,7)])
for (col in c("cut","color","clarity")) dat3[[col]] <- as.integer(dat3[[col]]) # Don't forget: all columns must be numeric or integer
x.rng3 <- data.frame(min = apply(dat3[1:9], 2, min),
                     max = apply(dat3[1:9], 2, max),
                     row.names = paste0("X", 1:9))
param3 <- list('minSamples'= 10, 'minGain'= 1, 'maxDepth' = 10, 'x.rng'= x.rng3)
ind.gen3 <- sample(1:1000,500)
ind.updt3 <- setdiff(1:1000, ind.gen3)
rf3 <- randomForest(price ~ ., data = dat3[ind.gen3,], maxnodes = 20)
treemat3 <- getTree(rf3, 33, labelVar = F)
treemat3 <- cbind(treemat3, node.ind = 1:nrow(treemat3))
ort3 <- ORT$new(param3)
ort3$generateTree(treemat3, df.node = dat3[ind.gen3,])
ort3$size()
for (i in ind.updt3) {
  ort3$update(dat3[i,1:9], dat3[i,10])
}
ort3$size()