ODBT | R Documentation |
We use ODT as the basic tree model (base learner). To improve the performance of a boosting tree, we apply feature bagging in this process, in the same
way as the random forest. Our final estimator, called the ensemble of ODT-based boosting trees and denoted by ODBT,
is the average of many boosting trees.
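The idea described above (boost trees on a random subset of the features, then average several such boosted sequences) can be sketched in a few lines. This is only an illustration under stated assumptions, not the package's implementation: it uses axis-aligned rpart trees rather than ODT, squared-error boosting with a fixed shrinkage, and one feature subset per boosted sequence. The helpers boost_one and predict_one, and all of their arguments, are hypothetical names introduced here.

```r
library(rpart)  # recommended package bundled with standard R installations

# Hypothetical helper: one L2-boosted sequence of shallow rpart trees,
# grown on a random subset of the predictors (feature bagging).
boost_one <- function(X, y, n_iter = 30, shrink = 0.1,
                      n_feat = ceiling(sqrt(ncol(X)))) {
  feats <- sample(colnames(X), n_feat)
  init <- mean(y)
  resid <- y - init
  trees <- vector("list", n_iter)
  for (m in seq_len(n_iter)) {
    dat <- data.frame(X[, feats, drop = FALSE], r_target = resid)
    trees[[m]] <- rpart(r_target ~ ., data = dat,
                        control = rpart.control(maxdepth = 2))
    # Each new tree is fitted to the residuals left by the previous ones.
    resid <- resid - shrink * as.numeric(predict(trees[[m]], dat))
  }
  list(init = init, trees = trees, feats = feats, shrink = shrink)
}

# Hypothetical helper: prediction of one boosted sequence on new data.
predict_one <- function(fit, Xnew) {
  dat <- data.frame(Xnew[, fit$feats, drop = FALSE])
  pred <- rep(fit$init, nrow(dat))
  for (tr in fit$trees) {
    pred <- pred + fit$shrink * as.numeric(predict(tr, dat))
  }
  pred
}

# Simulated data, for illustration only.
set.seed(1)
n <- 200
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
y <- sin(2 * pi * X$x1) + X$x2 + rnorm(n, sd = 0.1)

# Ensemble: average the predictions of several independently boosted sequences.
fits <- lapply(seq_len(20), function(i) boost_one(X, y))
pred <- rowMeans(sapply(fits, predict_one, Xnew = X))
mse <- mean((y - pred)^2)  # training error, for illustration only
```

In this sketch n_iter plays a role similar to max.terms and the number of averaged sequences plays a role similar to ntrees; that correspondence is an assumption based on the argument descriptions.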
ODBT(X, ...)
## S3 method for class 'formula'
ODBT(
formula,
data = NULL,
Xnew = NULL,
type = "auto",
model = c("ODT", "rpart", "rpart.cpp")[1],
TreeRotate = TRUE,
max.terms = 30,
NodeRotateFun = "RotMatRF",
FunDir = getwd(),
paramList = NULL,
ntrees = 100,
storeOOB = TRUE,
replacement = TRUE,
stratify = TRUE,
ratOOB = 0.368,
parallel = TRUE,
numCores = Inf,
MaxDepth = Inf,
numNode = Inf,
MinLeaf = ceiling(sqrt(ifelse(replacement, 1, 1 - ratOOB) * ifelse(is.null(data),
length(eval(formula[[2]])), nrow(data)))/3),
subset = NULL,
weights = NULL,
na.action = na.fail,
catLabel = NULL,
Xcat = 0,
Xscale = "No",
...
)
## Default S3 method:
ODBT(
X,
y,
Xnew = NULL,
type = "auto",
model = c("ODT", "rpart", "rpart.cpp")[1],
TreeRotate = TRUE,
max.terms = 30,
NodeRotateFun = "RotMatRF",
FunDir = getwd(),
paramList = NULL,
ntrees = 100,
storeOOB = TRUE,
replacement = TRUE,
stratify = TRUE,
ratOOB = 0.368,
parallel = TRUE,
numCores = Inf,
MaxDepth = Inf,
numNode = Inf,
MinLeaf = ceiling(sqrt(ifelse(replacement, 1, 1 - ratOOB) * length(y))/3),
subset = NULL,
weights = NULL,
na.action = na.fail,
catLabel = NULL,
Xcat = 0,
Xscale = "No",
...
)
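The default for MinLeaf in the usage above depends on the sample size, on replacement and on ratOOB. As a small worked example, the expression can be wrapped in a helper and evaluated for a few cases. The helper name min_leaf_default is hypothetical; the formula itself is copied from the default method's signature.

```r
# Mirrors the default expression for MinLeaf shown in the usage above.
# min_leaf_default is a hypothetical helper, not part of the package.
min_leaf_default <- function(n, replacement = TRUE, ratOOB = 0.368) {
  ceiling(sqrt(ifelse(replacement, 1, 1 - ratOOB) * n) / 3)
}

min_leaf_default(100)                       # sqrt(100) / 3 = 3.33, rounded up to 4
min_leaf_default(100, replacement = FALSE)  # sqrt(63.2) / 3 = 2.65, rounded up to 3
min_leaf_default(80)                        # sqrt(80) / 3 = 2.98, rounded up to 3
```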
X |
An n by d numeric matrix (preferable) or data frame. |
... |
Optional parameters to be passed to the low level function. |
formula |
Object of class |
data |
Training data of class |
Xnew |
An n by d numeric matrix (preferable) or data frame containing predictors for the new data. |
type |
Use |
model |
The basic tree model for boosting. We offer three options: "ODT" (default), "rpart" and "rpart.cpp" (improved "rpart"). |
TreeRotate |
Whether to rotate the training data with the rotation matrix estimated by logistic regression before building the tree (default TRUE). |
max.terms |
The maximum number of iterations for boosting trees. |
NodeRotateFun |
Name of the function of class
|
FunDir |
The path to the |
paramList |
List of parameters used by the functions |
ntrees |
The number of trees in the forest (default 100). |
storeOOB |
If TRUE then the samples omitted during the creation of a tree are stored as part of the tree (default TRUE). |
replacement |
If TRUE then n samples are chosen, with replacement, from the training data (default TRUE). |
stratify |
If TRUE then class sample proportions are maintained during the random sampling. Ignored if replacement = FALSE (default TRUE). |
ratOOB |
Ratio of 'out-of-bag' samples (default 0.368). |
parallel |
Parallel computing or not (default TRUE). |
numCores |
Number of cores to be used for parallel computing (default |
MaxDepth |
The maximum depth of the tree (default |
numNode |
Number of nodes that can be used by the tree (default |
MinLeaf |
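Minimal node size (default ceiling(sqrt(ifelse(replacement, 1, 1 - ratOOB) * n)/3), where n is the number of training observations). |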
subset |
An index vector indicating which rows should be used. (NOTE: If given, this argument must be named.) |
weights |
Vector of non-negative observational weights; fractional weights are allowed (default NULL). |
na.action |
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.) |
catLabel |
A category labels of class |
Xcat |
A class |
Xscale |
Predictor standardization method. "Min-max", "Quantile" and "No" (default) denote Min-max transformation, Quantile transformation and no transformation, respectively. |
y |
A response vector of length n. |
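The default ratOOB = 0.368 is numerically close to exp(-1), the well-known limiting fraction of observations left out of a bootstrap sample of size n drawn with replacement. Whether the default was chosen for this reason is an assumption; the short simulation below only illustrates the bootstrap fact itself, using base R.

```r
set.seed(221212)
n <- 1000

# Fraction of rows left out of one bootstrap sample of size n, repeated 200 times.
oob_frac <- replicate(200, {
  in_bag <- sample(n, n, replace = TRUE)
  1 - length(unique(in_bag)) / n
})

mean(oob_frac)  # simulated average out-of-bag fraction
(1 - 1 / n)^n   # exact expected fraction for this n
exp(-1)         # limiting value as n grows
```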
An object of class ODBT containing a list of components:
call: The original call to ODBT.
terms: An object of class c("terms", "formula") (see terms.object) summarizing the formula. Used by various methods, but typically not of direct relevance to users.
ppTrees: Each tree used to build the forest.
  oobErr: 'out-of-bag' error for tree, misclassification rate (MR) for classification or mean square error (MSE) for regression.
  oobIndex: Which training data to use as 'out-of-bag'.
  oobPred: Predicted value for 'out-of-bag'.
  other: For other tree related values, see ODT.
oobErr: 'out-of-bag' error for forest, misclassification rate (MR) for classification or mean square error (MSE) for regression.
oobConfusionMat: 'out-of-bag' confusion matrix for forest.
split, Levels and NodeRotateFun are important parameters for building the tree.
paramList: Parameters in a named list to be used by NodeRotateFun.
data: The list of data related parameters used to build the forest.
tree: The list of tree related parameters used to build the tree.
forest: The list of forest related parameters used to build the forest.
results: The prediction results for new data Xnew using ODBT.
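The error measures named in the components above (misclassification rate, mean square error and confusion matrix) can be computed in base R as follows. This is a minimal sketch on made-up vectors; it does not read anything from a fitted ODBT object, and all variable names are hypothetical.

```r
# Hypothetical out-of-bag predictions and true labels for six observations.
truth <- factor(c("a", "a", "b", "b", "c", "c"))
oob_pred <- factor(c("a", "b", "b", "b", "c", "a"), levels = levels(truth))

# Misclassification rate (MR): share of predictions that differ from the truth.
mr <- mean(oob_pred != truth)

# Confusion matrix: rows are predicted classes, columns are actual classes.
conf_mat <- table(predicted = oob_pred, actual = truth)

# Mean square error (MSE) for a hypothetical regression case.
y_true <- c(1.0, 2.0, 3.0)
y_oob <- c(1.5, 2.0, 2.0)
mse_oob <- mean((y_oob - y_true)^2)
```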
Yu Liu and Yingcun Xia
Zhan, H., Liu, Y., & Xia, Y. (2024). Consistency of Oblique Decision Tree and its Boosting and Random Forest. arXiv preprint arXiv:2211.12653.
Tomita, T. M., Browne, J., Shen, C., Chung, J., Patsolic, J. L., Falk, B., ... & Vogelstein, J. T. (2020). Sparse projection oblique randomer forests. Journal of machine learning research, 21(104).
ODT
best.cut.node
library(ODRF)
# Classification with boosting trees, using rpart as the base learner.
data(seeds)
set.seed(221212)
train <- sample(1:209, 100)
train_data <- data.frame(seeds[train, ])
test_data <- data.frame(seeds[-train, ])
forest <- ODBT(varieties_of_wheat ~ ., train_data, test_data[, -8],
model = "rpart",
type = "class", parallel = FALSE, NodeRotateFun = "RotMatRF"
)
pred <- forest$results$prediction
# classification error
(mean(pred != test_data[, 8]))
forest <- ODBT(varieties_of_wheat ~ ., train_data, test_data[, -8],
model = "rpart.cpp",
type = "class", parallel = FALSE, NodeRotateFun = "RotMatRF"
)
pred <- forest$results$prediction
# classification error
(mean(pred != test_data[, 8]))
# Regression with ODT-based boosting trees.
data(body_fat)
set.seed(221212)
train <- sample(1:252, 80)
train_data <- data.frame(body_fat[train, ])
test_data <- data.frame(body_fat[-train, ])
# To use ODT as the basic tree model for boosting, you need to set
# the parameters model = "ODT" and NodeRotateFun = "RotMatPPO".
forest <- ODBT(Density ~ ., train_data, test_data[, -1],
type = "reg", parallel = FALSE, model = "ODT",
NodeRotateFun = "RotMatPPO"
)
pred <- forest$results$prediction
# estimation error
mean((pred - test_data[, 1])^2)
forest <- ODBT(Density ~ ., train_data, test_data[, -1],
type = "reg", parallel = FALSE, model = "rpart.cpp",
NodeRotateFun = "RotMatRF"
)
pred <- forest$results$prediction
# estimation error
mean((pred - test_data[, 1])^2)