Description

This function predicts from Complete-Random Tree Forests trained using xgboost. Predictions are deferred to CRTreeForest_pred_internals.
Usage

CRTreeForest_pred(model, data, folds = NULL, prediction = FALSE,
                  multi_class = 2, data_start = NULL,
                  return_list = TRUE, work_dir = NULL)
Arguments

model
Type: list. A model trained by CRTreeForest.

data
Type: data.table. The data to predict on. If passing the training data, supply the folds list so it predicts out of fold; otherwise it will overfit.

folds
Type: list. The folds, as a list, for cross-validation when predicting on the training data. Otherwise, leave NULL.

prediction
Type: logical. Whether the predictions of the forest ensemble are averaged. Set it to TRUE to return the averaged forest predictions (the train_means / valid_means outputs), or keep FALSE for per-forest predictions.

multi_class
Type: numeric. How many classes you have. Set to 2 for binary classification or regression cases. Set to the number of classes for multiclass classification.

data_start
Type: vector of numeric. The initial prediction labels. Set to NULL if there are no initial prediction labels.

return_list
Type: logical. Whether lists should be returned instead of concatenated frames for predictions. Defaults to TRUE.

work_dir
Type: character, without a slash at the end (e.g. "dev/tools/save_in_this_folder"). The working directory where models are stored, if using external model files as memory. Defaults to NULL.
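To illustrate why folds matters when predicting on training data, here is a minimal sketch of out-of-fold scoring: each row is scored by a model fitted without that row's fold. The fold layout and the stand-in "prediction" below are hypothetical, not the package's internals:

```r
# Sketch: out-of-fold scoring with 3 folds over 6 rows.
# Purely illustrative; not CRTreeForest internals.
folds <- list(c(1, 2), c(3, 4), c(5, 6))
labels <- c(0, 1, 0, 1, 0, 1)

oof <- numeric(length(labels))
for (k in seq_along(folds)) {
  idx <- folds[[k]]
  # In the real workflow, a model trained WITHOUT rows `idx` scores them;
  # here the held-out mean label stands in for that model's prediction.
  oof[idx] <- mean(labels[-idx])
}
print(oof)  # 0.5 0.5 0.5 0.5 0.5 0.5
```

Without folds, the training rows would be scored by models that saw them, which is the overfitting warned about above.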
Details

For implementation details of Cascade Forest / Complete-Random Tree Forest / Multi-Grained Scanning / Deep Forest, check this: https://github.com/Microsoft/LightGBM/issues/331#issuecomment-283942390 by Laurae.
Value

A data.table or a list, based on data, predicted using model.
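As a rough illustration of what prediction = TRUE does, the per-forest prediction columns are collapsed into a single averaged vector. The column names and values below are made up for illustration; this is not CRTreeForest output:

```r
# Sketch of averaging per-forest predictions (prediction = TRUE).
# Hypothetical data, not actual CRTreeForest output.
library(data.table)

# Suppose each column holds one forest's predictions for three rows
forest_preds <- data.table(forest_1 = c(0.2, 0.8, 0.6),
                           forest_2 = c(0.4, 0.6, 0.8))

# Averaging across forests collapses the ensemble into one vector,
# analogous to the *_means outputs checked in the examples below
ensemble_mean <- rowMeans(forest_preds)
print(ensemble_mean)  # 0.3 0.7 0.7
```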
Examples

## Not run:
# Load libraries
library(data.table)
library(Matrix)
library(xgboost)
# Create data
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
agaricus_data_train <- data.table(as.matrix(agaricus.train$data))
agaricus_data_test <- data.table(as.matrix(agaricus.test$data))
agaricus_label_train <- agaricus.train$label
agaricus_label_test <- agaricus.test$label
folds <- Laurae::kfold(agaricus_label_train, 5)
# Train a model (binary classification)
model <- CRTreeForest(training_data = agaricus_data_train, # Training data
validation_data = agaricus_data_test, # Validation data
training_labels = agaricus_label_train, # Training labels
validation_labels = agaricus_label_test, # Validation labels
folds = folds, # Folds for cross-validation
nthread = 1, # Change this to use more threads
lr = 1, # Do not touch this unless you are expert
training_start = NULL, # Do not touch this unless you are expert
validation_start = NULL, # Do not touch this unless you are expert
n_forest = 5, # Number of forest models
n_trees = 10, # Number of trees per forest
random_forest = 2, # We want only 2 random forests
seed = 0,
objective = "binary:logistic",
eval_metric = Laurae::df_logloss,
return_list = TRUE, # Set this to FALSE for a data.table output
multi_class = 2, # Modify this for multiclass problems
verbose = " ")
# Predict from model
new_preds <- CRTreeForest_pred(model, agaricus_data_test, return_list = FALSE)
# We can check whether we have equal predictions, it's all TRUE!
all.equal(model$train_preds, CRTreeForest_pred(model, agaricus_data_train, folds = folds))
all.equal(model$valid_preds, CRTreeForest_pred(model, agaricus_data_test))
all.equal(model$train_means, CRTreeForest_pred(model,
agaricus_data_train,
folds = folds,
return_list = FALSE,
prediction = TRUE))
all.equal(model$valid_means, CRTreeForest_pred(model,
agaricus_data_test,
return_list = FALSE,
prediction = TRUE))
# Attempt to perform fake multiclass problem
agaricus_label_train[1:100] <- 2
# Train a model (multiclass classification)
model <- CRTreeForest(training_data = agaricus_data_train, # Training data
validation_data = agaricus_data_test, # Validation data
training_labels = agaricus_label_train, # Training labels
validation_labels = agaricus_label_test, # Validation labels
folds = folds, # Folds for cross-validation
nthread = 1, # Change this to use more threads
lr = 1, # Do not touch this unless you are expert
training_start = NULL, # Do not touch this unless you are expert
validation_start = NULL, # Do not touch this unless you are expert
n_forest = 5, # Number of forest models
n_trees = 10, # Number of trees per forest
random_forest = 2, # We want only 2 random forests
seed = 0,
objective = "multi:softprob",
eval_metric = Laurae::df_logloss,
return_list = TRUE, # Set this to FALSE for a data.table output
multi_class = 3, # Modify this for multiclass problems
verbose = " ")
# Predict from model for multiclass problems
new_preds <- CRTreeForest_pred(model, agaricus_data_test, return_list = FALSE)
# We can check whether we have equal predictions, it's all TRUE!
all.equal(model$train_preds, CRTreeForest_pred(model, agaricus_data_train, folds = folds))
all.equal(model$valid_preds, CRTreeForest_pred(model, agaricus_data_test))
all.equal(model$train_means, CRTreeForest_pred(model,
agaricus_data_train,
folds = folds,
return_list = FALSE,
prediction = TRUE))
all.equal(model$valid_means, CRTreeForest_pred(model,
agaricus_data_test,
return_list = FALSE,
prediction = TRUE))
## End(Not run)