This function attempts to predict from Cascade Forest using xgboost.
model |
Type: list. A model trained by |
data |
Type: data.table. The data to predict on. If you pass the training data, it is predicted as if it were out of fold and the predictions will overfit (so, use the list |
folds |
Type: list. The cross-validation folds, as a list, when predicting on the training data. Otherwise, leave |
layer |
Type: numeric. The layer you want to predict on. If not provided ( |
prediction |
Type: logical. Whether the predictions of the forest ensemble are averaged. Set it to |
multi_class |
Type: numeric. The number of classes. Set to 2 for binary classification or for regression. Set to |
data_start |
Type: vector of numeric. The initial prediction labels. Set to |
return_list |
Type: logical. Whether lists should be returned instead of concatenated frames for predictions. Defaults to |
low_memory |
Type: logical. Whether to perform the data.table transformations in place to lower memory usage. Defaults to |
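The `folds` argument exists because training rows should be scored by a model that never saw them. A minimal base R sketch of that out-of-fold idea follows. It uses `lm` as a stand-in model and assumes the folds are a list of held-out row indices; both are illustration choices, not the Laurae implementation.

```r
# Illustrative sketch in base R; not the Laurae implementation.
set.seed(1)
n <- 100
train <- data.frame(x = rnorm(n))
train$y <- 2 * train$x + rnorm(n, sd = 0.1)

# Hypothetical folds: a list of held-out row indices, one element per fold
folds <- split(sample(n), rep(1:5, length.out = n))

# Out-of-fold predictions: each row is scored by the model that did not see it
oof <- numeric(n)
for (idx in folds) {
  fit <- lm(y ~ x, data = train[-idx, ])          # fit on the other folds
  oof[idx] <- predict(fit, newdata = train[idx, ]) # predict the held-out fold
}

# Every row receives exactly one prediction
length(oof) == n
```

Predicting the training data without its folds would instead score each row with a model that has already seen it, which is the overfitting the `data` argument warns about.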
For implementation details of Cascade Forest / Complete-Random Tree Forest / Multi-Grained Scanning / Deep Forest, check this: https://github.com/Microsoft/LightGBM/issues/331#issuecomment-283942390 by Laurae.
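As a rough illustration of how prediction flows through a cascade, the sketch below passes data through layers in which each layer's predictions are appended to the original features before the next layer runs. The forests are plain functions, and `predict_cascade`, `layer`, and `average` are hypothetical names that only loosely mirror the `layer` and `prediction` arguments; this is not the Laurae or xgboost code.

```r
# Illustrative sketch; stand-in "forests", not the Laurae or xgboost internals.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical cascade: 2 layers with 2 "forests" each, every forest a function
cascade <- list(
  list(function(d) sigmoid(d$a),        function(d) sigmoid(d$b)),
  list(function(d) sigmoid(d$a + d$f1), function(d) sigmoid(d$b - d$f2))
)

predict_cascade <- function(cascade, data, layer = length(cascade), average = TRUE) {
  features <- data
  for (i in seq_len(layer)) {
    # One column of predictions per forest in this layer
    preds <- do.call(cbind, lapply(cascade[[i]], function(forest) forest(features)))
    colnames(preds) <- paste0("f", seq_len(ncol(preds)))
    # The next layer sees the original features plus this layer's predictions
    features <- cbind(data, preds)
  }
  # Either average the forests of the requested layer or return them as is
  if (average) rowMeans(preds) else preds
}

new_data <- data.frame(a = c(0, 1), b = c(0, -1))
averaged <- predict_cascade(cascade, new_data, layer = 1)
per_forest <- predict_cascade(cascade, new_data, average = FALSE)
```

Stopping at an earlier `layer` simply ends the loop sooner, and skipping the averaging returns one column per forest.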
A data.table or a list based on data predicted using model.
## Not run:
# Load libraries
library(data.table)
library(Matrix)
library(xgboost)
library(Laurae) # Provides CascadeForest and CascadeForest_pred
# Create data
data(agaricus.train, package = "xgboost") # Same dataset; xgboost is already loaded above
data(agaricus.test, package = "xgboost")
agaricus_data_train <- data.table(as.matrix(agaricus.train$data))
agaricus_data_test <- data.table(as.matrix(agaricus.test$data))
agaricus_label_train <- agaricus.train$label
agaricus_label_test <- agaricus.test$label
folds <- Laurae::kfold(agaricus_label_train, 5)
# Train a model (binary classification)
model <- CascadeForest(training_data = agaricus_data_train, # Training data
                       validation_data = agaricus_data_test, # Validation data
                       training_labels = agaricus_label_train, # Training labels
                       validation_labels = agaricus_label_test, # Validation labels
                       folds = folds, # Folds for cross-validation
                       boosting = FALSE, # Do not touch this unless you are expert
                       nthread = 1, # Change this to use more threads
                       cascade_lr = 1, # Do not touch this unless you are expert
                       training_start = NULL, # Do not touch this unless you are expert
                       validation_start = NULL, # Do not touch this unless you are expert
                       cascade_forests = rep(4, 5), # Number of forest models
                       cascade_trees = 10, # Number of trees per forest
                       cascade_rf = 2, # Number of Random Forest in models
                       cascade_seeds = 0, # Seed per layer
                       objective = "binary:logistic",
                       eval_metric = Laurae::df_logloss,
                       multi_class = 2, # Modify this for multiclass problems
                       early_stopping = 2, # Stop after 2 bad combos of forests
                       maximize = FALSE, # Not a maximization task
                       verbose = TRUE, # Print information during training
                       low_memory = FALSE)
# Predict from model
new_preds <- CascadeForest_pred(model, agaricus_data_test, prediction = FALSE)
# We can check whether we have equal predictions, it's all TRUE!
all.equal(model$train_means,
          CascadeForest_pred(model, agaricus_data_train, folds = folds))
all.equal(model$valid_means,
          CascadeForest_pred(model, agaricus_data_test))
# Create a fake multiclass problem by relabeling the first 100 training rows as class 2
agaricus_label_train[1:100] <- 2
# Train a model (multiclass classification)
model <- CascadeForest(training_data = agaricus_data_train, # Training data
                       validation_data = agaricus_data_test, # Validation data
                       training_labels = agaricus_label_train, # Training labels
                       validation_labels = agaricus_label_test, # Validation labels
                       folds = folds, # Folds for cross-validation
                       boosting = FALSE, # Do not touch this unless you are expert
                       nthread = 1, # Change this to use more threads
                       cascade_lr = 1, # Do not touch this unless you are expert
                       training_start = NULL, # Do not touch this unless you are expert
                       validation_start = NULL, # Do not touch this unless you are expert
                       cascade_forests = rep(4, 5), # Number of forest models
                       cascade_trees = 10, # Number of trees per forest
                       cascade_rf = 2, # Number of Random Forest in models
                       cascade_seeds = 0, # Seed per layer
                       objective = "multi:softprob",
                       eval_metric = Laurae::df_logloss,
                       multi_class = 3, # Modify this for multiclass problems
                       early_stopping = 2, # Stop after 2 bad combos of forests
                       maximize = FALSE, # Not a maximization task
                       verbose = TRUE, # Print information during training
                       low_memory = FALSE)
# Predict from model for multiclass problems
new_preds <- CascadeForest_pred(model, agaricus_data_test, prediction = FALSE)
# We can check whether we have equal predictions, it's all TRUE!
all.equal(model$train_means,
          CascadeForest_pred(model, agaricus_data_train, folds = folds))
all.equal(model$valid_means,
          CascadeForest_pred(model, agaricus_data_test))
## End(Not run)
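The examples pass `Laurae::df_logloss` as the evaluation metric with `maximize = FALSE`, since lower log loss is better. For readers who want to score predictions by hand, here is a minimal base R sketch of binary log loss. The `logloss` helper and its argument order are assumptions for illustration, not the source of `Laurae::df_logloss`, and it expects predicted probabilities for class 1.

```r
# Binary log loss in base R; a stand-in, not the Laurae::df_logloss source.
logloss <- function(preds, labels, eps = 1e-15) {
  # Clip probabilities away from 0 and 1 so log() stays finite
  preds <- pmin(pmax(preds, eps), 1 - eps)
  -mean(labels * log(preds) + (1 - labels) * log(1 - preds))
}

# Toy probabilities and labels, made up for this example
toy_loss <- logloss(c(0.9, 0.1, 0.8), c(1, 0, 1))
```

Applied to the binary example, the same helper could be called on the averaged test predictions and `agaricus_label_test`, assuming those predictions come back as a plain numeric vector of probabilities.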