hmda.grid | R Documentation |
Generates a hyperparameter grid for a single tree-based algorithm (either "drf" or "gbm") by running a grid search. The function validates inputs, generates a grid ID automatically if none is provided, and optionally saves the grid to a recovery directory. The resulting grid object contains all trained models and can be used for further analysis. For scientific computing, saving the grid is highly recommended so that the training does not have to be re-run later.
hmda.grid(
algorithm = c("drf", "gbm"),
grid_id = NULL,
x,
y,
training_frame = h2o.getFrame("hmda.train.hex"),
validation_frame = NULL,
hyper_params = list(),
nfolds = 10,
seed = NULL,
keep_cross_validation_predictions = TRUE,
recovery_dir = NULL,
sort_by = "logloss",
...
)
algorithm |
Character. The algorithm to tune. Supported values are "drf" (Distributed Random Forest) and "gbm" (Gradient Boosting Machine). Only one algorithm can be specified. (Case-insensitive) |
grid_id |
Character. Optional identifier for the grid search.
If NULL, an ID is generated automatically from the algorithm name and the current time. |
x |
Vector. Predictor column names or indices. |
y |
Character. The response column name or index. |
training_frame |
An H2OFrame containing the training data.
Default is h2o.getFrame("hmda.train.hex"). |
validation_frame |
An H2OFrame for early stopping. Default is NULL. |
hyper_params |
List. A list of hyperparameter vectors for tuning.
If you are unsure how to specify the
hyperparameters, consider consulting |
nfolds |
Integer. Number of folds for cross-validation. Default is 10. |
seed |
Integer. A seed for reproducibility.
Default is NULL. |
keep_cross_validation_predictions |
Logical. Whether to keep
cross-validation predictions. Default is TRUE. |
recovery_dir |
Character. Directory path to save the grid search
output. If provided, the grid is saved using
h2o.saveGrid(). |
sort_by |
Character. Metric used to sort the grid. Default is "logloss". |
... |
Additional arguments passed to |
The function executes the following steps:
Input Validation: Ensures only one algorithm is specified and verifies that the training frame is an H2OFrame.
Grid ID Generation: If no grid_id is provided, it creates one using the algorithm name and the current time.
Grid Search Execution: Calls h2o.grid() with the provided hyperparameters and cross-validation settings.
Grid Saving: If a recovery directory is specified, the grid is saved to disk using h2o.saveGrid().
The output is an H2O grid object that contains all the trained models.
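The first two steps can be illustrated in base R without an H2O cluster. This is only a minimal sketch: the helper name sketch_grid_id and the ID format (algorithm name, "_grid_", timestamp) are assumptions made for illustration and are not taken from the HMDA source, which may build its IDs differently.

```r
# Hypothetical helper, NOT part of HMDA: mimics the input validation and
# grid ID generation steps described above.
sketch_grid_id <- function(algorithm, grid_id = NULL) {
  # Only one algorithm may be specified
  if (length(algorithm) != 1L) {
    stop("Specify exactly one algorithm: 'drf' or 'gbm'.")
  }
  # Case-insensitive match against the supported algorithms
  algorithm <- match.arg(tolower(algorithm), c("drf", "gbm"))

  # Build an ID from the algorithm name and the current time when none is given
  # (the exact format here is an assumption)
  if (is.null(grid_id)) {
    grid_id <- paste0(algorithm, "_grid_", format(Sys.time(), "%Y%m%d%H%M%S"))
  }
  grid_id
}

sketch_grid_id("GBM")                 # an ID of the form "gbm_grid_<timestamp>"
sketch_grid_id("drf", "my_drf_grid")  # a user-supplied ID is kept unchanged
```

Passing an explicit grid_id is useful when the grid has to be located again later, for example after reloading it from the recovery directory.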
An object of class H2OGrid containing the grid search results.
E. F. Haghish
## Not run:
library(HMDA)
library(h2o)
hmda.init()
# Import a sample binary outcome dataset into H2O
train <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test <- h2o.importFile(
"https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# For binary classification, response should be a factor
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])
# Define the hyperparameter values to search over
params <- list(learn_rate = c(0.01, 0.1),
               max_depth = c(3, 5, 9),
               sample_rate = c(0.8, 1.0)
)
# Train and validate a cartesian grid of GBMs
hmda_grid1 <- hmda.grid(algorithm = "gbm", x = x, y = y,
                        grid_id = "hmda_grid1",
                        training_frame = train,
                        nfolds = 10,
                        ntrees = 100,
                        seed = 1,
                        hyper_params = params)
# Assess the performances of the models
grid_performance <- hmda.grid.analysis(hmda_grid1)
# Return the best 2 models according to each metric
hmda.best.models(grid_performance, n_models = 2)
## End(Not run)
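Before launching a grid search it can help to check how many models a cartesian grid will produce. The sketch below uses only base R and the params list from the example above; the names combos and n_models are introduced here for illustration.

```r
# Same hyperparameter list as in the example above
params <- list(learn_rate = c(0.01, 0.1),
               max_depth = c(3, 5, 9),
               sample_rate = c(0.8, 1.0))

# Enumerate every combination of a cartesian grid
combos <- expand.grid(params)
head(combos, 3)

# Number of hyperparameter combinations: 2 * 3 * 2
n_models <- prod(lengths(params))
n_models      # 12
nrow(combos)  # 12
```

With cross-validation enabled, each combination is fitted on several folds as well, so the total training cost grows with nfolds in addition to the grid size.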