View source: R/stackedensemble.R
h2o.stackedEnsemble  R Documentation 
Builds a stacked ensemble (a.k.a. Super Learner) using the H2O base learning algorithms specified by the user.
h2o.stackedEnsemble(
  x,
  y,
  training_frame,
  model_id = NULL,
  validation_frame = NULL,
  blending_frame = NULL,
  base_models = list(),
  metalearner_algorithm = c("AUTO", "deeplearning", "drf", "gbm", "glm", "naivebayes",
    "xgboost"),
  metalearner_nfolds = 0,
  metalearner_fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"),
  metalearner_fold_column = NULL,
  metalearner_params = NULL,
  metalearner_transform = c("NONE", "Logit"),
  max_runtime_secs = 0,
  weights_column = NULL,
  offset_column = NULL,
  seed = -1,
  score_training_samples = 10000,
  keep_levelone_frame = FALSE,
  export_checkpoints_dir = NULL,
  auc_type = c("AUTO", "NONE", "MACRO_OVR", "WEIGHTED_OVR", "MACRO_OVO",
    "WEIGHTED_OVO")
)
x 
(Optional). A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. Training frame is used only to compute ensemble training metrics. 
y 
The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model. 
training_frame 
Id of the training data frame. 
model_id 
Destination id for this model; autogenerated if not specified. 
validation_frame 
Id of the validation data frame. 
blending_frame 
Frame used to compute the predictions that serve as the training frame for the metalearner. Providing this frame triggers blending mode. 
base_models 
List of models or grids (or their ids) to ensemble/stack together. Grids are expanded to individual models. If not using a blending frame, then models must have been cross-validated using nfolds > 1, and folds must be identical across models. 
metalearner_algorithm 
Type of algorithm to use as the metalearner. Options include 'AUTO' (GLM with non-negative weights; if validation_frame is present, a lambda search is performed), 'deeplearning' (Deep Learning with default parameters), 'drf' (Random Forest with default parameters), 'gbm' (GBM with default parameters), 'glm' (GLM with default parameters), 'naivebayes' (NaiveBayes with default parameters), or 'xgboost' (if available, XGBoost with default parameters). Must be one of: "AUTO", "deeplearning", "drf", "gbm", "glm", "naivebayes", "xgboost". Defaults to AUTO. 
metalearner_nfolds 
Number of folds for K-fold cross-validation of the metalearner algorithm (0 to disable or >= 2). Defaults to 0. 
metalearner_fold_assignment 
Cross-validation fold assignment scheme for metalearner cross-validation. Defaults to AUTO (which is currently set to Random). The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Must be one of: "AUTO", "Random", "Modulo", "Stratified". 
metalearner_fold_column 
Column with cross-validation fold index assignment per observation for cross-validation of the metalearner. 
metalearner_params 
Parameters for metalearner algorithm 
metalearner_transform 
Transformation used for the level one frame. Must be one of: "NONE", "Logit". Defaults to NONE. 
max_runtime_secs 
Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0. 
weights_column 
Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function prefactor. If you set weight = 0 for a row, the returned prediction frame at that row is zero, which is incorrect. To get an accurate prediction, remove all rows with weight == 0. 
offset_column 
Offset column. This will be added to the combination of columns before applying the link function. 
seed 
Seed for random numbers; passed through to the metalearner algorithm. Defaults to -1 (time-based random number). 
score_training_samples 
Specify the number of training set samples for scoring. The value must be >= 0. To use all training samples, enter 0. Defaults to 10000. 
keep_levelone_frame 
Logical. Keep the level-one frame used for metalearner training. Defaults to FALSE. 
export_checkpoints_dir 
Automatically export generated models to this directory. 
auc_type 
Set default multinomial AUC type. Must be one of: "AUTO", "NONE", "MACRO_OVR", "WEIGHTED_OVR", "MACRO_OVO", "WEIGHTED_OVO". Defaults to AUTO. 
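The weights_column description above says that a weight of 2 is equivalent to repeating a row twice and a weight of 0 is equivalent to excluding it. The following is a minimal base R sketch of that equivalence using lm(). It does not call H2O, and the data frame d and weight vector w are made up for illustration only.

```r
# Toy data (hypothetical values, for illustration only)
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(1.1, 1.9, 3.2, 3.8, 5.3))

# Per-row observation weights: row 2 counts twice, row 4 is excluded
w <- c(1, 2, 1, 0, 1)

# Fit using the weights directly
fit_weighted <- lm(y ~ x, data = d, weights = w)

# Fit on a frame where each row is physically repeated w times
# (rows with weight 0 are dropped)
expanded <- d[rep(seq_len(nrow(d)), times = w), ]
fit_expanded <- lm(y ~ x, data = expanded)

# Both fits yield the same coefficients
same_coefficients <- isTRUE(all.equal(coef(fit_weighted), coef(fit_expanded)))
print(same_coefficients)
```

With non-integer weights the repeated-row picture no longer applies literally, but the weighted fit is still defined in the same way.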
## Not run:
library(h2o)
h2o.init()
# Import a sample binary outcome train/test set
train <- h2o.importFile("https://s3.amazonaws.com/erindata/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erindata/higgs/higgs_test_5k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# For binary classification, response should be a factor
train[, y] <- as.factor(train[, y])
test[, y] <- as.factor(test[, y])
# Number of CV folds
nfolds <- 5
# Train & Cross-validate a GBM
my_gbm <- h2o.gbm(x = x,
                  y = y,
                  training_frame = train,
                  distribution = "bernoulli",
                  ntrees = 10,
                  max_depth = 3,
                  min_rows = 2,
                  learn_rate = 0.2,
                  nfolds = nfolds,
                  fold_assignment = "Modulo",
                  keep_cross_validation_predictions = TRUE,
                  seed = 1)
# Train & Cross-validate a RF
my_rf <- h2o.randomForest(x = x,
                          y = y,
                          training_frame = train,
                          ntrees = 50,
                          nfolds = nfolds,
                          fold_assignment = "Modulo",
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)
# Train a stacked ensemble using the GBM and RF above
ensemble <- h2o.stackedEnsemble(x = x,
                                y = y,
                                training_frame = train,
                                model_id = "my_ensemble_binomial",
                                base_models = list(my_gbm, my_rf))
## End(Not run)
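The example above requires base models that were cross-validated on identical folds with their cross-validation predictions kept. As a rough conceptual sketch of why, the base R code below builds a level-one frame from held-out predictions and fits a metalearner on it. This is not H2O's implementation: the data are simulated, the two lm() base learners are hypothetical stand-ins, and the metalearner is an ordinary linear model rather than the non-negative GLM that 'AUTO' uses.

```r
set.seed(1)

# Simulated regression data (illustration only)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + dat$x2^2 + rnorm(n, sd = 0.5)

# Modulo-style fold assignment shared by every base learner:
# row i goes to fold (i - 1) %% nfolds
nfolds <- 5
fold <- (seq_len(n) - 1) %% nfolds

# Two hypothetical base learners, expressed as lm() formulas
base_formulas <- list(linear    = y ~ x1 + x2,
                      quadratic = y ~ x1 + I(x2^2))

# Level-one frame: the response plus one column of held-out
# (cross-validated) predictions per base learner
level_one <- data.frame(y = dat$y)
for (name in names(base_formulas)) {
  cv_pred <- numeric(n)
  for (k in unique(fold)) {
    held_out <- fold == k
    fit <- lm(base_formulas[[name]], data = dat[!held_out, ])
    cv_pred[held_out] <- predict(fit, newdata = dat[held_out, ])
  }
  level_one[[name]] <- cv_pred
}

# Metalearner: a plain linear model on the held-out predictions
metalearner <- lm(y ~ linear + quadratic, data = level_one)
print(coef(metalearner))
```

Because every row's prediction comes from a model that did not see that row, the metalearner is trained on honest estimates of each base learner's error. Sharing one fold vector across base learners is what keeps the rows of the level-one frame comparable.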