h2o.train_segments: H2O Segmented-Data Bulk Model Training
In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

h2o.train_segments

R Documentation

H2O Segmented-Data Bulk Model Training

Description

Provides a set of functions to train a group of models on different segments (subpopulations) of the training set.

Usage

h2o.train_segments(
  algorithm,
  segment_columns,
  segment_models_id,
  parallelism = 1,
  ...
)

Arguments

`algorithm`	Name of the algorithm to use for training the segment models (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, psvm, xgboost, pca, svd, targetencoder, aggregator, word2vec, coxph, isolationforest, stackedensemble, glrm, gam, anovaglm, modelselection).
`segment_columns`	A list of columns to segment by. H2O groups the training (and validation) dataset by these columns and trains a separate model for each segment (group of rows).
`segment_models_id`	Identifier for the returned collection of Segment Models. If not specified, it is generated automatically.
`parallelism`	Level of parallelism of bulk model building: the maximum number of models each H2O node builds in parallel. Defaults to 1.
`...`	Passes the training_frame, x, y, and any non-default parameter values through to the algorithm. See the specific algorithm function (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for the available parameters.

Details

Starts segmented-data bulk model training for a given algorithm and parameters.
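
To make the idea of "one model per segment" concrete, here is a minimal base R sketch of the same pattern. It is an analogy only, not how H2O implements segmented training: it uses `split()` and `lm()` in place of an H2O algorithm, and the names `segments`, `segment_models`, and `summary_df` are illustrative.

```r
# Conceptual sketch in base R (an analogy, NOT H2O's implementation):
# group the data by a segment column and fit one model per group.

# Split the training data by the segment column.
segments <- split(iris, iris$Species)

# Fit a separate linear model on each segment
# (columns 1:3 as predictors, column 4 as the response).
segment_models <- lapply(segments, function(seg) {
  lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = seg)
})

# Collect one summary row per segment.
summary_df <- data.frame(
  Species   = names(segment_models),
  n_rows    = vapply(segments, nrow, integer(1)),
  r_squared = vapply(segment_models, function(m) summary(m)$r.squared, numeric(1)),
  row.names = NULL
)
print(summary_df)
```

With `h2o.train_segments`, the grouping and the per-segment training both happen inside the H2O cluster, so the data does not need to be split in the R session.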

Examples

## Not run: 
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)
models <- h2o.train_segments(algorithm = "gbm", 
                             segment_columns = "Species",
                             x = c(1:3), y = 4, 
                             training_frame = iris_hf,
                             ntrees = 5, 
                             max_depth = 4)
# Summarize the segment models as a data frame
as.data.frame(models)

## End(Not run)
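
The example above leaves `segment_models_id` and `parallelism` at their defaults. The following sketch shows one way to set them explicitly. It assumes a local H2O cluster can be started; the wrapper function `train_iris_segments` and the identifier `"iris_gbm_segments"` are hypothetical names chosen for illustration.

```r
# Hypothetical wrapper, for illustration only. Nothing runs until it is
# called, and calling it requires the h2o package and a working H2O cluster.
train_iris_segments <- function(parallelism = 2) {
  h2o::h2o.init()
  iris_hf <- h2o::as.h2o(iris)
  models <- h2o::h2o.train_segments(
    algorithm = "gbm",
    segment_columns = "Species",
    segment_models_id = "iris_gbm_segments",  # explicit ID instead of a generated one
    parallelism = parallelism,                # max models built in parallel per node
    x = 1:3, y = 4,
    training_frame = iris_hf,
    ntrees = 5,
    max_depth = 4
  )
  # Summarize the segment models as a data frame.
  as.data.frame(models)
}
```

Calling `train_iris_segments()` trains the segment models and returns the summary data frame. A `parallelism` above 1 is likely to matter mainly when there are many segments to train.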

h2o documentation built on May 29, 2024, 4:26 a.m.