startml: startml
In andrewsommerlot/startml: Start doing machine learning with training optimization and ensembles of popular machine learning algorithms powered by scalable implementations in h2o


startml is designed to run automatic hyperparameter searches for deep learning, gradient boosted machine, and random forest models. It selects the best models and combines them into an ensemble, aiming to produce good predictions from either an ensemble or a single high-performing model with one function call. Machine learning algorithms are provided by h2o and run on the h2o JVM platform outside of the R workspace. Thus, much of the functionality in startml is scalable. Currently, startml only supports regression and binary classification.

startml(labeled_data, newdata, y, x = NULL, label_id = NULL, y_type,
  algorithms = c("deeplearning", "randomForest", "gbm"),
  eval_metric = "AUTO", validation_type = "shared_holdout",
  percent_valid_holdout = 10, percent_test_holdout = 10,
  runtime_secs = 10, split_seed = NULL, trim = FALSE,
  number_top_models = NULL, eval_threshold = NULL,
  correlation_threshold = 0, return_dataframe = FALSE, wd = getwd())

`labeled_data`	H2O frame object containing labeled data for model training. No Default.
`newdata`	H2O frame object containing unlabeled data for model predictions. No Default.
`y`	Character object of length 1 identifying the column name of the target variable. No Default.
`x`	Character object of length 1 or more identifying the column name(s) of the input variables. Default NULL, which uses all remaining variables in labeled_data as inputs. newdata must contain all of these input column names.
`label_id`	Character object of length 1 identifying the name of the column of observation IDs in labeled_data. If used, must match column of same name in newdata. startml will ignore this column as an input, but include it as an ID column in prediction outputs.
`y_type`	Character object of length 1 identifying the type of data the target variable is. Can be "continuous" or "discrete". Continuous automatically creates regression models, and discrete automatically creates binomial models. Currently, startml only supports regression and binary classification.
`algorithms`	Character object of length 3, 2, or 1, specifying which algorithms to automatically train. The autotrain function will run a separate grid search for each algorithm type. Choices are: "deeplearning", "randomForest", and "gbm", following the naming convention in H2O version 3. Defaults to c("deeplearning", "randomForest", "gbm").
`eval_metric`	Character object defining the evaluation metric for training. Default is "AUTO", which uses the built-in H2O automatic choice for the target data type.
`validation_type`	Defines the validation type for training models. Defaults to "shared_holdout", indicating that all models built with all algorithms share the same validation set. Currently, this is the only option in autotrain. Planned types include "random_holdout", where each model will get a unique randomized sample of labeled data for validation, and "xval", in which the cross validation functionality in H2O will be used for every model.
`percent_valid_holdout`	Numeric object of value 0 to 100. Sets the percent of the labeled data that will be used for holdout validation. Default is 10. Is ignored if validation_type = "xval". Currently, startml only supports "shared_holdout" validation.
`percent_test_holdout`	Numeric object of value 0 to 100. Sets the percent of the labeled data that will be used for test holdout for model selection. Default is 10.
`runtime_secs`	Numeric object which sets the length of time, in seconds, that each grid search will run. The usage above shows a default of 10, so the default total runtime is 10 sec * (number of algorithms, 3 by default) = 30 seconds.
`split_seed`	Random seed for splitting labeled data into train, validation, and test components. Currently, startml only supports random sampling splits; this argument sets the random seed for these splits, making the data set separation process reproducible. Since this is a "naive" random split, labeled data should be shuffled beforehand.
`trim`	Boolean. When TRUE, output is trimmed with eval_threshold, correlation_threshold, or number_top_models. When FALSE, all models are returned. Default FALSE.
`number_top_models`	Numeric object indicating the number of top models to return. The usage above shows a default of NULL. If the number entered is greater than the number of models, the whole model list is returned.
`eval_threshold`	Numeric object defining the performance threshold models must meet to be used in prediction. It is a minimum for metrics that are maximized (e.g., AUC) and a maximum for loss functions that are minimized (logloss, MSE, etc.). Default is NULL, which returns models without performance consideration.
`correlation_threshold`	Numeric object defining the maximum Pearson correlation allowed in the group of resulting models. If two models show high correlation, the one with superior performance will be kept and the other dropped. Value ranges from -1 to 1. The default in the usage above is 0; when NULL, models are returned without correlation being considered.
`return_dataframe`	Deprecated. Always keep equal to FALSE.
`wd`	Character object defining the file path where resulting models will be saved. Defaults to the current working directory.
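To show how the arguments above fit together, here is a minimal sketch of a regression call. It is an illustration under stated assumptions, not an example from the package: the iris data, the 120/30 row split, and the chosen argument values are all picked for this sketch, and the call to startml only runs if the h2o and startml packages are installed and a local H2O cluster can start.

```r
# Arguments chosen for this illustration (not the package defaults).
algorithms   <- c("randomForest", "gbm")
runtime_secs <- 10

# Each algorithm gets its own grid search, so the search budget is
# runtime_secs multiplied by the number of algorithms.
total_search_secs <- runtime_secs * length(algorithms)

# Shuffle first, since the documented split is a plain random split.
set.seed(42)
shuffled   <- iris[sample(nrow(iris)), ]
labeled_df <- shuffled[1:120, ]
# newdata carries every input column but not the target.
new_df     <- shuffled[121:150, setdiff(names(shuffled), "Sepal.Length")]

# The call itself only runs when both packages are available.
if (requireNamespace("h2o", quietly = TRUE) &&
    requireNamespace("startml", quietly = TRUE)) {
  h2o::h2o.init()
  fit <- startml::startml(
    labeled_data = h2o::as.h2o(labeled_df),
    newdata      = h2o::as.h2o(new_df),
    y            = "Sepal.Length",
    y_type       = "continuous",
    algorithms   = algorithms,
    runtime_secs = runtime_secs,
    split_seed   = 1234
  )
}
```

With two algorithms and runtime_secs = 10, the grid searches in this sketch are budgeted about 20 seconds in total. A data set this small is only meant to keep the example short; the holdout percentages leave very few rows for validation and testing.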

Object of class mlblob (S4). mlblob objects contain all selected models and their predictions on train, validation, test, and new data, and can be plotted using plot(), which shows a summary of the model group. Slots are: models, a list of h2o model objects; labeled_data, an h2o frame object equivalent to the labeled_data input object.
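Because the return value is an S4 object, its slots are read with the @ operator. The sketch below is a hypothetical helper (describe_mlblob is not part of startml) that lists the slot names and counts the models. To stay runnable without an H2O cluster, it is demonstrated on a stand-in class, mock_mlblob, that has only the two slots named above; a real mlblob may carry more.

```r
library(methods)

# Hypothetical helper: summarize any S4 object that has a `models` slot.
describe_mlblob <- function(blob) {
  stopifnot(isS4(blob))
  list(
    slots    = slotNames(blob),       # names of all slots on the object
    n_models = length(blob@models)    # number of models that were kept
  )
}

# Stand-in class used only to demonstrate the helper without H2O.
setClass("mock_mlblob", representation(models = "list", labeled_data = "ANY"))
demo <- new("mock_mlblob", models = list("m1", "m2"), labeled_data = NULL)
info <- describe_mlblob(demo)
```

On an object returned by startml, the same helper could be called as describe_mlblob(fit), and plot(fit) would draw the model group summary described above.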

andrewsommerlot/startml documentation built on May 5, 2019, 6:58 p.m.
