gbmt_fit: GBMT fit

View source: R/gbmt-fit.r

gbmt_fit R Documentation

GBMT fit

Description

Fits a generalized boosting model. This function is intended for "power" users who have a large number of variables and wish to avoid calling model.frame, which can be slow in that situation.

Usage

gbmt_fit(
  x,
  y,
  distribution = gbm_dist("Gaussian"),
  weights = rep(1, nrow(x)),
  offset = rep(0, nrow(x)),
  train_params = training_params(
    num_trees = 100,
    interaction_depth = 3,
    min_num_obs_in_node = 10,
    shrinkage = 0.001,
    bag_fraction = 0.5,
    id = seq_len(nrow(x)),
    num_train = round(0.5 * nrow(x)),
    num_features = ncol(x)
  ),
  response_name = "y",
  var_monotone = NULL,
  var_names = NULL,
  keep_gbm_data = FALSE,
  cv_folds = 1,
  cv_class_stratify = FALSE,
  fold_id = NULL,
  par_details = getOption("gbm.parallel"),
  is_verbose = FALSE
)

Arguments

x

a data frame or data matrix containing the predictor variables.

y

a matrix of outcomes. For every distribution except CoxPH this matrix collapses to a vector; for CoxPH it is a survival object in which the event times fill the first one (or two) columns and the status fills the final column. The length of the first dimension of y must match the number of rows in x.

distribution

a GBMDist object specifying the distribution and any additional parameters needed.

weights

optional vector of weights used in the fitting process. These weights must be positive but need not be normalized. By default they are set to 1 for each data row.

offset

optional vector specifying the model offset; must be positive. Defaults to a vector of zeros with one entry per row of x.

train_params

a GBMTrainParams object which specifies the parameters used in growing decision trees.

response_name

a string specifying the name of the response - defaults to "y".

var_monotone

optional vector, the same length as the number of predictors, indicating the relationship each variable has with the outcome. Each entry specifies a monotone increasing (+1), monotone decreasing (-1), or arbitrary (0) relationship.

var_names

a vector of strings containing the names of the predictor variables.

keep_gbm_data

a bool specifying whether or not the gbm_data object created in this method should be stored in the results.

cv_folds

a positive integer specifying the number of folds to be used in cross-validation of the gbm fit. If cv_folds > 1 then cross-validation is performed; the default of cv_folds is 1.

cv_class_stratify

a bool specifying whether or not to stratify the folds by response outcome. Currently this only applies to the "Bernoulli" distribution; defaults to FALSE.

fold_id

An optional vector of values identifying what fold each observation is in. If supplied, cv_folds can be missing. Note: Multiple rows of the same observation must have the same fold_id.

par_details

Details of the parallelization to use in the core algorithm.

is_verbose

if TRUE, gbmt will print out progress and performance of the fit.
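
The shapes described for y, var_monotone and fold_id can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not code taken from the package: the data, the column names and the obs_id vector are hypothetical, the CoxPH outcome is built with survival::Surv (assumed to be the kind of survival object the y argument refers to), and the use of 0 for an arbitrary relationship follows the var.monotone convention of the original gbm package.

```r
# Illustrative data only; every name below is hypothetical.
set.seed(42)
obs_id <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 6)  # repeated ids = several rows of one observation
x <- data.frame(age = runif(10, 20, 80), dose = runif(10), site = rnorm(10))

# y for non-CoxPH distributions: a plain vector with one entry per row of x.
y_gaussian <- rnorm(nrow(x))
stopifnot(length(y_gaussian) == nrow(x))

# y for CoxPH: a survival object; here event time in column 1, status in column 2.
if (requireNamespace("survival", quietly = TRUE)) {
  y_cox <- survival::Surv(time = rexp(nrow(x)), event = rbinom(nrow(x), 1, 0.7))
  stopifnot(nrow(y_cox) == nrow(x), ncol(y_cox) == 2)
}

# var_monotone: one entry per predictor, in column order (age, dose, site).
# +1 increasing, -1 decreasing, 0 arbitrary (assumed convention, see above).
var_monotone <- c(1, -1, 0)
stopifnot(length(var_monotone) == ncol(x))

# fold_id: assign folds per observation id, then expand to rows, so that
# all rows of the same observation land in the same fold.
n_folds <- 3
unique_ids <- unique(obs_id)
fold_of_id <- sample(rep_len(seq_len(n_folds), length(unique_ids)))
fold_id <- fold_of_id[match(obs_id, unique_ids)]
stopifnot(all(tapply(fold_id, obs_id, function(f) length(unique(f))) == 1))
```

Assigning folds at the level of the observation id, rather than sampling a fold for each row, is what keeps repeated rows of one observation together.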

Value

a GBMFit object.
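
Examples

A minimal sketch of a call, assuming the gbm3 package is installed and exports gbmt_fit, gbm_dist and training_params with the signatures shown under Usage. The simulated data and the variable names x1, x2, x3 are hypothetical, and the parameter values simply repeat the documented defaults. The call is skipped when gbm3 is not available.

```r
# Simulated predictors and a Gaussian response (illustrative only).
set.seed(1)
n <- 200
x <- data.frame(x1 = runif(n), x2 = runif(n), x3 = rnorm(n))
y <- 2 * x$x1 - x$x2 + rnorm(n, sd = 0.1)
stopifnot(length(y) == nrow(x))  # first dimension of y must match rows of x

if (requireNamespace("gbm3", quietly = TRUE)) {
  # Tree-growing parameters, spelled out with the values shown under Usage.
  params <- gbm3::training_params(
    num_trees = 100,
    interaction_depth = 3,
    min_num_obs_in_node = 10,
    shrinkage = 0.001,
    bag_fraction = 0.5,
    id = seq_len(nrow(x)),
    num_train = round(0.5 * nrow(x)),
    num_features = ncol(x)
  )

  # x and y are passed directly; no formula and no model.frame call is involved.
  fit <- gbm3::gbmt_fit(
    x, y,
    distribution = gbm3::gbm_dist("Gaussian"),
    train_params = params,
    var_names = colnames(x)
  )
  print(class(fit))
}
```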


gbm-developers/gbm3 documentation built on April 28, 2024, 10:04 p.m.