gbmt_fit: GBMT fit
In gbm-developers/gbm3: Generalized Boosted Regression Models

Description

Fits a generalized boosted regression model. This function is intended for "power" users who have a large number of variables and wish to avoid calling model.frame, which can be slow in that situation.

Usage

gbmt_fit(
  x,
  y,
  distribution = gbm_dist("Gaussian"),
  weights = rep(1, nrow(x)),
  offset = rep(0, nrow(x)),
  train_params = training_params(
    num_trees = 100, interaction_depth = 3, min_num_obs_in_node = 10,
    shrinkage = 0.001, bag_fraction = 0.5, id = seq_len(nrow(x)),
    num_train = round(0.5 * nrow(x)), num_features = ncol(x)
  ),
  response_name = "y",
  var_monotone = NULL,
  var_names = NULL,
  keep_gbm_data = FALSE,
  cv_folds = 1,
  cv_class_stratify = FALSE,
  fold_id = NULL,
  par_details = getOption("gbm.parallel"),
  is_verbose = FALSE
)
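
A minimal sketch of a call, assuming the gbm3 package is installed and using simulated data (the variables `x1`, `x2`, `x3` and the response are hypothetical). The training parameters simply restate the defaults from the usage above; the call is guarded so the snippet still runs where gbm3 is unavailable.

```r
# Simulated predictors and a Gaussian response (illustrative data only).
set.seed(1)
n <- 200
x <- data.frame(x1 = runif(n), x2 = rnorm(n), x3 = runif(n))
y <- 2 * x$x1 - x$x2 + rnorm(n, sd = 0.1)

# The length of y must match the number of rows in x.
stopifnot(length(y) == nrow(x))

if (requireNamespace("gbm3", quietly = TRUE)) {
  # Same values as the documented defaults, written out explicitly.
  params <- gbm3::training_params(
    num_trees = 100, interaction_depth = 3, min_num_obs_in_node = 10,
    shrinkage = 0.001, bag_fraction = 0.5, id = seq_len(nrow(x)),
    num_train = round(0.5 * nrow(x)), num_features = ncol(x)
  )

  # x and y are passed directly, so no formula or model.frame is involved.
  fit <- gbm3::gbmt_fit(
    x, y,
    distribution = gbm3::gbm_dist("Gaussian"),
    train_params = params
  )
}
```

Because `x` and `y` are supplied as already-built objects, any preprocessing (factor handling, missing values, column selection) has to be done by the caller before the fit.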

Arguments

`x`	a data frame or data matrix containing the predictor variables.
`y`	a matrix of outcomes. For every distribution except CoxPH this matrix collapses to a vector; for CoxPH it is a survival object in which the event times fill the first one (or two) columns and the status fills the final column. The length of the first dimension of y must match the number of rows in x.
`distribution`	a `GBMDist` object specifying the distribution and any additional parameters needed.
`weights`	optional vector of weights used in the fitting process. These weights must be positive but need not be normalized. By default they are set to 1 for each data row.
`offset`	optional vector specifying the model offset; must be positive. This defaults to a vector of zeros with one entry per row of x.
`train_params`	a GBMTrainParams object which specifies the parameters used in growing decision trees.
`response_name`	a string specifying the name of the response - defaults to "y".
`var_monotone`	optional vector, the same length as the number of predictors, indicating the relationship each variable has with the outcome. Each entry specifies a monotone increasing (+1), monotone decreasing (-1), or arbitrary (0) relationship.
`var_names`	a vector of strings containing the names of the predictor variables.
`keep_gbm_data`	a bool specifying whether or not the gbm_data object created in this method should be stored in the results.
`cv_folds`	a positive integer specifying the number of folds to be used in cross-validation of the gbm fit. If cv_folds > 1 then cross-validation is performed; the default of cv_folds is 1.
`cv_class_stratify`	a bool specifying whether or not to stratify via response outcome. Currently applies only to the "Bernoulli" distribution and defaults to FALSE.
`fold_id`	An optional vector of values identifying what fold each observation is in. If supplied, cv_folds can be missing. Note: Multiple rows of the same observation must have the same fold_id.
`par_details`	Details of the parallelization to use in the core algorithm.
`is_verbose`	if TRUE, gbmt will print out progress and performance of the fit.
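
Several of the arguments above are plain vectors that the caller builds by hand. The sketch below uses base R only and hypothetical data (the `subject` vector and the three-predictor layout are assumptions) to show one way of constructing `fold_id` so that all rows of the same observation share a fold, together with `var_monotone` and `weights` vectors of the required lengths. The use of 0 for an arbitrary relationship is assumed here, following the convention of the original gbm package.

```r
# Hypothetical layout: 10 rows belonging to 6 subjects, some repeated.
subject <- c(1, 1, 2, 3, 3, 3, 4, 5, 6, 6)
n_folds <- 3

# Assign each unique subject to a fold at random, then expand back to rows
# so that every row of a subject carries the same fold label.
set.seed(42)
unique_subjects <- unique(subject)
subject_fold <- sample(rep_len(seq_len(n_folds), length(unique_subjects)))
fold_id <- subject_fold[match(subject, unique_subjects)]

# One entry per predictor: +1 increasing, -1 decreasing,
# 0 arbitrary (assumed convention, as in the original gbm package).
var_monotone <- c(1, -1, 0)

# Positive, unnormalized weights with one entry per row.
weights <- rep(1, length(subject))
weights[subject == 3] <- 2  # up-weight one subject, purely for illustration

# Consistency check: each subject appears in exactly one fold.
folds_per_subject <- tapply(fold_id, subject, function(f) length(unique(f)))
stopifnot(all(folds_per_subject == 1))
```

These vectors would then be passed as the `fold_id`, `var_monotone`, and `weights` arguments; since `fold_id` is supplied, `cv_folds` can be left out.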