gbmt_fit | R Documentation |
Fits a generalized boosting model. This function is intended for "power" users
who have a large number of variables and wish to avoid calling
model.frame,
which can be slow in that setting.
gbmt_fit(
x,
y,
distribution = gbm_dist("Gaussian"),
weights = rep(1, nrow(x)),
offset = rep(0, nrow(x)),
train_params = training_params(num_trees = 100, interaction_depth = 3,
min_num_obs_in_node = 10, shrinkage = 0.001, bag_fraction = 0.5, id =
seq_len(nrow(x)), num_train = round(0.5 * nrow(x)), num_features = ncol(x)),
response_name = "y",
var_monotone = NULL,
var_names = NULL,
keep_gbm_data = FALSE,
cv_folds = 1,
cv_class_stratify = FALSE,
fold_id = NULL,
par_details = getOption("gbm.parallel"),
is_verbose = FALSE
)
x |
a data frame or data matrix containing the predictor variables. |
y |
a matrix of outcomes. For every distribution except CoxPH, this matrix collapses to a vector; for CoxPH it is a survival object in which the event times fill the first one (or two) columns and the status fills the final column. The length of the first dimension of y must match the number of rows in x. |
distribution |
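a distribution object, such as one created by gbm_dist, specifying the distribution to fit. Defaults to gbm_dist("Gaussian"). |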
weights |
optional vector of weights used in the fitting process. These weights must be positive but need not be normalized. By default they are set to 1 for each data row. |
offset |
optional vector specifying the model offset; must be positive. This defaults to a vector of 0's whose length equals the number of rows of x. |
train_params |
a GBMTrainParams object which specifies the parameters used in growing decision trees. |
response_name |
a string specifying the name of the response - defaults to "y". |
var_monotone |
optional vector, the same length as the number of predictors, indicating the relationship each variable has with the outcome. Each entry indicates a monotone increasing (+1), monotone decreasing (-1), or arbitrary (0) relationship. |
var_names |
a vector of strings containing the names of the predictor variables. |
keep_gbm_data |
a bool specifying whether or not the gbm_data object created in this method should be stored in the results. |
cv_folds |
a positive integer specifying the number of folds to be used in cross-validation of the gbm fit. If cv_folds > 1 then cross-validation is performed; the default of cv_folds is 1. |
cv_class_stratify |
a bool specifying whether or not to stratify the cross-validation folds by response outcome. Currently applies only to the "Bernoulli" distribution and defaults to FALSE. |
fold_id |
An optional vector of values identifying what fold each observation is in. If supplied, cv_folds can be missing. Note: Multiple rows of the same observation must have the same fold_id. |
par_details |
Details of the parallelization to use in the core algorithm. |
is_verbose |
if TRUE, gbmt will print out progress and performance of the fit. |
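As a rough illustration of how some of the arguments above might be prepared, the base R sketch below builds a predictor data frame, an outcome vector, a var_monotone vector and a fold_id vector. The data are simulated, and the names n, subject, num_folds and fold_of_subject are hypothetical; the setup with two rows per subject is only an assumption used to show rows of the same observation sharing a fold.

```r
# Simulated inputs; every name below is illustrative only.
set.seed(1)
n <- 200
x <- data.frame(
  x1 = runif(n),
  x2 = rnorm(n),
  x3 = factor(sample(letters[1:3], n, replace = TRUE))
)
y <- 2 * x$x1 - x$x2 + rnorm(n, sd = 0.5)

# One entry per predictor: +1 increasing, -1 decreasing, 0 arbitrary.
var_monotone <- c(1, -1, 0)

# Assume each subject contributes two consecutive rows.
subject <- rep(seq_len(n / 2), each = 2)

# Assign folds per subject, then expand to rows, so that all rows of
# one subject end up in the same fold.
num_folds <- 5
fold_of_subject <- sample(rep(seq_len(num_folds), length.out = n / 2))
fold_id <- fold_of_subject[subject]

# Basic shape checks matching the argument descriptions.
stopifnot(
  length(y) == nrow(x),
  length(var_monotone) == ncol(x),
  length(fold_id) == nrow(x)
)
```

Assigning folds at the subject level first, and only then indexing by row, is one simple way to respect the note that multiple rows of the same observation must have the same fold_id.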
a GBMFit object.
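A minimal sketch of a call follows, using only the arguments shown in the usage above. It assumes that gbmt_fit, gbm_dist and training_params are provided by the gbm3 package, which this page does not state, so the call is guarded by requireNamespace. The data are simulated and the parameter values are arbitrary choices, not recommendations.

```r
# Simulated regression data; names and values are illustrative only.
set.seed(1)
n <- 500
x <- data.frame(x1 = runif(n), x2 = rnorm(n))
y <- 2 * x$x1 - x$x2 + rnorm(n, sd = 0.5)

# Assumption: the functions documented here live in the gbm3 package.
if (requireNamespace("gbm3", quietly = TRUE)) {
  library(gbm3)

  # Tree-growing parameters, mirroring the argument names in the usage.
  params <- training_params(
    num_trees = 200,
    interaction_depth = 3,
    min_num_obs_in_node = 10,
    shrinkage = 0.01,
    bag_fraction = 0.5,
    id = seq_len(nrow(x)),
    num_train = round(0.5 * nrow(x)),
    num_features = ncol(x)
  )

  # x and y are passed directly, so no formula or model.frame is involved.
  fit <- gbmt_fit(
    x, y,
    distribution = gbm_dist("Gaussian"),
    train_params = params
  )
}
```

Because x and y are supplied as already-built objects, no formula is parsed, which is the situation the description has in mind for data with many variables.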