train.gbm | R Documentation
Provides a wrapping function for the gbm function from the gbm package.
train.gbm(
formula,
data,
distribution = "bernoulli",
weights,
var.monotone = NULL,
n.trees = 100,
interaction.depth = 1,
n.minobsinnode = 10,
shrinkage = 0.001,
bag.fraction = 0.5,
train.fraction = 1,
cv.folds = 0,
keep.data = TRUE,
verbose = FALSE,
class.stratify.cv = NULL,
n.cores = NULL
)
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame containing the variables in the model. |
distribution |
Either a character string specifying the name of the distribution to use or a list with a component name specifying the distribution and any additional parameters needed. |
weights |
an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. |
var.monotone |
an optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome. |
n.trees |
Integer specifying the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. Default is 100. |
interaction.depth |
Integer specifying the maximum depth of each tree (i.e., the highest level of variable interactions allowed). A value of 1 implies an additive model, a value of 2 implies a model with up to 2-way interactions, etc. Default is 1. |
n.minobsinnode |
Integer specifying the minimum number of observations in the terminal nodes of the trees. Note that this is the actual number of observations, not the total weight. |
shrinkage |
a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction; values from 0.001 to 0.1 usually work, but a smaller learning rate typically requires more trees. Default is 0.001.
bag.fraction |
the fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomness into the model fit.
train.fraction |
The first train.fraction * nrow(data) observations are used to fit the gbm and the remainder are used for computing out-of-sample estimates of the loss function.
cv.folds |
Number of cross-validation folds to perform. If cv.folds > 1 then gbm, in addition to the usual fit, will perform cross-validation and calculate an estimate of generalization error, returned in cv.error.
keep.data |
a logical variable indicating whether to keep the data and an index of the data stored with the object. Keeping the data and index makes subsequent calls to gbm.more faster at the cost of storing an extra copy of the dataset. |
verbose |
Logical indicating whether or not to print out progress and performance indicators (TRUE). If this option is left unspecified for gbm.more, it uses verbose from object. Default is FALSE.
class.stratify.cv |
Logical indicating whether or not the cross-validation should be stratified by class. |
n.cores |
The number of CPU cores to use. The cross-validation loop will attempt to send different CV folds off to different cores. If n.cores is not specified by the user, it is guessed using the detectCores function in the parallel package. |
An object of class gbm.prmdt with additional information about the model that allows the results to be homogenized.
The parameter information was taken from the original gbm function.
The internal function is from the gbm package.
# Classification
data <- iris
n <- nrow(data)
sam <- sample(1:n, floor(n * 0.75))
training <- data[sam,]
testing <- data[-sam,]
model <- train.gbm(formula = Species ~ ., data = training)
model
prediction <- predict(object = model, testing)  # avoid masking base::predict
prediction
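As a hedged sketch of the cv.folds and class.stratify.cv arguments described above (assuming the traineR package, which provides train.gbm, is installed and attached), a cross-validated classification fit looks like:

```r
# Sketch: 5-fold class-stratified cross-validation on iris.
# Assumes train.gbm comes from the traineR package; per the cv.folds
# documentation above, the fit also carries a generalization-error estimate.
library(traineR)
set.seed(1)
model.cv <- train.gbm(Species ~ ., data = iris,
                      cv.folds = 5, class.stratify.cv = TRUE)
model.cv
```

Stratifying the folds keeps the class proportions of Species roughly equal across folds, which matters for small or imbalanced classes.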
# Regression
len <- nrow(swiss)
sampl <- sample(x = 1:len, size = floor(len * 0.10), replace = FALSE)
ttesting <- swiss[sampl,]
ttraining <- swiss[-sampl,]
model.gbm <- train.gbm(Infant.Mortality~., ttraining, distribution = "gaussian")
prediction <- predict(model.gbm, ttesting)
prediction
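The distribution argument also accepts the list form described above, and var.monotone can constrain the direction of each predictor's effect. A hedged sketch on the same swiss data (assuming the traineR package is attached; the +1 constraint on the first predictor is purely illustrative):

```r
# Sketch: median (quantile) regression via the list form of `distribution`,
# with a monotone-increasing constraint on the first predictor and no
# constraint on the rest. Assumes train.gbm comes from the traineR package.
library(traineR)
set.seed(1)
mono <- c(1, rep(0, ncol(swiss) - 2))  # one entry per predictor
model.q <- train.gbm(Infant.Mortality ~ ., data = swiss,
                     distribution = list(name = "quantile", alpha = 0.5),
                     var.monotone = mono)
model.q
```

The list form carries extra distribution parameters (here alpha, the quantile to estimate) that a bare character string cannot.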