trainBrt: Calibrate a boosted regression tree (generalized boosting machine) model

View source: R/trainBrt.r


Calibrate a boosted regression tree (generalized boosting machine) model

Description

This function is a wrapper for gbm.step. It returns the model with the best combination of learning rate, tree depth, and bag fraction, selected by cross-validated deviance. It can also return a table with the deviance of each combination of tuning parameters that was tested, and/or all of the models that were tested. See Elith, J., J.R. Leathwick, and T. Hastie. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802-813.
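For orientation, the set of tuning combinations implied by the default arguments could be enumerated as below. This is an illustrative sketch only, not the function's internal code:

# sketch of the tuning combinations implied by the default arguments
# (illustrative only; trainBrt builds and evaluates these internally via gbm.step)
tuningGrid <- expand.grid(
    learningRate = c(1e-04, 0.001, 0.01),
    treeComplexity = c(5, 3, 1),
    bagFraction = 0.6
)
nrow(tuningGrid) # 9 combinations, each scored by cross-validated deviance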

Usage

trainBrt(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "bernoulli",
  learningRate = c(1e-04, 0.001, 0.01),
  treeComplexity = c(5, 3, 1),
  bagFraction = 0.6,
  minTrees = 1000,
  maxTrees = 8000,
  tries = 5,
  tryBy = c("learningRate", "treeComplexity", "maxTrees", "stepSize"),
  w = TRUE,
  anyway = FALSE,
  out = "model",
  cores = 1,
  verbose = FALSE,
  ...
)

Arguments

data

Data frame with the first column being the response variable.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.

family

Character. Name of error family. See gbm.step.

learningRate

Numeric. Learning rate at which the model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1).

treeComplexity

Positive integer. Tree complexity: depth of branches in a single tree (1 to 16).

bagFraction

Numeric in the range [0, 1]. Bag fraction: proportion of data used for training in cross-validation (Elith et al. 2008 recommend 0.5 to 0.7).

minTrees

Positive integer. Minimum number of trees to be scored as a "usable" model (Elith et al. 2008 recommend at least 1000). Default is 1000.

maxTrees

Positive integer. Maximum number of trees in model set (same as parameter max.trees in gbm.step).

tries

Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function stops training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step, so if non-convergence occurs, the function automatically increases the number of trees added per step until tries is reached.

tryBy

Character list. A list that contains one or more of 'learningRate', 'treeComplexity', 'maxTrees', and/or 'stepSize'. If a given combination of learningRate, treeComplexity, maxTrees, stepSize, and bagFraction does not allow model convergence, then the function tries again, but with alterations to any of the arguments named in tryBy:

* learningRate: Decrease the learning rate by a factor of 10.

* treeComplexity: Randomly increase/decrease tree complexity by 1 (minimum of 1).

* maxTrees: Increase the number of trees by 20.

* stepSize: Increase the step size (argument n.trees in gbm.step) by 50.

If tryBy is NULL then the function attempts to train the model with the same parameters up to tries times.
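For illustration, the adjustments above might be sketched as follows; adjustParams is a hypothetical helper, not part of the package, and the exact arithmetic is assumed from the list above:

# hypothetical helper mirroring the tryBy adjustments described above
# (a sketch only; the package's internal logic may differ)
adjustParams <- function(learningRate, treeComplexity, maxTrees, stepSize, tryBy) {
    if ('learningRate' %in% tryBy) learningRate <- learningRate / 10 # decrease by a factor of 10
    if ('treeComplexity' %in% tryBy) {
        treeComplexity <- max(1, treeComplexity + sample(c(-1, 1), 1)) # nudge up/down by 1, minimum of 1
    }
    if ('maxTrees' %in% tryBy) maxTrees <- maxTrees + 20 # increase the number of trees
    if ('stepSize' %in% tryBy) stepSize <- stepSize + 50 # increase the n.trees step size
    list(learningRate=learningRate, treeComplexity=treeComplexity,
        maxTrees=maxTrees, stepSize=stepSize)
}

adjustParams(0.01, 3, 8000, 50, tryBy=c('learningRate', 'stepSize'))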

w

Either logical, in which case TRUE (default) causes the total weight of presences to equal the total weight of absences (if family='bernoulli'); or a numeric list of weights, one per row in data; or the name of the column in data that contains site weights. If FALSE, then each datum gets a weight of 1.
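As a rough sketch of what w = TRUE implies (the internal calculation may differ), weights that equalize the total weight of presences and absences could be computed like this for a 0/1 response:

presBg <- c(rep(1, 200), rep(0, 2000)) # example response: 200 presences, 2000 background sites
w <- ifelse(presBg == 1, 1, sum(presBg == 1) / sum(presBg == 0))
isTRUE(all.equal(sum(w[presBg == 1]), sum(w[presBg == 0]))) # TRUE: total weights match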

anyway

Logical. If FALSE (default), it is possible for no models to be returned if none converge and/or none have a number of trees >= minTrees. If TRUE, then all models are returned, but with a warning.

out

Character. Indicates the type of value returned. If 'model' (default), then the function returns an object of class gbm. If 'models', then all models that were trained are returned in a list, in the order in which they appear in the tuning table (this may take a lot of memory!). If 'tuning', then just a data frame with the tuning parameters and deviance of each model, sorted by deviance, is returned. If 'both', then a 2-item list with the best model and the tuning table is returned.

cores

Integer >= 1. Number of cores to use when calculating multiple models. Default is 1.

verbose

Logical. If TRUE, display progress.

...

Arguments to pass to gbm.step.

Value

If out = 'model', this function returns an object of class gbm. If out = 'tuning', it returns a data frame with the tuning parameters and cross-validation deviance of each model tried. If out = c('model', 'tuning'), then it returns a two-item list with the gbm object and the data frame. Note that if a model does not converge or does not meet the sufficiency criteria (i.e., the optimal number of trees is < minTrees), then that model is not returned (a NULL value is returned for 'model', and such models are simply missing from the 'tuning' and 'models' output).
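A minimal sketch of unpacking a two-item return, reusing objects created in the Examples below; the element order is assumed from the description above, so confirming it with str() is prudent:

fit <- trainBrt(data = env, resp = 'presBg', preds = preds,
    learningRate = lr, treeComplexity = tc, maxTrees = maxTrees,
    out = c('model', 'tuning'))
str(fit, max.level = 1) # confirm the list structure and element names
bestModel <- fit[[1]]   # gbm object with lowest cross-validated deviance (assumed first element)
tuning <- fit[[2]]      # tuning table sorted by CV deviance (assumed second element)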

See Also

gbm.step

Examples

## Not run: 
### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

# settings... defaults probably better, but these are faster
lr <- c(0.001, 0.1)
tc <- c(1, 3)
maxTrees <- 2000
set.seed(123)
model <- trainBrt(
	data = env,
	resp = 'presBg',
	preds = preds,
	learningRate = lr,
	treeComplexity = tc,
	maxTrees = maxTrees,
	verbose = TRUE
)

plot(model)

# prediction raster
nTrees <- model$gbm.call$n.trees
map <- predict(clim, model, type='response', n.trees=nTrees)
plot(map)
points(occs[ , c('longitude', 'latitude')])


## End(Not run)
