trainBrt: Calibrate a boosted regression tree (generalized boosting machine) model

View source: R/trainBrt.r


Calibrate a boosted regression tree (generalized boosting machine) model

Description

This function is a wrapper for gbm.step. It returns the model with the best combination of learning rate, tree depth, and bag fraction, selected by cross-validated deviance. It can also return a table with the deviance of each combination of tuning parameters that was tested, and/or all of the models that were tested. See Elith, J., J.R. Leathwick, and T. Hastie. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802-813.
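For orientation, the set of tuning combinations implied by the default arguments could be enumerated as below. This is an illustrative sketch only, not the function's internal code:

# sketch of the tuning combinations implied by the default arguments
# (illustrative only; trainBrt builds and evaluates these internally via gbm.step)
tuningGrid <- expand.grid(
    learningRate = c(1e-04, 0.001, 0.01),
    treeComplexity = c(5, 3, 1),
    bagFraction = 0.6
)
nrow(tuningGrid) # 9 combinations, each scored by cross-validated deviance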

Usage

trainBrt(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "bernoulli",
  learningRate = c(1e-04, 0.001, 0.01),
  treeComplexity = c(5, 3, 1),
  bagFraction = 0.6,
  minTrees = 1000,
  maxTrees = 8000,
  tries = 5,
  tryBy = c("learningRate", "treeComplexity", "maxTrees", "stepSize"),
  w = TRUE,
  anyway = FALSE,
  out = "model",
  cores = 1,
  verbose = FALSE,
  ...
)

Arguments

data

Data frame with the first column being the response variable.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.

family

Character. Name of error family. See gbm.step.

learningRate

Numeric. Learning rate at which the model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1).

treeComplexity

Positive integer. Tree complexity: depth of branches in a single tree (1 to 16).

bagFraction

Numeric in the range [0, 1]. Bag fraction: proportion of data used for training in cross-validation (Elith et al. 2008 recommend 0.5 to 0.7).

minTrees

Positive integer. Minimum number of trees to be scored as a "usable" model (Elith et al. 2008 recommend at least 1000). Default is 1000.

maxTrees

Positive integer. Maximum number of trees in model set (same as parameter max.trees in gbm.step).

tries

Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function stops training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step, so if non-convergence occurs, the function automatically increases the number of trees added per step until tries is reached.

tryBy

Character list. A list that contains one or more of 'learningRate', 'treeComplexity', 'maxTrees', and/or 'stepSize'. If a given combination of learningRate, treeComplexity, maxTrees, stepSize, and bagFraction does not allow model convergence, then the function tries again, but with alterations to any of the arguments named in tryBy:

* learningRate: Decrease the learning rate by a factor of 10.

* treeComplexity: Randomly increase/decrease tree complexity by 1 (minimum of 1).

* maxTrees: Increase the number of trees by 20.

* stepSize: Increase the step size (argument n.trees in gbm.step) by 50.

If tryBy is NULL then the function attempts to train the model with the same parameters up to tries times.
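For illustration, the adjustments above might be sketched as follows; adjustParams is a hypothetical helper, not part of the package, and the exact arithmetic is assumed from the list above:

# hypothetical helper mirroring the tryBy adjustments described above
# (a sketch only; the package's internal logic may differ)
adjustParams <- function(learningRate, treeComplexity, maxTrees, stepSize, tryBy) {
    if ('learningRate' %in% tryBy) learningRate <- learningRate / 10 # decrease by a factor of 10
    if ('treeComplexity' %in% tryBy) {
        treeComplexity <- max(1, treeComplexity + sample(c(-1, 1), 1)) # nudge up/down by 1, minimum of 1
    }
    if ('maxTrees' %in% tryBy) maxTrees <- maxTrees + 20 # increase the number of trees
    if ('stepSize' %in% tryBy) stepSize <- stepSize + 50 # increase the n.trees step size
    list(learningRate=learningRate, treeComplexity=treeComplexity,
        maxTrees=maxTrees, stepSize=stepSize)
}

adjustParams(0.01, 3, 8000, 50, tryBy=c('learningRate', 'stepSize'))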

w

Either logical, in which case TRUE (default) causes the total weight of presences to equal the total weight of absences (if family='bernoulli'); or a numeric list of weights, one per row in data; or the name of the column in data that contains site weights. If FALSE, then each datum gets a weight of 1.
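As a rough sketch of what w = TRUE implies (the internal calculation may differ), weights that equalize the total weight of presences and absences could be computed like this for a 0/1 response:

presBg <- c(rep(1, 200), rep(0, 2000)) # example response: 200 presences, 2000 background sites
w <- ifelse(presBg == 1, 1, sum(presBg == 1) / sum(presBg == 0))
isTRUE(all.equal(sum(w[presBg == 1]), sum(w[presBg == 0]))) # TRUE: total weights match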

anyway

Logical. If FALSE (default), it is possible for no models to be returned if none converge and/or none have a number of trees >= minTrees. If TRUE, then all models are returned, but with a warning.

out

Character. Indicates the type of value returned. If 'model' (default), then the function returns an object of class gbm. If 'models', then all models that were trained are returned in a list, in the order in which they appear in the tuning table (this may take a lot of memory!). If 'tuning', then just a data frame with the tuning parameters and deviance of each model, sorted by deviance, is returned. If 'both', then a 2-item list with the best model and the tuning table is returned.

cores

Integer >= 1. Number of cores to use when calculating multiple models. Default is 1.

verbose

Logical. If TRUE, display progress.

...

Arguments to pass to gbm.step.

Value

If out = 'model', this function returns an object of class gbm. If out = 'tuning', it returns a data frame with the tuning parameters and cross-validation deviance of each model tried. If out = c('model', 'tuning'), then it returns a two-item list with the gbm object and the data frame. Note that if a model does not converge or does not meet the sufficiency criteria (i.e., the optimal number of trees is < minTrees), then that model is not returned (a NULL value is returned for 'model', and such models are simply missing from the 'tuning' and 'models' output).
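A minimal sketch of unpacking a two-item return, reusing objects created in the Examples below; the element order is assumed from the description above, so confirming it with str() is prudent:

fit <- trainBrt(data = env, resp = 'presBg', preds = preds,
    learningRate = lr, treeComplexity = tc, maxTrees = maxTrees,
    out = c('model', 'tuning'))
str(fit, max.level = 1) # confirm the list structure and element names
bestModel <- fit[[1]]   # gbm object with lowest cross-validated deviance (assumed first element)
tuning <- fit[[2]]      # tuning table sorted by CV deviance (assumed second element)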

See Also

gbm.step

Examples

## Not run: 
### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

# settings... defaults probably better, but these are faster
lr <- c(0.001, 0.1)
tc <- c(1, 3)
maxTrees <- 2000
set.seed(123)
model <- trainBrt(
	data = env,
	resp = 'presBg',
	preds = preds,
	learningRate = lr,
	treeComplexity = tc,
	maxTrees = maxTrees,
	verbose = TRUE
)

plot(model)

# prediction raster
nTrees <- model$gbm.call$n.trees
map <- predict(clim, model, type='response', n.trees=nTrees)
plot(map)
points(occs[ , c('longitude', 'latitude')])


## End(Not run)
