gbm.step.sd: Function to assess optimal no of boosting trees using k-fold...
In SimonDedman/gbm.auto: Automated Boosted Regression Tree Modelling and Mapping Suite

gbm.step.sd

R Documentation

Function to assess optimal no of boosting trees using k-fold cross validation

Description

SD fork of dismo's gbm.step, adding evaluation metrics such as d.squared and rmse. Original function by J. Leathwick and J. Elith, 19th September 2005, version 2.9. Assesses the optimal number of boosting trees using k-fold cross validation. Implements the cross-validation procedure described on page 215 of Hastie T, Tibshirani R, Friedman JH (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.

Usage

gbm.step.sd(
  data,
  gbm.x,
  gbm.y,
  offset = NULL,
  fold.vector = NULL,
  tree.complexity = 1,
  learning.rate = 0.01,
  bag.fraction = 0.75,
  site.weights = rep(1, nrow(data)),
  var.monotone = rep(0, length(gbm.x)),
  n.folds = 10,
  prev.stratify = TRUE,
  family = "bernoulli",
  n.trees = 50,
  step.size = n.trees,
  max.trees = 10000,
  tolerance.method = "auto",
  tolerance = 0.001,
  plot.main = TRUE,
  plot.folds = FALSE,
  verbose = TRUE,
  silent = FALSE,
  keep.fold.models = FALSE,
  keep.fold.vector = FALSE,
  keep.fold.fit = FALSE,
  ...
)
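
As a minimal sketch of a call, assuming the gbm.auto package (with its gbm dependency) is installed and that gbm.step.sd is exported from it: the simulated data frame `sim` and its column names are hypothetical, and the predictors and response are passed as column indices, as is usual with dismo's gbm.step. The tuning values are placeholders rather than recommendations.

```r
# Simulated presence/absence data; `sim` and its columns are illustrative only.
set.seed(42)
n <- 300
sim <- data.frame(
  depth = runif(n, 5, 200),
  temp  = rnorm(n, 12, 2)
)
sim$present <- rbinom(n, 1, plogis(2 - 0.02 * sim$depth))

# Only attempt the fit if the package is available (assumption: it exports gbm.step.sd).
if (requireNamespace("gbm.auto", quietly = TRUE)) {
  fit <- gbm.auto::gbm.step.sd(
    data = sim,
    gbm.x = 1:2,              # predictor columns (depth, temp)
    gbm.y = 3,                # response column (present)
    family = "bernoulli",
    tree.complexity = 2,
    learning.rate = 0.005,
    bag.fraction = 0.75,
    n.folds = 10,
    plot.main = FALSE         # skip the hold-out deviance plot
  )
}
```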

Arguments

`data`	The input dataframe.
`gbm.x`	The predictors.
`gbm.y`	The response.
`offset`	Allows an offset to be specified.
`fold.vector`	Allows a fold vector to be read in for cross validation with offsets.
`tree.complexity`	Sets the complexity of individual trees.
`learning.rate`	Sets the weight applied to individual trees.
`bag.fraction`	Sets the proportion of observations used in selecting variables.
`site.weights`	Allows varying weighting for sites.
`var.monotone`	Restricts responses to individual predictors to monotone.
`n.folds`	Number of folds.
`prev.stratify`	Stratify the folds by prevalence; only for presence/absence (p/a) data.
`family`	Family - bernoulli (=binomial), poisson, laplace or gaussian.
`n.trees`	Number of initial trees to fit.
`step.size`	Number of trees to add at each cycle.
`max.trees`	Max number of trees to fit before stopping.
`tolerance.method`	Method to use in deciding to stop - "fixed" or "auto".
`tolerance`	Tolerance value to use: an absolute value if `tolerance.method` is "fixed", or a multiplier of the total mean deviance if it is "auto".
`plot.main`	Plot hold-out deviance curve.
`plot.folds`	Plot the individual folds as well.
`verbose`	Control amount of screen reporting.
`silent`	Allows running with no output, for use when simplifying a model.
`keep.fold.models`	Keep the fold models from cross validation.
`keep.fold.vector`	Allows the vector defining fold membership to be kept.
`keep.fold.fit`	Allows the predicted values for observations from CV to be kept.
`...`	Allows for any additional plotting parameters.
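
The two tolerance modes can be illustrated with a small base R sketch. It assumes a Bernoulli response and that the total mean deviance is the mean deviance of an intercept-only prediction (the observed prevalence); the helper `mean_bernoulli_deviance` and the data are hypothetical and are not part of the package.

```r
# Hypothetical helper: mean Bernoulli deviance of predictions p for 0/1 responses y.
mean_bernoulli_deviance <- function(y, p) {
  -2 * mean(y * log(p) + (1 - y) * log(1 - p))
}

y <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)      # illustrative presence/absence data
null_pred <- rep(mean(y), length(y))      # intercept-only prediction (prevalence)
total_mean_deviance <- mean_bernoulli_deviance(y, null_pred)

tolerance <- 0.001
threshold_auto  <- tolerance * total_mean_deviance  # "auto": multiplier of total mean deviance
threshold_fixed <- tolerance                        # "fixed": used as an absolute value
```

With "auto", the stopping threshold scales with how much deviance there is to explain, so the same `tolerance` behaves comparably across data sets with different prevalence.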

Details

Divides the data into n.folds subsets (10 by default), stratified by prevalence if required for presence/absence (pa) data. It then fits a gbm model of increasing complexity along the sequence from n.trees to n.trees + (n.steps * step.size), calculating the residual deviance at each step along the way. After each fold is processed, it calculates the average holdout residual deviance and its standard error, then identifies the optimal number of trees as that at which the holdout deviance is minimised. Finally, it fits a model with this number of trees and returns it as a gbm model, along with additional information from the cv selection process.
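
Two parts of this procedure can be sketched in base R: assigning folds with stratification by prevalence, and choosing the number of trees that minimises mean holdout deviance. This is an illustration under stated assumptions, not the function's own code: `stratified_folds` is a hypothetical helper, the holdout deviance values are synthetic rather than taken from fitted models, and the standard error is assumed to be the across-fold standard deviation divided by the square root of the number of folds.

```r
set.seed(1)
y <- rbinom(200, 1, 0.2)   # illustrative presence/absence response
n_folds <- 10

# Hypothetical helper: give presences and absences an even spread across folds.
stratified_folds <- function(y, n_folds) {
  folds <- integer(length(y))
  for (cls in c(0, 1)) {
    idx <- which(y == cls)
    # Cycle fold labels 1..n_folds over the class, then shuffle them within it.
    labels <- rep(seq_len(n_folds), length.out = length(idx))
    folds[idx] <- labels[sample.int(length(idx))]
  }
  folds
}
folds <- stratified_folds(y, n_folds)

# Synthetic holdout deviance (rows = folds, columns = tree counts),
# shaped so that the underlying curve is lowest at 300 trees.
trees <- seq(50, 500, by = 50)
holdout <- sapply(trees, function(t) {
  1 + ((t - 300) / 300)^2 + rnorm(n_folds, sd = 0.001)
})

cv_mean <- colMeans(holdout)                      # average holdout deviance per step
cv_se   <- apply(holdout, 2, sd) / sqrt(n_folds)  # its standard error (assumed form)
best_trees <- trees[which.min(cv_mean)]           # tree count minimising holdout deviance
```

Calling `table(folds, y)` shows that each fold receives a near-equal share of presences and absences.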

D squared is 1 - (cv.dev / total.deviance). Abeare thesis: for each of the fitted models, the pseudo-R2, or D2, or Explained Deviance, was calculated for comparison, where D2 = 1 - (residual deviance / total deviance).
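
The D squared formula can be written as a one-line R function. The helper name `d_squared` and the numbers below are hypothetical, chosen only to show the arithmetic.

```r
# Hypothetical helper: explained deviance (D squared) from two deviance values.
d_squared <- function(residual_deviance, total_deviance) {
  stopifnot(total_deviance > 0)
  1 - (residual_deviance / total_deviance)
}

# Illustrative values only, not output from a real model:
# a residual deviance half the size of the total deviance gives D squared = 0.5.
d2 <- d_squared(residual_deviance = 0.6, total_deviance = 1.2)
```

A value near 1 means the model explains most of the deviance; a value near 0 means it does little better than the intercept-only model.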

Requires the gbm library from CRAN, the roc and calibration scripts of J. Elith, and the calc.deviance script of J. Elith and J. Leathwick.

Value

GBM models using gbm as the engine.

SimonDedman/gbm.auto documentation built on June 10, 2025, 7:07 a.m.
