runBRT: Run a boosted regression tree model using Sam's default...

Description

A wrapper to run a BRT model using gbm.step, or using gbm with or without selecting the optimal number of trees via gbm.perf, with the parameter settings used in Bhatt et al. (2013). Covariate effect curves, relative influences and a prediction map on the probability scale are returned. Regression weights can be supplied directly, or defined by a function, through wt.

BRT models sometimes fail to converge, and the gbm.step implementation fails silently, returning NULL. If method = 'step', runBRT instead attempts to run the procedure up to max_tries times and fails with an error if it still hasn't converged.
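
The retry behaviour described above can be sketched in a few lines of base R. This is only an illustration of the pattern, not runBRT's actual source: fit_with_retries and flaky_fit are hypothetical names, and flaky_fit merely stands in for a fitting call that returns NULL when it fails.

```r
# Hypothetical helper, not part of seegSDM: call `fit_fun` until it returns
# something other than NULL, giving up after `max_tries` attempts.
fit_with_retries <- function(fit_fun, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    model <- fit_fun()
    if (!is.null(model)) {
      return(model)
    }
  }
  stop("model did not converge after ", max_tries, " attempts")
}

# Stand-in for a fitting call that fails silently twice before succeeding
calls <- 0
flaky_fit <- function() {
  calls <<- calls + 1
  if (calls < 3) NULL else list(n.trees = 100)
}

m <- fit_with_retries(flaky_fit, max_tries = 5)
calls  # number of attempts that were needed
```

Turning the silent NULL into an explicit error means downstream code never has to check whether a model was actually returned.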

To run a BRT model without optimising the number of trees, you can set method = 'gbm' with a reasonable number of trees in n.trees, which should be much faster.

At present, only method = 'step' returns a model from which full validation statistics can be extracted.
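
As a rough illustration of how the three method options are selected, the sketch below mimics only the method argument. It assumes the vector default is resolved with match.arg(), the usual R idiom; choose_method is a hypothetical stand-in and does not fit any model.

```r
# Hypothetical stand-in for runBRT's `method` argument only.
# Assumption: the vector default is resolved with match.arg().
choose_method <- function(method = c('step', 'perf', 'gbm')) {
  method <- match.arg(method)
  switch(method,
         step = "gbm.step: number of trees optimised during fitting",
         perf = "gbm then gbm.perf: number of trees chosen post hoc",
         gbm  = "gbm: number of trees fixed at n.trees")
}

choose_method()       # nothing supplied, so the first option ('step') is used
choose_method('gbm')  # the fixed-n.trees option
```

Under this assumption, an unrecognised value such as method = 'foo' would raise an error rather than silently falling back to the default.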

Usage

runBRT(data,
       gbm.x,
       gbm.y,
       pred.raster = NULL,
       gbm.coords = NULL,
       wt = NULL,
       max_tries = 5,
       verbose = FALSE,
       tree.complexity = 4,
       learning.rate = 0.005,
       bag.fraction = 0.75,
       n.trees = 10,
       n.folds = 10,
       max.trees = 10000,
       step.size = 10,
       method = c('step', 'perf', 'gbm'),
       family = 'bernoulli',
       gbm.offset = NULL,
       ...)

Arguments

data

Input dataframe.

gbm.x

Index for columns containing covariate values.

gbm.y

Index for column containing presence/absence code (1s or 0s).

pred.raster

An optional RasterBrick or RasterStack object to predict the model to.

gbm.coords

Optional index for two columns (longitude then latitude) containing coordinates of records. This is required if you later want to calculate validation statistics using pair-wise distance sampling (setting pwd = TRUE in getStats). Set to NULL (the default) if not required.

wt

An optional vector of regression weights, an index for a column giving regression weights or a function to create the weights from the presence/absence column. The default (wt = NULL) applies full weight to each record. If a function is specified, it must take a vector of 1s and 0s as input and return a vector of the same length giving regression weights. To apply a 50:50 weighting of presence and absence records (mimicking a prevalence of 0.5) use: wt = function(PA) ifelse(PA == 1, 1, sum(PA) / sum(1 - PA)).

max_tries

How many times to try to get gbm.step to converge before throwing an error.

verbose

Passed to gbm.step, whether to report on progress.

tree.complexity

Passed to gbm.step, number of bifurcations in each individual tree.

learning.rate

Passed to gbm.step, how small to shrink the contribution of each tree in the final model.

bag.fraction

Passed to gbm.step, proportion of datapoints used in selecting variables.

n.trees

Passed to gbm.step, initial number of trees to fit. gbm.step optimises this parameter.

n.folds

Passed to gbm.step, number of folds in each round of cross validation.

max.trees

Passed to gbm.step, maximum number of trees to fit before stopping the stepping algorithm.

step.size

Passed to gbm.step, number of trees to add at each iteration.

method

Whether to run the model using the gbm.step procedure (method = 'step', the default) to automatically detect the number of trees, the gbm.perf procedure using cross-validation post hoc (method = 'perf', much faster), or a simple gbm model with the number of trees fixed at n.trees (method = 'gbm', even faster but potentially less accurate). Both 'step' and 'perf' will fit up to a maximum of max.trees trees.

family

The probability distribution for the likelihood, passed to either the family argument of gbm.step (if method = 'step') or the distribution argument of gbm (if method = 'perf' or method = 'gbm').

gbm.offset

If family = 'poisson', gbm.offset can be used to specify a column of data giving an offset, passed as the offset argument to either gbm or gbm.step (depending on method).

...

Additional arguments to pass to gbm.step.
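
The 50:50 weighting function quoted under wt can be checked on a toy vector. The sketch below uses made-up presence/absence data (wt_fun and PA are illustrative names) and shows that the absence weights are scaled so that both classes carry the same total weight.

```r
# 50:50 weighting function quoted in the description of the wt argument
wt_fun <- function(PA) ifelse(PA == 1, 1, sum(PA) / sum(1 - PA))

# Toy presence/absence vector (made-up data): 2 presences, 6 absences
PA <- c(1, 1, 0, 0, 0, 0, 0, 0)
w <- wt_fun(PA)

w                # presences keep weight 1, absences get 2 / 6
sum(w[PA == 1])  # total weight on presences
sum(w[PA == 0])  # total weight on absences, equal to the presence total
```

Because the function only needs the vector of 1s and 0s, it can be passed directly as wt = wt_fun.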

Value

A list containing five elements:

model

the fitted gbm model

effects

a list of effect curves with one element for each covariate

relinf

a vector of relative influence estimates for each covariate

pred

a RasterLayer giving predictions on the probability scale (or NULL if pred.raster = NULL)

coords

a dataframe giving the coordinates of the training points (or NULL if gbm.coords = NULL)

See Also

gbm.step, getRelInf, getEffectPlots, combinePreds

Examples

# load the data
data(occurrence)

# load the covariate rasters
data(covariates)

# load evidence consensus layer
data(consensus)

background <- bgSample(consensus,
                       n = 100,
                       replace = FALSE,
                       spatial = FALSE)

colnames(background) <- c('Longitude', 'Latitude')
background <- data.frame(background)

# combine the occurrence and background records
dat <- rbind(cbind(PA = rep(1, nrow(occurrence)),
                   occurrence[, c('Longitude', 'Latitude')]),
             cbind(PA = rep(0, nrow(background)),
                   background[ ,c('Longitude', 'Latitude')]))

# extract covariate values for each data point
dat_covs <- extract(covariates, dat[, c('Longitude', 'Latitude')])

# combine covariates with the other info
dat_all <- cbind(dat, dat_covs)

model <- runBRT(dat_all,
                gbm.x = 4:6,
                gbm.y = 1,
                n.folds = 5)
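
The column indices in the call above follow from how dat_all is assembled: PA first, then the two coordinate columns, then the covariates. The sketch below uses a made-up data frame with hypothetical covariate names (cov_a, cov_b, cov_c), assuming three covariate layers, to show which columns the indices pick out.

```r
# Toy stand-in for dat_all (made-up values, hypothetical covariate names)
toy <- data.frame(PA        = c(1, 0, 1),
                  Longitude = c(10.1, 11.2, 12.3),
                  Latitude  = c(-1.5, -2.5, -3.5),
                  cov_a     = c(0.2, 0.4, 0.6),
                  cov_b     = c(5, 7, 9),
                  cov_c     = c(100, 120, 140))

names(toy)[1]    # column that gbm.y = 1 points at
names(toy)[4:6]  # columns that gbm.x = 4:6 points at
names(toy)[2:3]  # columns that gbm.coords = 2:3 would point at
```

With this layout, gbm.coords = 2:3 gives longitude then latitude, the order the argument description asks for.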

SEEG-Oxford/seegSDM documentation built on May 9, 2019, 11:08 a.m.