trainGam: Calibrate a generalized additive model (GAM)


trainGam R Documentation

Calibrate a generalized additive model (GAM)

Description

This function constructs a GAM piece-by-piece. It first calculates AICc for every model with a single univariate or bivariate (interaction) term. It then builds a "full" model from the highest-ranked of these terms and, finally, runs an all-subsets model selection routine on that full model.
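The first step, ranking single-term models by AICc, can be pictured with a minimal sketch. It assumes the mgcv package for fitting and uses simulated data; aicc() is a hypothetical helper written for this illustration, not part of enmSdm, and the routine inside trainGam may differ in detail. Bivariate terms such as te(x1, x2) could be scored the same way.

```r
library(mgcv)

set.seed(1)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
# in this toy example only x1 drives the response
dat$y <- rbinom(n, 1, plogis(-1 + 4 * dat$x1))

# hypothetical helper: small-sample corrected AIC
# k is the (effective) number of parameters reported by logLik()
aicc <- function(model) {
	k <- attr(logLik(model), "df")
	nObs <- nobs(model)
	AIC(model) + (2 * k * (k + 1)) / (nObs - k - 1)
}

preds <- c("x1", "x2", "x3")

# fit one single-term GAM per predictor and record its AICc
tuning <- data.frame(
	term = paste0("s(", preds, ")"),
	AICc = NA_real_,
	stringsAsFactors = FALSE
)

for (i in seq_along(preds)) {
	form <- as.formula(paste("y ~", tuning$term[i]))
	fit <- gam(form, data = dat, family = binomial)
	tuning$AICc[i] <- aicc(fit)
}

# best-supported terms first
tuning <- tuning[order(tuning$AICc), ]
print(tuning)
```

The terms at the top of such a table are the candidates for the "full" model that the selection stage then prunes.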

Usage

trainGam(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "binomial",
  gamma = 1,
  construct = TRUE,
  select = TRUE,
  presPerTermInitial = 10,
  presPerTermFinal = 10,
  initialTerms = 8,
  interaction = "te",
  w = TRUE,
  out = "model",
  verbose = FALSE,
  ...
)

Arguments

data

Data frame. Must contain fields with same names as in preds object.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character vector or integer vector. Names or column indices of predictors. Default is to use the second and subsequent columns in data.

family

Name of family for data error structure (see ?family).

gamma

Initial penalty to degrees of freedom to use (larger ==> smoother fits).
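The effect of gamma can be seen in a small sketch that calls mgcv::gam() directly on simulated data. That trainGam forwards gamma to mgcv::gam() unchanged is an assumption here, not something this page states.

```r
library(mgcv)

set.seed(1)
n <- 400
x <- runif(n)
y <- rbinom(n, 1, plogis(2 * sin(2 * pi * x)))

# same model, two penalties on the effective degrees of freedom
fitDefault <- gam(y ~ s(x), family = binomial, gamma = 1)
fitSmooth <- gam(y ~ s(x), family = binomial, gamma = 3)

# total effective degrees of freedom: a lower value means a smoother fit
edfs <- c(gamma1 = sum(fitDefault$edf), gamma3 = sum(fitSmooth$edf))
print(edfs)
```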

construct

Logical. If TRUE then construct model by computing AICc for all univariate and bivariate models. Then add terms up to maximum set by presPerTermInitial and initialTerms.

select

Logical. If TRUE then calculate AICc for all possible subsets of models and return the model with the lowest AICc of these. This step is performed after model construction (if any).

presPerTermInitial

Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.

presPerTermFinal

Positive integer. Minimum number of presence sites per term in initial starting model; used only if select is TRUE.

initialTerms

Positive integer. Maximum number of terms to be used in an initial model. Used only if construct is TRUE. The maximum that can be handled by dredge() is 31, so if this number is >31 and select is TRUE then it is forced to 31 with a warning. Note that the number of coefficients for factors is not calculated correctly, so if the predictors contain factors then this number might have to be reduced even more.
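One way to read how presPerTermInitial and initialTerms interact is sketched below. maxInitialTerms() is a hypothetical helper written for this illustration, not part of enmSdm, and the assumption that the cap is the number of presences divided by presPerTermInitial (rounded down) is an interpretation of the argument descriptions rather than the confirmed implementation.

```r
# hypothetical helper, not part of enmSdm
maxInitialTerms <- function(
	numPres,
	presPerTermInitial = 10,
	initialTerms = 8,
	select = TRUE
) {

	# assumed rule: at most one term per `presPerTermInitial` presences
	allowed <- floor(numPres / presPerTermInitial)
	cap <- min(allowed, initialTerms)

	# dredge() can handle at most 31 terms
	if (select && cap > 31) {
		warning("Number of initial terms reduced to 31.")
		cap <- 31
	}

	cap

}

maxInitialTerms(numPres = 45)   # limited by the number of presences
maxInitialTerms(numPres = 500)  # limited by initialTerms
```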

interaction

Character or NULL. Type of interaction term to use (te, ts, s, etc.). See ?te (for example) for help on any one of these. If NULL then interactions are not used.

w

One of: a logical value, in which case TRUE causes the total weight of presences to equal the total weight of absences (if family = 'binomial'); a numeric vector of weights, one per row in data; or the name of the column in data that contains site weights. The default shown in Usage is TRUE. With FALSE, each datum is assigned a weight of 1.
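A minimal sketch of weights that balance presences against absences follows. Keeping presences at weight 1 and down-weighting absences is one way to achieve the balance; whether trainGam scales the weights exactly this way is an assumption.

```r
# toy response: 50 presences followed by 2000 background/absence sites
presBg <- c(rep(1, 50), rep(0, 2000))

numPres <- sum(presBg == 1)
numAbs <- sum(presBg == 0)

# presences keep weight 1; each absence gets numPres / numAbs
w <- ifelse(presBg == 1, 1, numPres / numAbs)

# the two totals match
sum(w[presBg == 1])
sum(w[presBg == 0])
```

A vector built this way could also be passed as a numeric w, one value per row of data.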

out

Character. Indicates type of value returned. If 'model' (default) then return an object of class gam. If 'tuning' then return only the AICc table for each kind of model term used in model construction. If 'both' then return a 2-item list with the best model and the AICc table.

verbose

Logical. If TRUE then show intermediate results on the display device.

...

Extra arguments (not used).

Value

If out = 'model' this function returns an object of class gam. If out = 'tuning' this function returns a data frame with tuning parameters and AICc for each model tried. If out = c('model', 'tuning') then it returns a list object with the gam object and the data frame.

See Also

gam

Examples

## Not run: 
library(brglm2)

### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

# GLM
gl <- trainGlm(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# GAM
ga <- trainGam(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# NS
ns <- trainNs(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# prediction rasters
mapGlm <- predict(clim, gl, type='response')
mapGam <- predict(clim, ga, type='response')
mapNs <- predict(clim, ns, type='response')

par(mfrow=c(1, 3))
plot(mapGlm, main='GLM')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])
plot(mapGam, main='GAM')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])
plot(mapNs, main='NS')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.