trainGlm: Calibrate a generalized linear model (GLM)

trainGlm    R Documentation

Calibrate a generalized linear model (GLM)

Description

This function constructs a GLM piece-by-piece by first calculating AICc for all models with univariate, quadratic, and 2-way-interaction terms. It then creates a "full" model with the highest-ranked uni/bivariate terms. Finally, it implements an all-subsets model selection routine using AICc. Its output is any or all of: a table with AICc for all possible models, all possible models (after model construction), and/or the model with the lowest AICc.
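The AICc ranking described above can be sketched in base R. This is a minimal illustration and not the package's internal code: the helper aiccSketch, the simulated data, and the candidate formulas are all hypothetical. It assumes the standard small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1), with k taken as the number of estimated coefficients (appropriate for the binomial family, which has no extra dispersion parameter).

```r
# Hypothetical helper (not part of enmSdm): AICc for a fitted binomial glm.
aiccSketch <- function(model) {
  k <- length(coef(model))  # number of estimated coefficients
  n <- nobs(model)          # number of observations used in the fit
  AIC(model) + (2 * k * (k + 1)) / (n - k - 1)
}

# Simulated presence/absence data (purely illustrative)
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))
dat <- data.frame(y, x1, x2)

# Candidate models with linear, quadratic, and 2-way-interaction terms
forms <- c("y ~ x1", "y ~ x2", "y ~ x1 + I(x1^2)", "y ~ x1 + x2 + x1:x2")
fits <- lapply(forms, function(f) {
  glm(as.formula(f), data = dat, family = binomial)
})

# Table of models ranked from lowest to highest AICc
tuning <- data.frame(model = forms, AICc = sapply(fits, aiccSketch))
tuning <- tuning[order(tuning$AICc), ]
print(tuning)
```

The first row of tuning is the candidate that a lowest-AICc rule would select from this small set.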

Usage

trainGlm(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "binomial",
  construct = TRUE,
  select = TRUE,
  anyway = FALSE,
  quadratic = TRUE,
  interaction = TRUE,
  verboten = NULL,
  verbotenCombos = NULL,
  presPerTermInitial = 10,
  presPerTermFinal = 10,
  initialTerms = 10,
  w = TRUE,
  method = "glm.fit",
  out = "model",
  tooBig = 1e+07,
  verbose = FALSE,
  ...
)

Arguments

data

Data frame. Must contain fields with same names as in preds object.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.

family

Name of family for data error structure (see family). Default is to use the 'binomial' family.

construct

Logical. If TRUE (default), then construct the model from individual terms, entered in order from lowest to highest AICc, until a limit set by presPerTermInitial or initialTerms is met. If FALSE, then the "full" model consists of all terms allowed by quadratic and interaction.

select

Logical. If TRUE (default), then calculate AICc for all possible subsets of models and return the model with the lowest AICc of these. This step is performed after model construction (if any).

anyway

Logical. If FALSE (default), then during model construction, if no univariate models have valid coefficients (< tooBig), then do not proceed and return NULL. If TRUE, then proceed with unstable models (with a warning), but if the final "best" model has unstable coefficients, then return NULL for the best model.

quadratic

Logical. Used only if construct is TRUE. If TRUE (default) then include quadratic terms in model construction stage for non-factor predictors.

interaction

Logical. Used only if construct is TRUE. If TRUE (default) then include 2-way interaction terms (including interactions between factor predictors).

verboten

Either NULL (default) in which case forms is returned without any manipulation. Alternatively, this is a character list of terms that are not allowed to appear in any model in forms. Models with these terms are removed from forms. Note that the order of variables in interaction terms does not matter (e.g., x1:x2 will cause the removal of models with this term verbatim as well as x2:x1). All possible permutations of three-way interaction terms are treated similarly.
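The order-insensitive matching of interaction terms can be illustrated with a short sketch. This is not the code used by the package: the helper normalizeTerm and the example term names are hypothetical, and the approach (sorting the variables within each term before comparing) is just one way to obtain the behavior described.

```r
# Hypothetical helper: put the variables of an interaction term in a
# canonical (sorted) order so that x2:x1 and x1:x2 compare as equal.
normalizeTerm <- function(term) {
  parts <- sort(strsplit(term, ":", fixed = TRUE)[[1]])
  paste(parts, collapse = ":")
}

verboten <- c("x1:x2")
terms <- c("x1", "x2:x1", "x1:x3", "I(x1^2)")

# A term is banned if its canonical form matches a canonical verboten term
banned <- sapply(terms, normalizeTerm) %in% sapply(verboten, normalizeTerm)
allowed <- terms[!banned]
print(allowed)
```

Because sorting works for any number of variables, the same comparison also covers every permutation of a three-way term such as x1:x2:x3.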

verbotenCombos

Either NULL or a list of lists. This argument allows excluding particular combinations of variables using exact matches (i.e., a variable appears exactly as stated) or general matches (i.e., a variable appears in any term). Please see the Details section of makeFormulae for more information on how to use this argument. The default is NULL in which case any combination of variables is allowed.

presPerTermInitial

Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.

presPerTermFinal

Positive integer. Minimum number of presence sites per term in initial starting model. Used only if select is TRUE.

initialTerms

Positive integer. Maximum number of terms to be used in an initial model. Used only if construct is TRUE.
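One plausible reading of how presPerTermInitial and initialTerms jointly cap the size of the initial model is sketched below. The exact rule used inside trainGlm is an assumption here; the variable maxTerms and the presence count are hypothetical.

```r
# Hypothetical inputs
nPres <- 57               # number of presences in the data
presPerTermInitial <- 10  # minimum presences required per term
initialTerms <- 10        # hard cap on the number of terms

# Assumed rule: the number of terms is limited both by the presences
# available per term and by the hard cap, whichever is smaller.
maxTerms <- min(initialTerms, floor(nPres / presPerTermInitial))
print(maxTerms)
```

Under this reading, a data set with few presences is limited by presPerTermInitial, while a data set with many presences is limited by initialTerms.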

w

Either logical in which case TRUE causes the total weight of presences to equal the total weight of absences (if family='binomial') OR a numeric list of weights, one per row in data OR the name of the column in data that contains site weights. The default is to assign equal total weights to presences and contrast sites (TRUE).
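To make "equal total weight" concrete, the following sketch builds a weight vector in which presences and absences sum to the same total. It assumes presences are coded 1 and absences (or background sites) 0; the scaling that trainGlm applies internally may differ, and the site counts are hypothetical.

```r
# Hypothetical response vector: 1 = presence, 0 = absence/background
presBg <- c(rep(1, 50), rep(0, 2000))

nPres <- sum(presBg == 1)
nAbs <- sum(presBg == 0)

# Presences keep weight 1; each absence is down-weighted so that the
# absences together weigh as much as the presences together.
w <- ifelse(presBg == 1, 1, nPres / nAbs)

totalPres <- sum(w[presBg == 1])
totalAbs <- sum(w[presBg == 0])
print(c(presences = totalPres, absences = totalAbs))
```

A numeric vector built this way has one value per row of data, which is the form the w argument accepts for user-supplied weights.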

method

Character, name of function used to solve. This can be 'glm.fit' (default), 'brglmFit' (from the brglm2 package), or another function.

out

Character. Indicates type of value returned. Values can be 'model' (default; return model with lowest AICc), 'models' (return a list of all models), and/or 'tuning' (return a data frame with AICc for each model). If more than one value is specified, then the output will be a list with elements named "model", "models", and/or "tuning". The models will appear in the list in the same order as they appear in the tuning table (i.e., model with the lowest AICc first, second-lowest next, etc.). If just one value is specified, the output will be either an object of class glm, a list of objects of class glm, or a data frame.

tooBig

Numeric. Used to catch errors when fitting a model with the brglmFit function in the brglm2 package. In some cases fitted coefficients are unstable and tend toward very high values, even if training data is standardized. A model is discarded if any one of its coefficients is > tooBig. Set equal to Inf to keep all models.
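The coefficient check can be sketched as follows. The helper isUnstable is hypothetical, and comparing absolute values against tooBig is an assumption (the description above only states "> tooBig"); the simulated data are illustrative.

```r
# Hypothetical helper: flag a model if any coefficient exceeds tooBig
# in absolute value (the use of abs() is an assumption).
isUnstable <- function(model, tooBig = 1e7) {
  any(abs(coef(model)) > tooBig)
}

# Well-behaved simulated data
set.seed(42)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.8 * x))
fit <- glm(y ~ x, family = binomial)

isUnstable(fit)                # default threshold
isUnstable(fit, tooBig = Inf)  # Inf keeps every model
isUnstable(fit, tooBig = 0)    # an extreme threshold flags the model
```

Setting tooBig = Inf disables the check because no finite coefficient can exceed it.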

verbose

Logical. If TRUE then display intermediate results on the display device.

...

Arguments to pass to glm.

Examples

## Not run: 
library(brglm2)

### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

# GLM
gl <- trainGlm(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# GAM
ga <- trainGam(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# NS
ns <- trainNs(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# prediction rasters
mapGlm <- predict(clim, gl, type='response')
mapGam <- predict(clim, ga, type='response')
mapNs <- predict(clim, ns, type='response')

par(mfrow=c(1, 3))
plot(mapGlm, main='GLM')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])
plot(mapGam, main='GAM')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])
plot(mapNs, main='NS')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.