trainGlmDredge: Calibrate a generalized linear model (GLM)
In adamlilith/enmSdm: Tools for Modeling Niches and Distributions of Species

Calibrate a generalized linear model (GLM)

Description

This is a pseudo-deprecated function that constructs a GLM piece by piece. It first calculates AICc for all models with univariate, quadratic, cubic, 2-way-interaction, and linear-by-quadratic terms. It then creates a "full" model from the highest-ranked uni- and bivariate terms. Finally, it performs all-subsets model selection using AICc. Its output is a table of AICc values for all possible models (resulting from the "full" model) and/or the model among these with the lowest AICc. The procedure can use Firth's penalized likelihood (via `method = 'brglmFit'`) to address issues related to separation, small sample size, and bias. This function uses `dredge()` in the MuMIn package to cycle through all possible models.
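The three stages described above can be sketched in base R. This is a simplified, hypothetical illustration, not the enmSdm implementation. The helper names (`aicc`, `fitTerms`, `rankTerms`, `fullTerms`, `bestSubset`) are invented here. The sketch also makes several simplifications:

* It handles only numeric predictors.
* It fits models with plain `glm()` rather than `brglmFit`.
* It replaces MuMIn's `dredge()` with a plain exhaustive loop.
* It assumes the presences-per-term limit means "at most `floor(nPres / presPerTerm)` terms".

AICc is computed as AIC + 2k(k + 1)/(n - k - 1), where k is the number of estimated parameters and n is the number of observations.

```r
# Hypothetical, simplified sketch of the three stages (base R only).
# NOT the enmSdm source: numeric predictors only, glm() instead of
# brglmFit, exhaustive search instead of MuMIn::dredge(), and no
# marginality rules (e.g. I(x1^2) may enter without x1).

# Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)
aicc <- function(model) {
  k <- attr(logLik(model), "df")  # number of estimated parameters
  n <- nobs(model)                # number of observations
  AIC(model) + (2 * k * (k + 1)) / (n - k - 1)
}

# Fit a binomial GLM from a response name and a vector of term labels
fitTerms <- function(data, resp, terms) {
  rhs <- if (length(terms) == 0) "1" else paste(terms, collapse = " + ")
  glm(as.formula(paste(resp, "~", rhs)), data = data, family = binomial())
}

# Stage 1: score each candidate term on its own and rank by AICc
rankTerms <- function(data, resp, candidates) {
  scores <- vapply(candidates,
                   function(term) aicc(fitTerms(data, resp, term)),
                   numeric(1))
  names(sort(scores))  # term labels, best (lowest AICc) first
}

# Stage 2: "full" model = best-ranked terms, capped by an assumed
# presences-per-term rule and a maximum number of terms
fullTerms <- function(ranked, nPres, presPerTerm = 10, maxTerms = 10) {
  nKeep <- min(maxTerms, floor(nPres / presPerTerm), length(ranked))
  ranked[seq_len(nKeep)]
}

# Stage 3: all-subsets search (including intercept-only) by AICc
bestSubset <- function(data, resp, terms) {
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(terms), function(k)
                 combn(terms, k, simplify = FALSE)), recursive = FALSE))
  scores <- vapply(subsets, function(s) aicc(fitTerms(data, resp, s)),
                   numeric(1))
  list(terms = subsets[[which.min(scores)]], nModels = length(subsets))
}

# Simulated data: only x1 and x1^2 affect the response
set.seed(1)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 + 2 * d$x1 - 1.5 * d$x1^2))

candidates <- c("x1", "x2", "I(x1^2)", "I(x2^2)", "x1:x2")
ranked <- rankTerms(d, "y", candidates)
full <- fullTerms(ranked, nPres = sum(d$y))
best <- bestSubset(d, "y", full)
print(ranked)
print(best$terms)
```

The number of subsets doubles with each added term (2^k models for k terms). Capping the size of the "full" model is therefore what keeps the final search tractable.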

Usage

trainGlmDredge(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "binomial",
  tooBig = 1e+07,
  construct = TRUE,
  select = TRUE,
  quadratic = TRUE,
  cubic = TRUE,
  interaction = TRUE,
  interQuad = TRUE,
  presPerTermInitial = 10,
  presPerTermFinal = 10,
  initialTerms = 10,
  w = TRUE,
  method = "glm.fit",
  out = "model",
  verbose = FALSE,
  ...
)

Arguments

`data`	Data frame. Must contain columns with the same names as in `preds`.
`resp`	Character or integer. Name or column index of response variable. Default is to use the first column in `data`.
`preds`	Character or integer vector. Names or column indices of predictors. Default is to use the second and subsequent columns in `data`.
`family`	Name of family for data error structure (see `?family`). Default is to use the 'binomial' family.
`tooBig`	Numeric. Used to catch errors when fitting a model with the `brglmFit` function in the brglm2 package. In some cases fitted coefficients are unstable and tend toward very large values, even if the training data are standardized. Models with such coefficients are discarded if any one coefficient is `> tooBig`. Set equal to `Inf` to keep all models.
`construct`	Logical. If `TRUE`, construct the model from individual terms entered in order from lowest to highest AICc until the limit set by `presPerTermInitial` or `initialTerms` is met. If `FALSE`, the "full" model consists of all terms allowed by `quadratic`, `cubic`, `interaction`, and `interQuad`.
`select`	Logical. If `TRUE`, calculate AICc for all possible subsets of models and return the model with the lowest AICc. This step is performed after model construction (if any).
`quadratic`	Logical. Used only if `construct` is `TRUE`. If `TRUE` then include quadratic terms in model construction stage for non-factor predictors.
`cubic`	Logical. Used only if `construct` is `TRUE`. If `TRUE` then include cubic terms in model construction stage for non-factor predictors.
`interaction`	Logical. Used only if `construct` is `TRUE`. If `TRUE` then include 2-way interaction terms (including interactions between factor predictors).
`interQuad`	Logical. Used only if `construct` is `TRUE`. If `TRUE` then include all possible interactions of the form 'x * y^2' unless 'y' is a factor.
`presPerTermInitial`	Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if `construct` is `TRUE`.
`presPerTermFinal`	Positive integer. Minimum number of presence sites per term in the initial starting model. Used only if `select` is `TRUE`.
`initialTerms`	Positive integer. Maximum number of terms to be used in an initial model. Used only if `construct` is `TRUE`. The maximum that can be handled by `dredge()` is 30, so if this number is >30 and `select` is `TRUE` then it is forced to 30 with a warning. Note that the number of coefficients for factors is not calculated correctly, so if the predictors contain factors then this number might have to be reduced even more.
`w`	Either logical, in which case `TRUE` causes the total weight of presences to equal the total weight of absences (if `family='binomial'`), OR a numeric vector of weights, one per row in `data`, OR the name of the column in `data` that contains site weights. The default (`TRUE`) assigns equal total weights to presences and contrast sites.
`method`	Character. Name of the function used to fit the model. This can be `'glm.fit'` (default), `'brglmFit'` (from the brglm2 package), or another fitting function.
`out`	Character. Indicates the type of value returned. If `'model'` (default), returns an object of class `brglm2`/`glm`. If `'table'`, returns just the AICc table for each kind of model term used in model construction. If both (`c('model', 'table')`), returns a 2-item list with the best model and the AICc table.
`verbose`	Logical. If `TRUE` then display intermediate results on the display device.
`...`	Arguments to pass to `stats::glm()` or `dredge()`.
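Three of the argument rules above (`w = TRUE`, `tooBig`, and the 30-term cap on `initialTerms`) are simple enough to sketch directly. The helpers below (`balanceWeights`, `coefsTooBig`, `capInitialTerms`) are hypothetical and are not enmSdm functions, so the package's exact behavior may differ. Two choices in the sketch are assumptions:

* Presences keep a weight of 1 and absences are rescaled.
* `tooBig` is compared against absolute coefficient values.

```r
# Hypothetical helpers illustrating three argument rules; NOT enmSdm code.

# w = TRUE: presences (1) and absences/contrast sites (0) get equal total
# weight. Assumption: presences keep weight 1 and absences are rescaled.
balanceWeights <- function(y) {
  nPres <- sum(y == 1)
  nAbs <- sum(y == 0)
  ifelse(y == 1, 1, nPres / nAbs)
}

# tooBig: flag a fitted model whose coefficients have blown up.
# Assumption: absolute values are compared, so large negatives also count.
coefsTooBig <- function(model, tooBig = 1e7) {
  any(abs(coef(model)) > tooBig, na.rm = TRUE)
}

# initialTerms: dredge() handles at most 30 terms, so cap it if select = TRUE
capInitialTerms <- function(initialTerms, select, maxTerms = 30) {
  if (select && initialTerms > maxTerms) {
    warning("initialTerms > ", maxTerms, "; forcing it to ", maxTerms)
    initialTerms <- maxTerms
  }
  initialTerms
}

# Small simulated presence/absence data set
set.seed(2)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(d$x))
w <- balanceWeights(d$y)
fit <- glm(y ~ x, data = d, family = binomial(), weights = w)

print(c(presences = sum(w[d$y == 1]), absences = sum(w[d$y == 0])))
print(coefsTooBig(fit))
print(suppressWarnings(capInitialTerms(40, select = TRUE)))
```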

Value

If `out = 'model'`, this function returns an object of class `glm`. If `out = 'table'`, it returns a data frame with tuning parameters and AICc for each model tried. If `out = c('model', 'table')`, it returns a list containing the `glm` object and the data frame.

Examples

## Not run: 
library(enmSdm)
set.seed(123)
x <- matrix(rnorm(n = 6*100), ncol = 6)
# true variables will be #1, #2, #5, and #6, plus
# the squares of #1 and #6, plus
# the interaction between #1 and #6, plus
# the cube of #5
imp <- c('x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x1_pow2', 'x6_pow2', 'x1_by_x6', 'x5_pow3')
betas <- c(5, 2, 0, 0, 1, -1, 8, 1, 2, -4)
names(betas) <- imp
# latent response: linear terms plus the squared, interaction, and cubic terms
y <- 0.5 + x %*% betas[1:6] + betas[7] * x[ , 1]^2 +
  betas[8] * x[ , 6]^2 + betas[9] * x[ , 1] * x[ , 6] + betas[10] * x[ , 5]^3
# binarize: a site is a presence (1) if the latent value exceeds 10
y <- as.integer(y > 10)
x <- cbind(y, x)
x <- as.data.frame(x)
names(x) <- c('y', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6')
model <- trainGlmDredge(x, verbose=TRUE)

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.