trainGlmDredge: Calibrate a generalized linear model (GLM)

View source: R/trainGlmDredge.r

trainGlmDredge    R Documentation

Calibrate a generalized linear model (GLM)

Description

This is a pseudo-deprecated function that constructs a GLM piece by piece. It first calculates AICc for all models with univariate, quadratic, cubic, 2-way interaction, and linear-by-quadratic terms. It then creates a "full" model from the highest-ranked uni- and bivariate terms. Finally, it performs all-subsets model selection using AICc. Its output is a table with AICc for all possible models (subsets of the "full" model) and/or the model among these with the lowest AICc. When method = 'brglmFit', the procedure uses Firth's penalized likelihood to address issues related to separation, small sample size, and bias. This function uses dredge() in the MuMIn package to cycle through all possible models.
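Every ranking step above uses AICc, the small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1), where k is the number of estimated parameters and n the number of observations. MuMIn provides AICc() for this; the base-R function below (aiccGlm is a hypothetical name) is only a sketch of the standard formula, not the package's own code.

```r
# Small-sample corrected AIC (AICc) for a fitted glm, base R only.
# AICc = AIC + 2k(k + 1) / (n - k - 1)
aiccGlm <- function(model) {
  ll <- logLik(model)
  k <- attr(ll, "df")                 # number of estimated parameters
  n <- nobs(model)                    # number of observations used in the fit
  aic <- -2 * as.numeric(ll) + 2 * k  # ordinary AIC
  aic + (2 * k * (k + 1)) / (n - k - 1)
}

# Toy presence/absence data with one informative predictor (x1)
set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- rbinom(100, 1, plogis(0.5 + 2 * dat$x1))
fit <- glm(y ~ x1 + x2, data = dat, family = binomial)
aiccGlm(fit)
```

For a binomial model with an intercept and two slopes (k = 3) and n = 100, the correction adds 2 * 3 * 4 / 96 = 0.25 to the ordinary AIC.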

Usage

trainGlmDredge(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "binomial",
  tooBig = 1e+07,
  construct = TRUE,
  select = TRUE,
  quadratic = TRUE,
  cubic = TRUE,
  interaction = TRUE,
  interQuad = TRUE,
  presPerTermInitial = 10,
  presPerTermFinal = 10,
  initialTerms = 10,
  w = TRUE,
  method = "glm.fit",
  out = "model",
  verbose = FALSE,
  ...
)

Arguments

data

Data frame. Must contain columns with the same names as those in preds.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character vector or integer vector. Names or column indices of predictors. Default is to use the second and subsequent columns in data.

family

Name of family for data error structure (see ?family). Default is to use the 'binomial' family.

tooBig

Numeric. Used to catch errors when fitting a model with the brglmFit function in the brglm2 package. In some cases fitted coefficients are unstable and tend toward very large values, even if the training data are standardized. Models with such coefficients are discarded if any coefficient is > tooBig. Set equal to Inf to keep all models.
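A minimal sketch of the tooBig check, assuming coefficients are compared by absolute value and NA coefficients (e.g., from aliased terms) are ignored. The helper name coefsTooBig is hypothetical and the package's own test may differ.

```r
# Hypothetical helper: TRUE if any estimated coefficient exceeds tooBig
# in absolute value (NA coefficients are skipped).
coefsTooBig <- function(model, tooBig = 1e7) {
  b <- coef(model)
  any(abs(b[!is.na(b)]) > tooBig)
}

set.seed(2)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(d$x))
m <- glm(y ~ x, data = d, family = binomial)

coefsTooBig(m)                 # a well-behaved fit is kept
coefsTooBig(m, tooBig = Inf)   # tooBig = Inf never discards a model
```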

construct

Logical. If TRUE then construct the model from individual terms, entered in order from lowest to highest AICc, until the limit set by presPerTermInitial or initialTerms is reached. If FALSE then the "full" model consists of all terms allowed by quadratic, cubic, interaction, and interQuad.
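The construction step can be sketched as follows, under stated assumptions: each candidate term is scored by the AICc of a model containing only that term, and the best-ranked terms, up to a cap, form the "full" model. The names aicc and rankTerms and the cap of 3 are hypothetical; the package may, for example, keep linear terms alongside their quadratic counterparts when scoring.

```r
# Small-sample corrected AIC for a fitted glm.
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

# Hypothetical helper: score each candidate term by the AICc of a model that
# contains only that term, then return the terms sorted from best to worst.
rankTerms <- function(data, resp, terms, family = binomial) {
  scores <- vapply(terms, function(tt) {
    f <- as.formula(paste(resp, "~", tt))
    aicc(glm(f, data = data, family = family))
  }, numeric(1))
  terms[order(scores)]
}

set.seed(3)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- rbinom(200, 1, plogis(-0.5 + 2 * d$x1 - 1.5 * d$x2^2))
cand <- c("x1", "x2", "x3", "I(x1^2)", "I(x2^2)", "I(x3^2)", "x1:x2")
ranked <- rankTerms(d, "y", cand)

# Keep the best-ranked terms, up to a hypothetical cap, as the "full" model.
maxTerms <- 3
fullForm <- as.formula(paste("y ~", paste(ranked[seq_len(maxTerms)], collapse = " + ")))
fullModel <- glm(fullForm, data = d, family = binomial)
```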

select

Logical. If TRUE then calculate AICc for all possible subsets of models and return the model with the lowest AICc. This step is performed after model construction (if any).
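The function delegates this step to MuMIn's dredge(). As a base-R stand-in, not dredge() itself, the sketch below fits every subset of the candidate terms (including the intercept-only model) and ranks the fits by AICc. The name allSubsets is hypothetical, and, like dredge()'s default, it does not enforce marginality (e.g., a quadratic term without its linear term).

```r
# Small-sample corrected AIC for a fitted glm.
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

# Hypothetical all-subsets selection: one glm per subset of 'terms'.
allSubsets <- function(data, resp, terms, family = binomial) {
  subsets <- c(list(character(0)),                 # intercept-only model
               unlist(lapply(seq_along(terms),
                             function(k) combn(terms, k, simplify = FALSE)),
                      recursive = FALSE))
  rhs <- vapply(subsets, function(s) {
    if (length(s) == 0) "1" else paste(s, collapse = " + ")
  }, character(1))
  scores <- vapply(rhs, function(r) {
    aicc(glm(as.formula(paste(resp, "~", r)), data = data, family = family))
  }, numeric(1))
  tab <- data.frame(model = rhs, AICc = unname(scores))
  tab[order(tab$AICc), ]                          # best (lowest AICc) first
}

set.seed(4)
d <- data.frame(x1 = rnorm(150), x2 = rnorm(150), x3 = rnorm(150))
d$y <- rbinom(150, 1, plogis(1.5 * d$x1 - d$x2))
tab <- allSubsets(d, "y", c("x1", "x2", "x3"))
head(tab, 3)
```

With p terms this fits 2^p models, so the cost doubles with every added term.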

quadratic

Logical. Used only if construct is TRUE. If TRUE then include quadratic terms in model construction stage for non-factor predictors.

cubic

Logical. Used only if construct is TRUE. If TRUE then include cubic terms in model construction stage for non-factor predictors.

interaction

Logical. Used only if construct is TRUE. If TRUE then include 2-way interaction terms (including interactions between factor predictors).

interQuad

Logical. Used only if construct is TRUE. If TRUE then include all possible interactions of the form 'x * y^2' unless 'y' is a factor.
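The kinds of terms enabled by quadratic, cubic, interaction, and interQuad can be illustrated as R formula terms. The sketch below (candidateTerms is a hypothetical name) assumes powers are applied only to non-factor predictors, all pairs of predictors get a 2-way interaction, and 'x * y^2' is written as the product term x:I(y^2). It shows the candidate set, not the package's internal representation.

```r
# Hypothetical helper listing candidate terms in R formula syntax.
candidateTerms <- function(data, preds, quadratic = TRUE, cubic = TRUE,
                           interaction = TRUE, interQuad = TRUE) {
  # only non-factor predictors get powers
  isNum <- vapply(data[preds], function(v) !is.factor(v), logical(1))
  num <- preds[isNum]
  out <- preds                                   # linear terms
  if (quadratic) out <- c(out, sprintf("I(%s^2)", num))
  if (cubic) out <- c(out, sprintf("I(%s^3)", num))
  if (interaction && length(preds) > 1) {
    pairs <- combn(preds, 2)                     # 2 x choose(p, 2) matrix
    out <- c(out, paste(pairs[1, ], pairs[2, ], sep = ":"))
  }
  if (interQuad) {
    for (x in preds) {
      for (y in num) {                           # y must not be a factor
        if (x != y) out <- c(out, sprintf("%s:I(%s^2)", x, y))
      }
    }
  }
  out
}

d <- data.frame(a = rnorm(5), b = rnorm(5),
                f = factor(c("u", "v", "u", "v", "u")))
tt <- candidateTerms(d, c("a", "b", "f"))
tt
```

Any subset of these strings can be joined with " + " and passed to as.formula().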

presPerTermInitial

Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.

presPerTermFinal

Positive integer. Minimum number of presence sites per term in the initial starting model. Used only if select is TRUE.

initialTerms

Positive integer. Maximum number of terms to be used in an initial model. Used only if construct is TRUE. The maximum that can be handled by dredge() is 30, so if this number is > 30 and select is TRUE then it is forced to 30 with a warning. Note that the number of coefficients for factors is not calculated correctly, so if the predictors contain factors then this number may need to be reduced further.
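A minimal sketch of how the presences-per-term rule and initialTerms might combine to cap the size of the starting model. The helper maxModelTerms is hypothetical; it assumes the cap is the smaller of floor(presences / presPerTerm) and initialTerms, with initialTerms forced to 30 (the dredge() limit stated above) when select is TRUE.

```r
# Hypothetical cap on the number of terms in the starting model.
maxModelTerms <- function(y, presPerTerm = 10, initialTerms = 10, select = TRUE) {
  if (select && initialTerms > 30) {
    warning("initialTerms > 30; forcing it to 30 (limit of dredge()).")
    initialTerms <- 30
  }
  nPres <- sum(y == 1)                        # number of presences
  min(floor(nPres / presPerTerm), initialTerms)
}

y <- c(rep(1, 45), rep(0, 100))
maxModelTerms(y)    # 45 presences / 10 per term -> at most 4 terms
```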

w

Either logical, in which case TRUE causes the total weight of presences to equal the total weight of absences (if family = 'binomial'); OR a numeric vector of weights, one per row in data; OR the name of the column in data that contains site weights. The default (TRUE) is to assign equal total weights to presences and contrast sites.
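What w = TRUE achieves can be sketched as below, assuming presences keep weight 1 and absences (contrast sites) are rescaled so both groups sum to the same total. The helper balanceWeights is hypothetical, and the package may normalize the weights differently (e.g., to a different common total).

```r
# Hypothetical helper: equal total weight for presences (1) and absences (0).
balanceWeights <- function(y) {
  nPres <- sum(y == 1)
  nAbs <- sum(y == 0)
  ifelse(y == 1, 1, nPres / nAbs)
}

y <- c(rep(1, 20), rep(0, 80))
w <- balanceWeights(y)
# total weight: 20 presences x 1 = 20; 80 absences x 0.25 = 20
c(presences = sum(w[y == 1]), absences = sum(w[y == 0]))
```

Such a vector can be passed to stats::glm() through its weights argument.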

method

Character. Name of the function used to fit the model. This can be 'glm.fit' (default), 'brglmFit' (from the brglm2 package), or another function.
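In stats::glm(), method names the fitting function that glm() calls, so 'glm.fit' reproduces the default fit. The brglm2 lines are left commented out because they need that package installed; with it, method = "brglmFit" gives a bias-reduced fit. How trainGlmDredge forwards this argument internally is not shown here.

```r
# 'method' selects the fitting routine used by stats::glm().
set.seed(5)
d <- data.frame(x = rnorm(80))
d$y <- rbinom(80, 1, plogis(0.3 + d$x))

mDefault <- glm(y ~ x, family = binomial, data = d)
mExplicit <- glm(y ~ x, family = binomial, data = d, method = "glm.fit")

# With brglm2 installed, a bias-reduced fit would be:
# library(brglm2)
# mBr <- glm(y ~ x, family = binomial, data = d, method = "brglmFit")
```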

out

Character. Indicates the type of value returned. If 'model' (default) then returns an object of class brglm2/glm. If 'table' then returns just the AICc table for each kind of model term used in model construction. If both (c('model', 'table')) then returns a 2-item list with the best model and the AICc table.

verbose

Logical. If TRUE then display intermediate results.

...

Arguments to pass to stats::glm() or MuMIn::dredge().

Value

If out = 'model' this function returns an object of class glm. If out = 'table' this function returns a data frame with tuning parameters and AICc for each model tried. If out = c('model', 'table') then it returns a list object with the glm object and the data frame.

See Also

trainGlm, glm in the stats package, brglmFit in the brglm2 package

Examples

## Not run: 
library(enmSdm)
set.seed(123)
x <- matrix(rnorm(n = 6*100), ncol = 6)
# true variables will be #1, #2, #5, and #6, plus
# the squares of #1 and #6, plus
# the interaction between #1 and #6, plus
# the cube of #5
imp <- c('x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x1_pow2', 'x6_pow2', 'x1_by_x6', 'x5_pow3')
betas <- c(5, 2, 0, 0, 1, -1, 8, 1, 2, -4)
names(betas) <- imp
# linear terms + squares of x1 and x6 + x1-by-x6 interaction + cube of x5
y <- 0.5 + x %*% betas[1:6] + betas[7] * x[ , 1]^2 +
betas[8] * x[ , 6]^2 + betas[9] * x[ , 1] * x[ , 6] + betas[10] * x[ , 5]^3
y <- as.integer(y > 10)
x <- cbind(y, x)
x <- as.data.frame(x)
names(x) <- c('y', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6')
model <- trainGlmDredge(x, verbose=TRUE)

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.