trainNs: Calibrate a natural splines model

trainNs R Documentation

Description

This function constructs a natural-spline model piece-by-piece. It first calculates AICc for all models with univariate and bivariate (interaction) terms. It then creates a "full" model from the highest-ranked univariate and bivariate terms, and finally applies an all-subsets model selection routine to that model.

Usage

trainNs(
  data,
  resp = names(data)[1],
  preds = names(data)[2:ncol(data)],
  family = "binomial",
  df = 1:4,
  construct = TRUE,
  select = TRUE,
  presPerTermInitial = 10,
  presPerTermFinal = 10,
  initialTerms = 8,
  w = TRUE,
  out = "model",
  verbose = FALSE,
  ...
)

Arguments

data

Data frame. Must contain fields with the same names as those given in preds.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.

family

Name of family for data error structure (see family).

df

Integer > 0 or vector of integers > 0. Sets flexibility of model fit. See documentation for ns. If construct is TRUE, then univariate models for each term will be evaluated using each value in df. Note that NULL is also valid, but it can create problems when used with other functions in this package (and usually defaults to df=3 anyway).
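
As a rough illustration of how df controls flexibility, the sketch below builds natural-spline bases with splines::ns() for each value in the default df = 1:4 and counts the resulting basis columns. The predictor x is made-up data for illustration and is not part of this package.

```r
library(splines)

# hypothetical predictor values (illustration only)
x <- seq(0, 10, length.out = 50)

# number of basis columns produced by ns() for each df in 1:4
basisCols <- sapply(1:4, function(d) ncol(ns(x, df = d)))
basisCols
```

Each basis column adds one coefficient to the model, so larger values of df allow a more flexible response curve at the cost of more parameters.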

construct

Logical. If TRUE then construct model by computing AICc for all univariate and bivariate models. Then add terms up to maximum set by presPerTermInitial and initialTerms.
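
The univariate part of the construction stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the function's actual code: the data frame sim, the helper aicc(), and the use of a plain glm() fit are all hypothetical, and bivariate terms are omitted for brevity.

```r
library(splines)

set.seed(1)
n <- 200

# simulated data (illustration only): presence depends nonlinearly on x1
sim <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
sim$pres <- rbinom(n, 1, plogis(1 - sim$x1^2))

# hypothetical helper: AICc = AIC + 2k(k + 1) / (n - k - 1),
# where k is the number of estimated coefficients
aicc <- function(model) {
	k <- length(coef(model))
	nObs <- nobs(model)
	AIC(model) + (2 * k * (k + 1)) / (nObs - k - 1)
}

# fit one univariate model per predictor and df value
grid <- expand.grid(pred = c('x1', 'x2'), df = 1:4, stringsAsFactors = FALSE)
grid$AICc <- NA_real_
for (i in seq_len(nrow(grid))) {
	form <- as.formula(paste0('pres ~ ns(', grid$pred[i], ', df = ', grid$df[i], ')'))
	fit <- glm(form, data = sim, family = binomial)
	grid$AICc[i] <- aicc(fit)
}

# rank candidate terms from lowest (best) to highest AICc
tuning <- grid[order(grid$AICc), ]
tuning
```

The top rows of a table like this would be the candidates for the initial model, subject to the limits set by presPerTermInitial and initialTerms.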

select

Logical. If TRUE then calculate AICc for all possible subsets of models and return the model with the lowest AICc of these. This step is performed after model construction (if any).

presPerTermInitial

Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.

presPerTermFinal

Positive integer. Minimum number of presence sites per term in initial starting model; used only if select is TRUE.

initialTerms

Positive integer. Maximum number of terms to be used in an initial model. Used only if construct is TRUE. The maximum that can be handled by dredge is 31, so if this number is >31 and select is TRUE then it is forced to 31 with a warning. Note that the number of coefficients for factors is not calculated correctly, so if the predictors contain factors then this number might have to be reduced even more.

w

Either a logical value, in which case TRUE causes the total weight of presences to equal the total weight of absences (if family = 'binomial'), OR a numeric list of weights, one per row in data, OR the name of the column in data that contains site weights. The default is TRUE (see Usage).
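
One way to give presences and absences equal total weight is sketched below. The exact scaling the function uses internally is not documented here, so treat this as an assumption. The vector presBg is a made-up example.

```r
# hypothetical presence/background indicator: 3 presences, 6 background sites
presBg <- c(1, 1, 1, 0, 0, 0, 0, 0, 0)

nPres <- sum(presBg == 1)
nAbs <- sum(presBg == 0)

# presences get weight 1; absences are scaled so both classes have equal total weight
w <- ifelse(presBg == 1, 1, nPres / nAbs)

sum(w[presBg == 1])  # total presence weight
sum(w[presBg == 0])  # total absence weight
```

A vector built this way has one value per row and could be supplied directly through the w argument.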

out

Character or character vector. Indicates the type of value returned. Values can be 'model' (default; return the model with the lowest AICc), 'models' (return a list of all models), and/or 'tuning' (return a data frame with AICc for each model). If more than one value is specified, then the output will be a list with elements named "model", "models", and/or "tuning". If 'models' is specified, the models will only be produced if select = TRUE. They will appear in the list in the same order as they appear in the tuning table (i.e., the model with the lowest AICc first, the second-lowest next, etc.). If just one value is specified, the output will be either an object of class glm, a list of objects of class glm, or a data frame.

verbose

Logical. If TRUE then display intermediate results on the display device. Default is FALSE.

...

Arguments to send to glm() or dredge().

Value

If out = 'model' this function returns an object of class glm. If out = 'tuning' this function returns a data frame with tuning parameters and AICc for each model tried. If out = c('model', 'tuning') then it returns a list object with the glm object and the data frame.

See Also

ns, gam, trainGam

Examples

## Not run: 
library(brglm2)

### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

# GLM
gl <- trainGlm(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# GAM
ga <- trainGam(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# NS
ns <- trainNs(
	data = env,
	resp = 'presBg',
	preds = preds,
	verbose = TRUE
)

# prediction rasters
mapGlm <- predict(clim, gl, type='response')
mapGam <- predict(clim, ga, type='response')
mapNs <- predict(clim, ns, type='response')

par(mfrow=c(1, 3))
plot(mapGlm, main='GLM')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])
plot(mapGam, main='GAM')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])
plot(mapNs, main='NS')
plot(mad0, add=TRUE)
points(occs[ , c('longitude', 'latitude')])

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.