trainLars: Calibrate a least angle regression (LARS) model
In adamlilith/enmSdm: Tools for Modeling Niches and Distributions of Species

Description

This function fits a least angle regression (LARS) model using possibly overlapping grouped covariates. The model is fit using cross-validation (the cv.grpregOverlap function). The cross-validation is calculated across values of alpha, which controls the degree of ridge penalty (alpha close to 0, but not equal to 0, imposes the full ridge penalty, and alpha = 1 imposes no ridge penalty). Higher-order terms (e.g., quadratic terms, 2-way interactions, etc.) are constructed and fitted in a manner that respects marginality (i.e., all lower-order terms will have non-zero coefficients if a higher-order term is used).
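
The role of alpha can be illustrated with a short sketch. It assumes the weighting documented for the grpreg package, in which the group penalty is scaled by lambda * alpha and the ridge penalty by lambda * (1 - alpha). The value of `lambda` and the object name `weights` are illustrative only and are not part of trainLars.

```r
# Default alpha grid from the Usage section
alphas <- c(0.01, seq(0.1, 1, by = 0.1))

# Illustrative regularization strength (assumption, not a trainLars default)
lambda <- 1

# Assumed split of lambda between the group penalty and the ridge (L2) penalty
weights <- data.frame(
  alpha = alphas,
  groupPenalty = lambda * alphas,
  ridgePenalty = lambda * (1 - alphas)
)

print(weights)
```

At alpha = 0.01 nearly all of the penalty falls on the ridge term, and at alpha = 1 the ridge term vanishes.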

Usage

trainLars(
  data,
  resp = 1,
  preds = 2:ncol(data),
  alphas = c(0.01, seq(0.1, 1, by = 0.1)),
  scale = TRUE,
  quadratic = TRUE,
  cubic = TRUE,
  interaction = TRUE,
  interQuad = TRUE,
  na.rm = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`data`	Data frame.
`resp`	Character or integer. Name or column index of response variable. Default is to use the first column in `data`.
`preds`	Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in `data`.
`alphas`	Numeric or numeric vector in the range `(0, 1]`. Degree of ridge penalty to impose (values close to 0 impose the full ridge penalty, while a value of 1 imposes no ridge penalty).
`scale`	Logical. If `TRUE` then values in `data[ , preds]` are rescaled to have a mean of 0 and a standard deviation of 1.
`quadratic`	Logical. If `TRUE` then include quadratic terms in model construction stage for non-factor predictors. Quadratic columns will be named `<predictor name>_pow2`.
`cubic`	Logical. If `TRUE` then include cubic terms in model construction stage for non-factor predictors. Cubic columns will be named `<predictor name>_pow3`.
`interaction`	Logical. If `TRUE` then include 2-way interaction terms (including interactions between factor predictors). Interaction columns will be named `<predictor 1 name>_by_<predictor 2 name>`.
`interQuad`	Logical. If `TRUE` then include all possible interactions of the form `x * y^2` unless `y` is a factor (linear-by-quadratic features). Linear-by-quadratic columns will be named `<predictor 1 name>_by_<predictor 2 name>_pow2`.
`na.rm`	Logical. If `TRUE` then remove all rows of `data` in which there is at least one `NA` among `resp` or `preds`. The default is `FALSE`, which will cause an error if any row has an `NA`.
`verbose`	Logical. If `TRUE` then display progress.
`...`	Arguments to pass to `grpreg`, `grpregOverlap`, and `cv.grpregOverlap`, especially `family` and `penalty`. Do not include the `group` or `alpha` arguments.
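
The column-naming conventions described for `quadratic`, `cubic`, `interaction`, and `interQuad` can be sketched in base R. This is only an illustration of the names, not the function's internal code: the data frame `toy`, its column names, and the object `expanded` are hypothetical, and the sketch handles numeric predictors only (no factors).

```r
# Toy data: a binary response plus two continuous predictors (hypothetical)
set.seed(1)
toy <- data.frame(y = rbinom(10, 1, 0.5), x1 = rnorm(10), x2 = rnorm(10))
preds <- c('x1', 'x2')

expanded <- toy

# Quadratic and cubic terms: <predictor name>_pow2 and <predictor name>_pow3
for (p in preds) {
  expanded[[paste0(p, '_pow2')]] <- toy[[p]]^2
  expanded[[paste0(p, '_pow3')]] <- toy[[p]]^3
}

# 2-way interactions: <predictor 1 name>_by_<predictor 2 name>
pairs <- combn(preds, 2)
for (i in seq_len(ncol(pairs))) {
  a <- pairs[1, i]
  b <- pairs[2, i]
  expanded[[paste0(a, '_by_', b)]] <- toy[[a]] * toy[[b]]
}

# Linear-by-quadratic terms (x * y^2) for every ordered pair of predictors
for (a in preds) {
  for (b in setdiff(preds, a)) {
    expanded[[paste0(a, '_by_', b, '_pow2')]] <- toy[[a]] * toy[[b]]^2
  }
}

print(names(expanded))
```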

Details

If `scale` is `TRUE` then predictors with zero variance will be removed from the data before the model is trained.
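
A minimal base-R sketch of this preprocessing step follows. It is not the function's own code: the data frame `toy` and the objects `sds`, `keep`, and `scaled` are hypothetical. It shows why constant predictors have to be dropped first, since dividing by a standard deviation of 0 would produce `NaN` values.

```r
# Toy data with one constant predictor (hypothetical names and values)
toy <- data.frame(
  y = c(1, 0, 1, 0, 1),
  x1 = c(2, 4, 6, 8, 10),
  x2 = c(3, 3, 3, 3, 3),  # zero variance
  x3 = c(1, 0, 2, 5, 2)
)
preds <- c('x1', 'x2', 'x3')

# Identify predictors whose standard deviation is greater than 0
sds <- sapply(toy[ , preds, drop = FALSE], sd)
keep <- preds[sds > 0]

# Drop constant predictors, then rescale the rest to mean 0 and SD 1
scaled <- toy[ , c('y', keep), drop = FALSE]
scaled[ , keep] <- scale(toy[ , keep, drop = FALSE])

print(scaled)
```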

Value

Object of class grpreg and grpregOverlap.

Examples

## Not run: 
### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # use 2000 because there are too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

al <- c(0.01, 0.5, 1)
# the collated data frame is named env (first column is the response)
fit1 <- trainLars(data=env, penalty='cMCP', family='binomial',
   nfolds=3, alphas=al, quadratic=FALSE, cubic=FALSE, interaction=FALSE,
   interQuad=FALSE, verbose=TRUE)
fit2 <- trainLars(data=env, penalty='cMCP', family='binomial',
   nfolds=3, alphas=al, quadratic=TRUE, cubic=FALSE, interaction=FALSE,
   interQuad=FALSE, verbose=TRUE)
fit3 <- trainLars(data=env, penalty='cMCP', family='binomial',
   nfolds=3, alphas=al, quadratic=TRUE, cubic=TRUE, interaction=TRUE,
   interQuad=TRUE, verbose=TRUE)

summary(fit1)
summary(fit2)
summary(fit3)

# predictions using all variables
pred1 <- predictLars(fit1, env, type='response')
pred2 <- predictLars(fit2, env, type='response')
pred3 <- predictLars(fit3, env, type='response')

# partial predictions examining effect of just bio1 (plus any interactions)
pred1bio1 <- predictLars(fit1, env, type='response', preds='bio1')
pred2bio1 <- predictLars(fit2, env, type='response', preds='bio1')
pred3bio1 <- predictLars(fit3, env, type='response', preds='bio1')

# two panels: partial responses to bio1, then full vs. partial predictions
par(mfrow=c(1, 2))
plot(env$bio1, pred1bio1, ylim=c(0, 1))
points(env$bio1, pred2bio1, col='blue')
points(env$bio1, pred3bio1, col='red')
legend('topright', pch=1, col=c('black', 'blue', 'red'),
   legend=c('linear-only', 'linear + quadratic', 'all terms'))

# predictions using just bio1 and bio12
pred3bio1_12 <- predictLars(fit3, env, type='response', preds=c('bio1', 'bio12'))
plot(pred3, pred3bio1_12)
abline(0, 1)

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.