trainLars: Calibrate a least angle regression (LARS) model

View source: R/trainLars.r

trainLars    R Documentation

Description

This function calibrates a least angle regression (LARS) model using possibly overlapping grouped covariates. The model is fit using cross-validation (the cv.grpregOverlap function). Cross-validation is performed across values of alpha, which controls the degree of ridge penalty (alpha ~0, but not = 0, imposes the full ridge penalty, and alpha = 1 imposes no ridge penalty). Higher-order terms are constructed (e.g., quadratic, 2-way interaction, etc.) and fitted in a manner that respects marginality (i.e., all lower-order terms will have non-zero coefficients if a higher-order term is used).
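To make the role of alpha concrete, the sketch below shows how a total penalty weight could be split between the group penalty and the ridge penalty. It assumes the convention documented for grpreg, in which the group penalty is weighted by lambda * alpha and the ridge penalty by lambda * (1 - alpha). The value of lambda and the name penaltySplit are hypothetical and only for illustration; in practice the penalty strength is chosen by cross-validation.

```r
# default grid of alpha values searched by trainLars()
alphas <- c(0.01, seq(0.1, 1, by = 0.1))

# hypothetical total penalty strength (selected by cross-validation in practice)
lambda <- 1

# assumed split of the penalty, following the grpreg convention
penaltySplit <- data.frame(
  alpha = alphas,
  groupPenalty = lambda * alphas,        # weight on the group (sparsity) penalty
  ridgePenalty = lambda * (1 - alphas)   # weight on the ridge (L2) penalty
)

print(penaltySplit)
```

Under this convention alpha = 0.01 places nearly all of the weight on the ridge penalty, while alpha = 1 places none on it.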

Usage

trainLars(
  data,
  resp = 1,
  preds = 2:ncol(data),
  alphas = c(0.01, seq(0.1, 1, by = 0.1)),
  scale = TRUE,
  quadratic = TRUE,
  cubic = TRUE,
  interaction = TRUE,
  interQuad = TRUE,
  na.rm = FALSE,
  verbose = FALSE,
  ...
)

Arguments

data

Data frame.

resp

Character or integer. Name or column index of response variable. Default is to use the first column in data.

preds

Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.

alphas

Numeric or numeric vector in the range (0, 1]. Degree of ridge penalty to impose (values close to 0 ==> full ridge penalty, while a value of 1 imposes no ridge penalty).

scale

Logical. If TRUE then values in data[ , preds] are rescaled to have a mean of 0 and a standard deviation of 1.

quadratic

Logical. If TRUE then include quadratic terms in model construction stage for non-factor predictors. Quadratic columns will be named <predictor name>_pow2.

cubic

Logical. If TRUE then include cubic terms in model construction stage for non-factor predictors. Cubic columns will be named <predictor name>_pow3.

interaction

Logical. If TRUE then include 2-way interaction terms (including interactions between factor predictors). Interaction columns will be named <predictor 1 name>_by_<predictor 2 name>.

interQuad

Logical. If TRUE then include all possible interactions of the form x * y^2 unless y is a factor (linear-by-quadratic features). Linear-by-quadratic columns will be named <predictor 1 name>_by_<predictor 2 name>_pow2.

na.rm

Logical. If TRUE then remove all rows of data in which there is at least one NA among resp or preds. The default is FALSE, which will cause an error if any row has an NA.

verbose

Logical. If TRUE then display progress.

...

Arguments to pass to grpreg, grpregOverlap, and cv.grpregOverlap, especially family and penalty. Do not include the group or alpha arguments.
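The naming scheme for the higher-order terms described in the arguments above can be illustrated with a small base R sketch. This is not the package's internal code: the toy predictors, the object names (x, feats, varPairs, groups), and in particular the grouping shown at the end are assumptions, included only to show one way marginality might be encoded as overlapping groups.

```r
# toy numeric predictors; names are hypothetical stand-ins for columns of `data`
x <- data.frame(bio1 = c(1, 2, 3, 4), bio12 = c(10, 20, 15, 5))
vars <- names(x)
feats <- x

# quadratic and cubic terms: <predictor name>_pow2 and <predictor name>_pow3
for (v in vars) {
  feats[[paste0(v, '_pow2')]] <- x[[v]]^2
  feats[[paste0(v, '_pow3')]] <- x[[v]]^3
}

# 2-way interactions: <predictor 1 name>_by_<predictor 2 name>
varPairs <- combn(vars, 2, simplify = FALSE)
for (p in varPairs) {
  feats[[paste0(p[1], '_by_', p[2])]] <- x[[p[1]]] * x[[p[2]]]
}

# linear-by-quadratic terms x * y^2: <predictor 1 name>_by_<predictor 2 name>_pow2
for (v1 in vars) {
  for (v2 in setdiff(vars, v1)) {
    feats[[paste0(v1, '_by_', v2, '_pow2')]] <- x[[v1]] * x[[v2]]^2
  }
}

# assumed grouping (illustrative only): each higher-order term shares a group
# with its lower-order parents, so selecting the group brings the parents along
groups <- list(
  bio1 = 'bio1',
  bio1_pow2 = c('bio1', 'bio1_pow2'),
  bio1_by_bio12 = c('bio1', 'bio12', 'bio1_by_bio12')
)

names(feats)
```

Factor predictors are left out of this sketch; per the argument descriptions, they receive no polynomial terms but can take part in 2-way interactions.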

Details

If scale is TRUE then predictors with zero variance will be removed from the data before the model is trained.
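The pre-processing implied by na.rm and scale can be approximated in base R as follows. This is a minimal sketch under stated assumptions, not the function's actual implementation: the toy data frame and its column names are hypothetical, and the order of the steps is a guess.

```r
# toy data with one missing value and one constant predictor (hypothetical names)
data <- data.frame(
  presBg = c(1, 0, 1, 0, 1),
  bio1 = c(20, 22, NA, 25, 21),
  bio12 = c(1500, 1200, 1800, 900, 1100),
  const = c(3, 3, 3, 3, 3)
)
resp <- 'presBg'
preds <- c('bio1', 'bio12', 'const')

# na.rm = TRUE: keep only rows with no NA in the response or predictors
data <- data[stats::complete.cases(data[ , c(resp, preds)]), ]

# drop predictors with zero variance, since they cannot be rescaled
sds <- sapply(data[ , preds, drop = FALSE], stats::sd)
preds <- preds[sds > 0]

# scale = TRUE: center to mean 0 and rescale to standard deviation 1
data[ , preds] <- scale(data[ , preds, drop = FALSE])
```

Here the row containing the NA is dropped, const is excluded from the predictors, and bio1 and bio12 end up with mean 0 and standard deviation 1.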

Value

Object of class grpreg and grpregOverlap.

See Also

predictLars, grpreg, grpregOverlap, cv.grpregOverlap

Examples

## Not run: 
### model red-bellied lemurs
data(mad0)
data(lemurs)

# climate data
bios <- c(1, 5, 12, 15)
clim <- raster::getData('worldclim', var='bio', res=10)
clim <- raster::subset(clim, bios)
clim <- raster::crop(clim, mad0)

# occurrence data
occs <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
occsEnv <- raster::extract(clim, occs[ , c('longitude', 'latitude')])

# background sites
bg <- 2000 # too few cells to locate 10000 background points
bgSites <- dismo::randomPoints(clim, bg)
bgEnv <- raster::extract(clim, bgSites)

# collate
presBg <- rep(c(1, 0), c(nrow(occs), nrow(bgSites)))
env <- rbind(occsEnv, bgEnv)
env <- cbind(presBg, env)
env <- as.data.frame(env)

preds <- paste0('bio', bios)

# fit models of increasing complexity to the collated data frame (env)
al <- c(0.01, 0.5, 1)
fit1 <- trainLars(data=env, resp='presBg', preds=preds, penalty='cMCP',
   family='binomial', nfolds=3, alphas=al, quadratic=FALSE, cubic=FALSE,
   interaction=FALSE, interQuad=FALSE, verbose=TRUE)
fit2 <- trainLars(data=env, resp='presBg', preds=preds, penalty='cMCP',
   family='binomial', nfolds=3, alphas=al, quadratic=TRUE, cubic=FALSE,
   interaction=FALSE, interQuad=FALSE, verbose=TRUE)
fit3 <- trainLars(data=env, resp='presBg', preds=preds, penalty='cMCP',
   family='binomial', nfolds=3, alphas=al, quadratic=TRUE, cubic=TRUE,
   interaction=TRUE, interQuad=TRUE, verbose=TRUE)

summary(fit1)
summary(fit2)
summary(fit3)

# predictions using all variables
pred1 <- predictLars(fit1, env, type='response')
pred2 <- predictLars(fit2, env, type='response')
pred3 <- predictLars(fit3, env, type='response')

# partial predictions examining effect of just bio1 (plus any interactions)
pred1bio1 <- predictLars(fit1, env, type='response', preds='bio1')
pred2bio1 <- predictLars(fit2, env, type='response', preds='bio1')
pred3bio1 <- predictLars(fit3, env, type='response', preds='bio1')

# two panels: partial response to bio1, then partial vs. full predictions
par(mfrow=c(1, 2))
plot(env$bio1, pred1bio1, ylim=c(0, 1))
points(env$bio1, pred2bio1, col='blue')
points(env$bio1, pred3bio1, col='red')
legend('topright', pch=1, col=c('black', 'blue', 'red'),
   legend=c('linear-only', 'linear + quadratic', 'all terms'))

# predictions using just bio1 and bio12
pred3bio1_12 <- predictLars(fit3, env, type='response', preds=c('bio1', 'bio12'))
plot(pred3, pred3bio1_12)
abline(0, 1)

## End(Not run)

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.