lassoBinomial: Perform LASSO regularized logistic regressions to model the...


View source: R/lassoBinomial.R

Description

lassoBinomial runs cross-validated regularized regressions for a single species and returns important coefficients

Usage

lassoBinomial(response, covariates, cutoff, n_reps)

lassoBinomial_comm(outcome_data, count_data, outcome_indices, covariates,
  cutoff, n_reps, n_cores)

Arguments

response

A vector of binary presence-absence observations for the focal species

covariates

A matrix of covariates, with nrow(covariates) == length(response)

cutoff

Positive numeric value representing the proportion of models in which a predictor must be retained in order to be treated as meaningful. If the predictor is retained in fewer than cutoff proportion of cv.glmnet regularized models, its coefficient is forced to be zero. If the predictor is retained in at least cutoff proportion of models, its mean coefficient is returned. Default is 0.80

n_reps

Positive integer representing the number of times to repeat 10-fold cv.glmnet regularized regressions (default is 10)

outcome_data

A dataframe containing binary presence-absence observations for species (each column representing a different species)

count_data

A dataframe containing count observations for species (each column representing a different species)

outcome_indices

A sequence of positive integers representing the column indices in outcome_data that are to be modelled as binary outcome variables (i.e. species occurrence observations). Each one of these columns will be treated as a separate species whose occurrence probability is to be modelled using lassoBinomial. Default is to run models for all columns in outcome_data

n_cores

Positive integer stating the number of processing cores to split the job across. Default is parallel::detectCores() - 1
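
The cutoff rule described above can be illustrated with a small base R sketch. The coefficient matrix, predictor names and the choice to average over all replicates (zeros included) are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical coefficients: rows are predictors, columns are 5 replicate fits.
# A zero means the predictor was dropped by the LASSO in that replicate.
coef_mat <- rbind(
  temp   = c(0.9, 1.1, 1.0, 0.8, 1.2),
  precip = c(0.0, 0.3, 0.0, 0.0, 0.2),
  elev   = c(-0.5, -0.4, -0.3, -0.6, -0.7)
)

cutoff <- 0.80

# Proportion of replicates in which each predictor was retained (non-zero)
prop_retained <- rowMeans(coef_mat != 0)

# Keep the mean coefficient if retained often enough, otherwise force to zero.
# Assumption: the mean is taken over all replicates, including zeros.
final_coefs <- ifelse(prop_retained >= cutoff, rowMeans(coef_mat), 0)

print(final_coefs)
```

Here precip is retained in only 2 of 5 replicates, so its coefficient is set to zero, while temp and elev are retained in every replicate and keep their mean coefficients.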

Details

Regularized regressions are performed using cv.glmnet to identify meaningful predictors of the species' occurrence probability. These models use cyclical coordinate descent, applied to training sets of the data, to estimate regression parameters. The fitted parameters are then used to predict the remaining subset of the data (the test set) to assess model fit. The process is repeated until a best-fitting model is identified (the one minimising the loss function, which is cross-validated deviance in this case). By replicating the process n_reps times, we account for uncertainty in the fold-generating process and can more confidently identify meaningful predictors (i.e. those that are retained in at least cutoff proportion of the n_reps models).
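
The replicated procedure can be sketched roughly as follows. This is not the package's implementation: the simulated data, the use of lambda.min to pick the best-fitting model, and averaging coefficients over all replicates are assumptions made for illustration. It requires the glmnet package.

```r
library(glmnet)

# Simulated data (hypothetical): 5 covariates, only x1 and x2 affect occurrence
set.seed(1)
n <- 200
covariates <- matrix(rnorm(n * 5), nrow = n,
                     dimnames = list(NULL, paste0("x", 1:5)))
response <- rbinom(n, 1, plogis(1.5 * covariates[, 1] - 1.5 * covariates[, 2]))

n_reps <- 10
cutoff <- 0.80

# Each replicate draws new random folds, so the selected model can differ
coef_mat <- sapply(seq_len(n_reps), function(i) {
  cv_fit <- cv.glmnet(x = covariates, y = response,
                      family = "binomial", alpha = 1,
                      type.measure = "deviance", nfolds = 10)
  # Coefficients at the lambda minimising cross-validated deviance
  # (assumption: lambda.min rather than lambda.1se); intercept dropped
  as.matrix(coef(cv_fit, s = "lambda.min"))[-1, 1]
})

# Proportion of replicates retaining each predictor, then apply the cutoff
prop_retained <- rowMeans(coef_mat != 0)
final_coefs <- ifelse(prop_retained >= cutoff, rowMeans(coef_mat), 0)

print(round(final_coefs, 3))
```

Because the folds are regenerated in every call to cv.glmnet, weak predictors tend to enter only some of the replicate models, which is what the cutoff is designed to filter out.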

Value

lassoBinomial returns a single vector of coefficients for predictors in covariates.

lassoBinomial_comm binds these vectors into a dataframe with rownames matching species names in outcome_data. It then returns a list containing the coefficients and scaling factors, which are used in predictive functions.

See Also

cv.glmnet


nicholasjclark/BBS.occurrences documentation built on July 19, 2020, 8:31 p.m.