run_lasso: Apply lasso classifier to MrP.
In autoMrP: Improving MrP with Ensemble Learning

run_lasso

Description

`run_lasso` is a wrapper function that applies the lasso classifier to data provided by the user, evaluates its prediction performance across values of the penalty parameter `lambda` using `k`-fold cross-validation, and chooses the best-performing model.
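The selection procedure described above can be illustrated with a minimal base-R sketch. It is not the autoMrP implementation: it fits a plain Gaussian lasso by coordinate descent (no varying intercepts for `L1.x` or `L2.unit`), scores each candidate penalty by mean squared error averaged over the held-out folds, and keeps the penalty with the lowest average. The helper names `soft_threshold`, `lasso_cd` and `cv_lasso` are hypothetical, and because the penalty here multiplies an average-loss objective, its values are not necessarily comparable to the default `lambda` grid.

```r
# Hypothetical sketch: k-fold cross-validation over a lambda grid for a
# plain Gaussian lasso (not the model fitted by autoMrP's run_lasso).

# Soft-thresholding operator used in each coordinate update
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Cyclic coordinate descent for
#   (1 / (2n)) * sum((y - b0 - X %*% b)^2) + lambda * sum(abs(b))
# The intercept b0 is not penalized.
lasso_cd <- function(X, y, lambda, n_sweeps = 200) {
  X <- as.matrix(X)
  n <- nrow(X)
  b <- numeric(ncol(X))
  b0 <- mean(y)
  for (s in seq_len(n_sweeps)) {
    for (j in seq_along(b)) {
      # Partial residual that leaves out predictor j
      r_j <- y - b0 - X[, -j, drop = FALSE] %*% b[-j]
      z <- sum(X[, j] * r_j) / n
      b[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
    b0 <- mean(y - X %*% b)
  }
  list(intercept = b0, beta = b)
}

# Fit on k - 1 folds, score on the held-out fold, average over folds,
# and return the lambda with the lowest mean squared error.
cv_lasso <- function(folds, x_cols, y_col, lambdas) {
  cv_mse <- vapply(lambdas, function(lam) {
    fold_mse <- vapply(seq_along(folds), function(i) {
      train <- do.call(rbind, folds[-i])
      test <- folds[[i]]
      fit <- lasso_cd(train[, x_cols, drop = FALSE], train[[y_col]], lam)
      pred <- fit$intercept + as.matrix(test[, x_cols, drop = FALSE]) %*% fit$beta
      mean((test[[y_col]] - pred)^2)
    }, numeric(1))
    mean(fold_mse)
  }, numeric(1))
  # which.min returns the first minimum, so ties go to the earlier grid value
  list(lambda = lambdas[which.min(cv_mse)], cv_mse = cv_mse)
}
```

Averaging the loss over held-out folds, rather than scoring on the training data, is what lets the procedure penalize overfitting at small penalty values.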

Usage

run_lasso(
  y,
  L1.x,
  L2.x,
  L2.unit,
  L2.reg,
  n.iter,
  loss.unit,
  loss.fun,
  lambda,
  data,
  verbose,
  cores
)

Arguments

`y`	Outcome variable. A character scalar containing the column name of the outcome variable in `survey`.
`L1.x`	Individual-level covariates. A character vector containing the column names of the individual-level variables in `survey` and `census` used to predict outcome `y`. Note that the geographic unit is specified in argument `L2.unit`.
`L2.x`	Context-level covariates. A character vector containing the column names of the context-level variables in `survey` and `census` used to predict outcome `y`. To exclude context-level variables, set `L2.x = NULL`.
`L2.unit`	Geographic unit. A character scalar containing the column name of the geographic unit in `survey` and `census` at which outcomes should be aggregated.
`L2.reg`	Geographic region. A character scalar containing the column name of the geographic region in `survey` and `census` by which geographic units are grouped (`L2.unit` must be nested within `L2.reg`). Default is `NULL`.
`n.iter`	Number of lasso lambda values. An integer-valued scalar specifying the number of lambda values to search over. Default is `100`. Ignored if a vector of `lambda` values is provided.
`loss.unit`	Loss function unit. A character vector indicating whether performance loss should be evaluated at the level of individual respondents (`individuals`), at the level of geographic units (`L2 units`), or at both levels. Default is `c("individuals", "L2 units")`. With multiple loss units, parameter values are ranked for each loss unit and the parameter value with the lowest rank sum is chosen. Ties are broken according to the order in the search grid.
`loss.fun`	Loss function. A character vector indicating whether prediction loss should be measured by the mean squared error (`MSE`), the mean absolute error (`MAE`), binary cross-entropy (`cross-entropy`), the mean squared false error (`msfe`), the F1 score (`f1`), or a combination thereof. Default is `c("MSE", "cross-entropy", "msfe", "f1")`. With multiple loss functions, parameters are ranked for each loss function and the parameter combination with the lowest rank sum is chosen. Ties are broken according to the order in the search grid.
`lambda`	Lasso penalty parameter. A numeric `vector` of non-negative values. The penalty parameter controls the shrinkage of the context-level variables in the lasso model. Default is a sequence with minimum 0.1 and maximum 250 that is equally spaced on the log scale. The number of values is controlled by the `n.iter` argument.
`data`	Data for cross-validation. A `list` of `k` `data.frames`, one for each fold to be used in `k`-fold cross-validation.
`verbose`	Verbose output. A logical argument indicating whether or not verbose output should be printed. Default is `FALSE`.
`cores`	The number of cores to be used. An integer indicating the number of processor cores used for parallel computing. Default is `1`.
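The two levels named in `loss.unit` differ in how observations are weighted. The sketch below rests on an assumption about the aggregation that this page does not spell out: at the `individuals` level the squared error is averaged over respondents, while at the `L2 units` level it is first averaged within each geographic unit and then across units, so every unit counts equally regardless of its sample size. The helper name `loss_by_unit` is hypothetical.

```r
# Hypothetical helper: MSE evaluated at the two levels named in loss.unit.
# Assumption: the "L2 units" loss averages per-unit MSE with equal unit weights.
loss_by_unit <- function(y, pred, unit) {
  sq_err <- (y - pred)^2
  per_unit <- tapply(sq_err, unit, mean)  # mean squared error within each unit
  c(individuals = mean(sq_err), `L2 units` = mean(per_unit))
}

# Three respondents in unit "a", one in unit "b"
loss_by_unit(y = c(1, 0, 1, 1), pred = c(0.5, 0.5, 1, 0), unit = c("a", "a", "a", "b"))
# individuals: 0.375; L2 units: (1/6 + 1) / 2, about 0.583
```

The poorly predicted respondent in the small unit "b" weighs more heavily at the `L2 units` level, which is the kind of difference that can make the two loss units rank parameter values differently.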
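Textbook definitions of four of the measures listed under `loss.fun` are sketched below for a binary outcome `y` and predicted probabilities `p`. They are standard formulas, not code taken from autoMrP; the clipping constant `eps` and the 0.5 classification threshold in `f1_score` are assumptions, and `msfe` is left out because this page does not define it. A higher F1 score is better, whereas lower values are better for the other three measures.

```r
# Standard loss measures for a binary outcome y (0/1) and predicted probabilities p
mse <- function(y, p) mean((y - p)^2)
mae <- function(y, p) mean(abs(y - p))

# Binary cross-entropy; p is clipped away from 0 and 1 to avoid log(0)
cross_entropy <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# F1 score after classifying p at an assumed threshold of 0.5
f1_score <- function(y, p, threshold = 0.5) {
  y_hat <- as.integer(p >= threshold)
  tp <- sum(y_hat == 1 & y == 1)
  fp <- sum(y_hat == 1 & y == 0)
  fn <- sum(y_hat == 0 & y == 1)
  2 * tp / (2 * tp + fp + fn)
}

y <- c(1, 0, 1, 0)
p <- c(0.9, 0.2, 0.4, 0.6)
c(MSE = mse(y, p), MAE = mae(y, p), CE = cross_entropy(y, p), F1 = f1_score(y, p))
# MSE = 0.1925, MAE = 0.375, F1 = 0.5
```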
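The rank-sum rule used when several loss functions or loss units are requested can be sketched as follows. Each row of the hypothetical `loss_table` is one candidate from the search grid, kept in grid order, and each column is one loss measure for which lower is better. Ranking with `ties.method = "min"` within each column is an assumption; `which.min()` returns the first minimum, which breaks ties in the rank sums by grid order as described above.

```r
# Hypothetical helper: pick the grid candidate with the lowest rank sum.
# loss_table: data frame, one row per candidate (in search-grid order),
# one column per loss measure, lower values better.
select_by_rank_sum <- function(loss_table) {
  ranks <- lapply(loss_table, rank, ties.method = "min")  # rank within each measure
  rank_sum <- Reduce(`+`, ranks)
  which.min(rank_sum)  # first minimum wins, so ties go to the earlier grid entry
}

losses <- data.frame(
  MSE = c(0.30, 0.25, 0.28),
  MAE = c(0.40, 0.45, 0.35),
  cross_entropy = c(0.60, 0.55, 0.58)
)
select_by_rank_sum(losses)
# rank sums are 8, 5, 5; candidates 2 and 3 tie, so candidate 2 is chosen
```

Ranks put measures on different scales on an equal footing, so a measure with large raw values cannot dominate the choice.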
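The default `lambda` grid described above (minimum 0.1, maximum 250, `n.iter` values equally spaced on the log scale) can be reproduced in base R as a sketch. The ascending order is an assumption; because ties are broken by grid order, the order can matter for model selection.

```r
# Log-spaced penalty grid: equal steps in log(lambda) from 0.1 to 250
n.iter <- 100
lambda <- exp(seq(log(0.1), log(250), length.out = n.iter))

range(lambda)  # 0.1 and 250, up to floating-point rounding
# TRUE: every step equals (log(250) - log(0.1)) / (n.iter - 1)
isTRUE(all.equal(diff(log(lambda)), rep((log(250) - log(0.1)) / (n.iter - 1), n.iter - 1)))
```

Log spacing places many candidates at small penalties, where the fitted model changes quickly, and fewer at large penalties, where most coefficients are already shrunk to zero.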
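The `data` argument expects the cross-validation folds already split into a list. One simple way to build such a list from a single data frame is to assign rows at random to `k` folds of nearly equal size, sketched below with the hypothetical helper `make_folds`. This page does not say whether autoMrP assigns respondents to folds individually or by geographic unit, so the row-level split is an assumption.

```r
# Hypothetical helper: randomly split a data frame into a list of k folds
make_folds <- function(df, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fold_id <- sample(rep(seq_len(k), length.out = nrow(df)))  # balanced fold labels
  unname(split(df, fold_id))  # list of k data frames
}

folds <- make_folds(mtcars, k = 5, seed = 1)
length(folds)            # 5
vapply(folds, nrow, 1L)  # fold sizes; 32 rows give sizes 7, 7, 6, 6, 6
```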