bolasso: Bootstrapped LASSO (Bolasso) for regression and...

View source: R/bolasso.R

Description

bolasso implements Bootstrapped LASSO regression and classification.

Usage

bolasso(
  train_df,
  nboot,
  formula,
  family,
  test_df = NULL,
  predict_df = NULL,
  free_vars = NULL,
  nfold = NULL,
  lambda = "min",
  sparsity_threshold = NULL,
  selection_threshold = 0.9,
  verbose = FALSE,
  ...
)

Arguments

train_df

The training dataframe, containing the response (y) and the covariates (X).

nboot

A numeric value specifying the number of bootstrap replicates.

formula

A model formula giving the response and the covariates (for example, Sepal.Length ~ .).

family

A string that specifies either 'gaussian' or 'binomial'.

test_df

A dataframe containing the same columns as train_df, used as the test set. This argument is optional.

predict_df

A dataframe with the same columns as train_df, used to generate predictions from the trained and tested model. This argument is optional.

free_vars

A string or character vector specifying which covariate(s) to never penalize. This argument is optional.

nfold

The number of cross-validation folds. Only specify if cross-validation is desired. This argument is optional.

lambda

A string specifying which lambda to use for prediction when utilizing cross-validation. Typically either "min" or "1se". Default value is "min".

sparsity_threshold

A numeric value in [0, 1]. Any variable whose sparsity, expressed as a proportion, exceeds this value is dropped. This argument is optional.

selection_threshold

A numeric value in [0, 1]. A variable is kept only if it is selected in at least this proportion of bootstrap replicates. Default value is 0.9.

verbose

Logical indicating whether to print progress messages. Default is FALSE.

...

Additional arguments passed on to gamlr, such as standardize = FALSE.
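The two thresholds above can be illustrated with a small base-R sketch. It is not the package's internal code: the toy data, the `selected` matrix, and the reading of "sparsity" as the share of zero entries in a column are all assumptions made for illustration.

```r
# Illustration only; none of this is taken from the bolasso source.

# Assumption: "sparsity" is the proportion of zero entries in a column.
X <- data.frame(
  a = c(0, 0, 0, 0, 1),  # mostly zeros
  b = c(1, 2, 0, 4, 5),  # one zero
  c = c(3, 1, 4, 1, 5)   # no zeros
)
sparsity_threshold <- 0.5
sparsity <- colMeans(X == 0)
# Columns whose sparsity exceeds the threshold are dropped.
kept_cols <- names(sparsity)[sparsity <= sparsity_threshold]

# Hypothetical selection record: one row per bootstrap replicate,
# TRUE where that replicate's LASSO fit kept the variable.
selected <- cbind(
  b = rep(TRUE, 10),
  c = c(rep(TRUE, 8), rep(FALSE, 2))
)
selection_threshold <- 0.9
selection_freq <- colMeans(selected)
# A variable survives only if it was kept in at least
# selection_threshold of the replicates.
final_vars <- names(selection_freq)[selection_freq >= selection_threshold]

print(kept_cols)
print(selection_freq)
print(final_vars)
```

In this toy setup both thresholds are proportions in [0, 1], matching the argument descriptions, so a variable selected in 8 of 10 replicates does not pass a threshold of 0.9.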

Details

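The bolasso function fits LASSO regression, as implemented in the gamlr package, on nboot bootstrap replicates of the training data. Keeping only the variables that are selected consistently across replicates is intended to give more robust variable selection than a single LASSO fit, and it may also improve predictive accuracy. It supports Gaussian linear regression (family = 'gaussian') and binomial logistic regression (family = 'binomial').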
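The idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: bolasso relies on gamlr, whereas the hypothetical helper `lasso_cd` below is a small coordinate-descent LASSO with a fixed penalty, used only so the example runs without extra packages. The simulated data, the penalty value, and all variable names are likewise invented for the example.

```r
# Hypothetical helper (not part of gamlr or bolasso): LASSO by cyclic
# coordinate descent, minimising (1 / 2n) * ||y - X b||^2 + lambda * ||b||_1
# on standardised covariates and a centred response.
lasso_cd <- function(x, y, lambda, n_iter = 100) {
  x <- scale(x)
  y <- y - mean(y)
  n <- nrow(x)
  beta <- rep(0, ncol(x))
  for (iter in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      # Partial residual with variable j left out.
      r_j <- y - x[, -j, drop = FALSE] %*% beta[-j]
      z <- sum(x[, j] * r_j) / n
      # Soft-thresholding update; yields exact zeros.
      beta[j] <- sign(z) * max(abs(z) - lambda, 0) / (sum(x[, j]^2) / n)
    }
  }
  setNames(beta, colnames(x))
}

# Simulated data: only x1 and x2 affect the response.
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 2 * x[, "x1"] - 1.5 * x[, "x2"] + rnorm(n)

nboot <- 50
lambda <- 0.3                 # fixed penalty, chosen arbitrarily here
selection_threshold <- 0.9

# One row per bootstrap replicate, one column per covariate.
selected <- matrix(FALSE, nboot, p, dimnames = list(NULL, colnames(x)))
for (b in seq_len(nboot)) {
  idx <- sample(n, n, replace = TRUE)   # resample rows with replacement
  beta <- lasso_cd(x[idx, , drop = FALSE], y[idx], lambda)
  selected[b, ] <- beta != 0
}

# Keep variables selected in at least selection_threshold of the replicates.
selection_freq <- colMeans(selected)
final_vars <- names(selection_freq)[selection_freq >= selection_threshold]

print(round(selection_freq, 2))
print(final_vars)
```

A noise variable may enter the model in an individual replicate, but it rarely does so in most of them, which is why thresholding the selection frequency tends to filter it out.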

Value

A list containing the LASSO model, predicted_values, residuals, and selected variables.

Examples

## Not run: 
idx <- train_test_validate(iris$Sepal.Length, train.p = .6, test.p = .2)

initialize_parallel()

bolasso_model <- bolasso(train_df = iris[idx$train, ],
                         nboot = 100,
                         formula = Sepal.Length ~ .,
                         family = "gaussian",
                         test_df = iris[idx$test, ],
                         predict_df = iris[idx$validate, ],
                         nfold = 5,
                         verbose = TRUE)

## End(Not run)

dmolitor/umbrella documentation built on Nov. 10, 2020, 1:25 a.m.