lasso_vars: Most Relevant Features Using Lasso Regression

View source: R/lasso.R

lasso_vars R Documentation

Most Relevant Features Using Lasso Regression

Description

Use Lasso regression to identify the most relevant variables for predicting another variable. You might want to compare the results with those of corr_var() and/or x2y() to complement the analysis. There is no need to standardize, center, or scale your data beforehand. Tidyverse friendly.

Usage

lasso_vars(
  df,
  variable,
  ignore = NULL,
  nlambdas = 100,
  nfolds = 10,
  top = 20,
  quiet = FALSE,
  seed = 123,
  ...
)

Arguments

df

Dataframe. Any dataframe is valid, as ohse() will be applied to process categorical values, and values will be standardized automatically.

variable

Variable. Dependent variable or response.

ignore

Character vector. Variables to exclude from study.

nlambdas

Integer. Number of lambdas to be used in a search.

nfolds

Integer. Number of folds for K-fold cross-validation (>= 2).

top

Integer. Plot top n results only.

quiet

Boolean. Keep quiet? Else, show messages.

seed

Numeric. Seed for reproducible results.

...

Additional parameters passed to ohse().

Value

List. Contains lasso model coefficients, performance metrics, the actual model fitted and a plot.

See Also

Other Machine Learning: ROC(), conf_mat(), export_results(), gain_lift(), h2o_automl(), h2o_predict_MOJO(), h2o_selectmodel(), impute(), iter_seeds(), model_metrics(), model_preprocess(), msplit()

Other Exploratory: corr_cross(), corr_var(), crosstab(), df_str(), distr(), freqs(), freqs_df(), freqs_list(), freqs_plot(), missingness(), plot_cats(), plot_df(), plot_nums(), tree_var()

Examples

## Not run: 
# CRAN
Sys.unsetenv("LARES_FONT") # Temporary
data(dft) # Titanic dataset

m <- lasso_vars(dft, Survived, ignore = c("Cabin"))
print(m$coef)
print(m$metrics)
plot(m$plot)
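
# A hedged variation (a sketch, not part of the original example): it uses only
# the arguments documented in Usage, with values picked arbitrarily for
# illustration. `m_quick` is a hypothetical name.
# Fewer lambdas and folds shorten the search; `top` limits the plot to the
# 10 highest-ranked variables; `quiet = TRUE` suppresses messages.
m_quick <- lasso_vars(
  dft, Survived,
  ignore = c("Cabin"),
  nlambdas = 50,
  nfolds = 5,
  top = 10,
  quiet = TRUE,
  seed = 123
)
print(m_quick$coef)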

## End(Not run)

laresbernardo/lares documentation built on Jan. 14, 2025, 2:22 a.m.