R-CMD-check

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)

Group Linear Algorithm with Sparse Principal decomposition

A variable selection and clustering method for generalized linear models

Overview

R package glasp is an extension of the Sparse Group Lasso that computes groups automatically. The internal supervised variable clustering algorithm is also an original contribution and integrates naturally within the Group Lasso penalty. Moreover, this implementation provides the flexibility to change the risk function and address any regression problem.

Install

To install glasp in R use

devtools::install_github("jlaria/glasp", dependencies = TRUE)

Example

Linear regression

In this example, we show how to integrate glasp with parsnip and other tidy libraries.

We first load the required libraries.

library(glasp)
library(parsnip)
library(yardstick)

Next, we simulate some linear data with the simulate_dummy_linear_data function.

set.seed(0)
data <- simulate_dummy_linear_data()

A glasp model can be computed using different approaches. This is the parsnip approach.

model <- glasp_regression(l1=0.01, l2=0.001, frob=0.0003) %>%
    set_engine("glasp") %>%
    fit(y~., data)

The coefficients can be accessed through the parsnip model object model$fit$beta.

print(model)

Now we generate some out-of-sample data to check how the model predicts.

set.seed(1)
new_data <- simulate_dummy_linear_data()

pred <- predict(model, new_data)
rmse <- rmse_vec(new_data$y, pred$.pred)

We obtain a r rmse root mean square error.

Hyper-parameter selection

Hyper-parameter search is quite easy using the tools glasp integrates with. To show this, we will simulate some survival data.

set.seed(0)
data <- simulate_dummy_surv_data()
head(data)

We create the glasp model, but this time we do not fit it. Instead, we will call the tune function.

library(tune)

model <- glasp_cox(l1 = tune(), 
                     l2 = tune(),
                     frob = tune(), 
                     num_comp = tune()) %>%
         set_engine("glasp")

We specify k-fold cross validation and Bayesian optimization to search for hyper-parameters.

library(rsample)

data_rs <- vfold_cv(data, v = 4)

hist <- tune_bayes(model, event~time+., # <- Notice the syntax with time in the right hand side 
                   resamples = data_rs,
                   metrics = metric_set(roc_auc), # yardstick's roc_auc
                   iter =10, # <- 10 iterations... change to 1000
                   control = control_bayes(verbose = TRUE)) # 

show_best(hist, metric = "roc_auc")

Logistic regression

library(glasp)
library(yardstick)

set.seed(0)
data <- simulate_dummy_logistic_data()
model <- logistic_regression(y~., data, l1=0.01, l2=0.001, frob=0.001, ncomp=2)
print(model)

pred = predict(model, data)
library(rsample)

set.seed(0)
data <- simulate_dummy_logistic_data()

model <- glasp_classification(l1 = tune(), 
                     l2 = tune(),
                     frob = tune(), 
                     num_comp = tune()) %>%
         set_engine("glasp")

data_rs <- vfold_cv(data, v = 4)

hist <- tune_grid(model, y~., 
                   resamples = data_rs,
                   metrics = metric_set(roc_auc), 
                   grid =10, 
                   control = control_grid(verbose = FALSE)) # 
show_best(hist, metric = "roc_auc")

Testing with docker and Visual Studio Code

To replicate the environment used in development, you need

  1. Visual Studio Code with the Remote-Containers extension.
  2. Docker Desktop or just docker and docker-compose.

After installing the requisites above,

  1. Clone this repository git clone https://https://github.com/jlaria/glasp.git
  2. In vscode, go to File/Open Folder and open the root folder of this project glasp.
  3. Then, using the Remote-Containers extension, Reopen in Container (or ctrl+p and type >Remote-Containers:Reopen in Container)
  4. Wait for everything to install and then go to http://localhost:8787 username:rstudio, password:rstudio.

That's it! You can use an isolated rstudio-server for development.

See inst/rstudio for more details about the environment.



jlaria/glasp documentation built on Dec. 5, 2022, 6:42 a.m.