biglasso
In biglasso: Extending Lasso Model Fitting to Big Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6, fig.height=4
)

Small Data

Linear regression

library(biglasso)

data(colon)
X <- colon$X
y <- colon$y
dim(X)

X[1:5, 1:5]

## convert X to a big.matrix object
## X.bm is a pointer to the data matrix
X.bm <- as.big.matrix(X)
str(X.bm)

dim(X.bm)

X.bm[1:5, 1:5]
## same results as X[1:5, 1:5]

## fit entire solution path, using our newly proposed "Adaptive" screening rule (default)
fit <- biglasso(X.bm, y)
plot(fit)

Cross-Validation

## 10-fold cross-valiation in parallel
cvfit <- tryCatch(
         {
                cv.biglasso(X.bm, y, seed = 1234, nfolds = 10, ncores = 4)
         },
         error = function(cond) {
                cv.biglasso(X.bm, y, seed = 1234, nfolds = 10, ncores = 2)
         }
)

After cross-validation, a few things we can do:

plot the cross-validation plots:

par(mfrow = c(2, 2), mar = c(3.5, 3.5, 3, 1) ,mgp = c(2.5, 0.5, 0))
plot(cvfit, type = "all")

Summarize CV object:

summary(cvfit)

Extract non-zero coefficients at the optimal $\lambda$ value:

coef(cvfit)[which(coef(cvfit) != 0),]

Logistic Regression

data(Heart)
X <- Heart$X
y <- Heart$y
X.bm <- as.big.matrix(X)
fit <- biglasso(X.bm, y, family = "binomial")
plot(fit)

Cox Regression

library(survival)
X <- heart[,4:7]
y <- Surv(heart$stop - heart$start, heart$event)
X.bm <- as.big.matrix(X)
fit <- biglasso(X.bm, y, family = "cox")
plot(fit)

Multiple response Linear Regression (multi-task learning)

set.seed(10101)
n=300; p=300; m=5; s=10; b=1
x = matrix(rnorm(n * p), n, p)
beta = matrix(seq(from=-b,to=b,length.out=s*m),s,m)
y = x[,1:s] %*% beta + matrix(rnorm(n*m,0,1),n,m)
x.bm = as.big.matrix(x)
fit = biglasso(x.bm, y, family = "mgaussian")
plot(fit)

Big Data

When the raw data file is very large, it's better to convert the raw data file into a file-backed big.matrix by using a file cache. We can call function setupX, which reads the raw data file and creates a backing file (.bin) and a descriptor file (.desc) for the raw data matrix:

## The data has 200 observations, 600 features, and 10 non-zero coefficients.
## This is not actually very big, but vignettes in R are supposed to render
## quickly. Much larger data can be handled in the same way.
if(!file.exists('BigX.bin')) {
  X <- matrix(rnorm(1000 * 5000), 1000, 5000)
  beta <- c(-5:5)
  y <- as.numeric(X[,1:11] %*% beta)
  write.csv(X, "BigX.csv", row.names = F)
  write.csv(y, "y.csv", row.names = F)
  ## Pretend that the data in "BigX.csv" is too large to fit into memory
  X.bm <- setupX("BigX.csv", header = T)
}

It's important to note that the above operation is just one-time execution. Once done, the data can always be retrieved seamlessly by attaching its descriptor file (.desc) in any new R session:

rm(list = c("X", "X.bm", "y")) # Pretend starting a new session
X.bm <- attach.big.matrix("BigX.desc")
y <- read.csv("y.csv")[,1]

This is very appealing for big data analysis in that we don't need to "read" the raw data again in a R session, which would be very time-consuming. The code below again fits a lasso-penalized linear model, and runs 10-fold cross-validation:

system.time({fit <- biglasso(X.bm, y)})

plot(fit)

# 10-fold cross validation in parallel
tryCatch(
    {
        system.time({cvfit <- cv.biglasso(X.bm, y, seed = 1234, ncores = 4, nfolds = 10)})
    },
    error = function(cond) {
        system.time({cvfit <- cv.biglasso(X.bm, y, seed = 1234, ncores = 2, nfolds = 10)})
    }
)

par(mfrow = c(2, 2), mar = c(3.5, 3.5, 3, 1), mgp = c(2.5, 0.5, 0))
plot(cvfit, type = "all")

Links

Any scripts or data that you put into this service are public.

biglasso documentation built on April 4, 2025, 3:16 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

biglasso
Extending Lasso Model Fitting to Big Data

biglasso
In biglasso: Extending Lasso Model Fitting to Big Data

Small Data

Linear regression

Cross-Validation

Logistic Regression

Cox Regression

Multiple response Linear Regression (multi-task learning)

Big Data

Links

Try the biglasso package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

biglasso Extending Lasso Model Fitting to Big Data

biglasso In biglasso: Extending Lasso Model Fitting to Big Data

Small Data

Linear regression

Cross-Validation

Logistic Regression

Cox Regression

Multiple response Linear Regression (multi-task learning)

Big Data

Links

Try the biglasso package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

biglasso
Extending Lasso Model Fitting to Big Data

biglasso
In biglasso: Extending Lasso Model Fitting to Big Data