Fit a lasso regression and use the Bayesian Information Criterion (BIC)
to select a subset of covariates.
Can deal with very large sparse data matrices.
Intended for binary response only (option family = "binomial"
is forced).
Depends on the glmnet and relax.glmnet functions from the package glmnet.
lasso_bic(x, y, maxp = 50, path = TRUE, betaPos = TRUE, ...)
x 
Input matrix, of dimension nobs x nvars. Each row is an observation
vector. Can be in sparse matrix format (inherit from class

y 
Binary response variable, numeric. 
maxp 
A limit on how many relaxed coefficients are allowed.
Default is 50, in 
path 
Since 
betaPos 
Should the covariates selected by the procedure be
positively associated with the outcome? Default is TRUE.
... 
Other arguments that can be passed to 
For each tested penalisation parameter \lambda, a standard version of the BIC
is implemented:
BIC_\lambda = -2 l_\lambda + df(\lambda) * ln(N)
where l_\lambda is the log-likelihood of the nonpenalised multiple logistic
regression model that includes the set of covariates with a nonzero coefficient
in the penalised regression coefficient vector associated with \lambda,
df(\lambda) is the number of covariates with a nonzero coefficient
in that vector, and N is the number of observations.
The optimal set of covariates according to this approach is the one associated with
the classical multiple logistic regression model which minimizes the BIC.
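The selection rule described above can be illustrated with a minimal sketch. It is not the implementation of lasso_bic: it assumes the glmnet package is installed, refits each candidate model with glm.fit from base R rather than relax.glmnet, and ignores the maxp and betaPos options. The helper name bic_for_support and the simulated data are hypothetical.

```r
library(glmnet)  # assumed to be installed

# Simulated binary exposures and outcome (illustrative data only)
set.seed(15)
x <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(x) <- paste0("drugs", seq_len(ncol(x)))
y <- rbinom(100, 1, 0.3)

# Lasso path for a binary response: one coefficient column per lambda
fit <- glmnet(x, y, family = "binomial")
beta_path <- as.matrix(coef(fit))[-1, , drop = FALSE]  # drop the intercept row

# Distinct sets of covariates with a nonzero coefficient along the path
supports <- unique(lapply(seq_len(ncol(beta_path)), function(j) {
  unname(which(beta_path[, j] != 0))
}))

# Hypothetical helper: BIC of the nonpenalised logistic model on one support,
# computed as -2 * log-likelihood + df * ln(N), with df the number of covariates
bic_for_support <- function(idx) {
  design <- cbind(rep(1, nrow(x)), x[, idx, drop = FALSE])  # intercept + covariates
  refit <- glm.fit(design, y, family = binomial())
  loglik <- sum(dbinom(y, size = 1, prob = refit$fitted.values, log = TRUE))
  -2 * loglik + length(idx) * log(nrow(x))
}

# Evaluate every candidate support and keep the one with the smallest BIC
bic <- vapply(supports, bic_for_support, numeric(1))
best <- supports[[which.min(bic)]]
selected <- colnames(x)[best]
```

Here selected holds the names of the covariates in the BIC-minimising model; with purely random data as above it may well be empty.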
An object with S3 class "log.lasso".
beta 
Numeric vector of regression coefficients in the lasso.
In 
selected_variables 
Character vector, names of variable(s) selected with the
lassobic approach.
If 
Emeline Courtois
Maintainer: Emeline Courtois
emeline.courtois@inserm.fr
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lb <- lasso_bic(x = drugs, y = ae, maxp = 20)
