squeezy | R Documentation |
Estimate group-specific elastic net penalties and fit a linear or logistic regression model.
squeezy(Y, X, groupset, alpha = 1, model = NULL, X2 = NULL, Y2 = NULL, unpen = NULL, intrcpt = TRUE, method = c("ecpcEN", "MML", "MML.noDeriv", "CV"), fold = 10, compareMR = TRUE, selectAIC = FALSE, fit.ecpc = NULL, lambdas = NULL, lambdaglobal = NULL, lambdasinit = NULL, sigmasq = NULL, ecpcinit = TRUE, SANN = FALSE, minlam = 10^-3, standardise_Y = NULL, reCV = NULL, opt.sigma = NULL, resultsAICboth = FALSE, silent=FALSE)
Y |
Response data; n-dimensional vector (n: number of samples) for linear and logistic outcomes. |
X |
Observed data; (nxp)-dimensional matrix (p: number of covariates) with each row the observed high-dimensional feature vector of a sample. |
groupset |
Co-data group set; list with G groups. Each group is a vector containing the indices of the covariates in that group. |
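A group set of this form can be built from a vector of group labels. The following is a minimal base-R sketch; the variable names (`groupIndex`, `groupset2`) and the even split of 100 covariates over 5 groups are illustrative assumptions borrowed from the toy example, not requirements of squeezy.

```r
# Illustrative assumption: 100 covariates split evenly over 5 groups
p <- 100
G <- 5
groupIndex <- rep(1:G, each = p / G)  # group label of each covariate

# One list element per group, holding the covariate indices in that group
groupset <- lapply(1:G, function(g) which(groupIndex == g))

# Equivalent construction with split(); unname() drops the list names
groupset2 <- unname(split(seq_len(p), groupIndex))

# Sanity checks: G groups, and every covariate appears exactly once
length(groupset)
identical(sort(unlist(groupset)), seq_len(p))
```

Checking that each covariate index appears in exactly one group is a cheap way to catch off-by-one errors before fitting.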
alpha |
Elastic net penalty mixing parameter. |
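To illustrate what a mixing parameter does, the sketch below evaluates an elastic net penalty for a small coefficient vector. It assumes the glmnet-style parameterisation, in which alpha = 1 gives the lasso and alpha = 0 gives ridge; this page does not state the exact form used by squeezy, so treat the formula as an assumption. The helper `en_penalty` is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of squeezy: elastic net penalty for a
# coefficient vector under the glmnet-style parameterisation (an assumption)
en_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

beta <- c(-1, 0.5, 2)
en_penalty(beta, lambda = 1, alpha = 1)    # pure lasso term: sum(|beta|)
en_penalty(beta, lambda = 1, alpha = 0)    # pure ridge term: sum(beta^2) / 2
en_penalty(beta, lambda = 1, alpha = 0.5)  # equal mix of both terms
```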
model |
Type of model for the response; linear or logistic. |
X2 |
(optional) Independent observed data for which response is predicted. |
Y2 |
(optional) Independent response data to compare with predicted response. |
unpen |
Unpenalised covariates; vector with indices of covariates that should not be penalised. |
intrcpt |
Should an intercept be included? Included by default (TRUE) for linear and logistic models. |
method |
Which method should be used to estimate the group-specific penalties? Default MML. |
fold |
Number of folds used in inner cross-validation to estimate (initial) global ridge penalty lambda (if not given). |
compareMR |
TRUE/FALSE to fit the multi-ridge model and return results for comparison. |
selectAIC |
TRUE/FALSE to select the single-group model or multi-group model. |
fit.ecpc |
(optional) Model fit obtained by the function ecpc (from the ecpc R-package) |
lambdas |
(optional) Group-specific ridge penalty parameters. If given, these are transformed to elastic net penalties. |
lambdaglobal |
(optional) Global ridge penalty parameter used for initialising the optimisation. |
lambdasinit |
(optional) Group-specific ridge penalty parameters used for initialising the optimisation. |
sigmasq |
(linear model only) If given, noise level is fixed (Y~N(X*beta,sd=sqrt(sigmasq))). |
ecpcinit |
TRUE/FALSE for using group-specific ridge penalties as given in ‘fit.ecpc’ for initialising the optimisation. |
SANN |
('method'=MML.noDeriv only) TRUE/FALSE to use simulated annealing in optimisation of the ridge penalties. |
minlam |
Minimal value of group-specific ridge penalty used in the optimisation. |
standardise_Y |
TRUE/FALSE: should Y be standardised? |
reCV |
TRUE/FALSE: should the elastic net penalties be recalibrated by cross-validation of a global rescaling penalty? |
opt.sigma |
(linear model only) TRUE/FALSE to optimise sigmasq jointly with the ridge penalties. |
resultsAICboth |
(selectAIC=TRUE only) TRUE/FALSE to return results of both the single-group and multi-group model. |
silent |
Should output messages be suppressed (default FALSE)? |
betaApprox |
Estimated regression coefficients of the group-adaptive elastic net model; p-dimensional vector. |
a0Approx |
Estimated intercept of the group-adaptive elastic net model; scalar. |
lambdaApprox |
Estimated group penalty parameters of the group-adaptive elastic net model; G-dimensional vector. |
lambdapApprox |
Estimated elastic net penalty parameter of the group-adaptive elastic net model for all covariates; p-dimensional vector. |
tauMR |
Estimated group variances of the multi-ridge model; G-dimensional vector. |
lambdaMR |
Estimated group penalties of the multi-ridge model; G-dimensional vector. |
lambdaglobal |
Estimated global ridge penalty; scalar. Note: only optimised if selectAIC=TRUE or compareMR=TRUE, else the returned crude estimate is sufficient for initialisation of squeezy. |
sigmahat |
(linear model) Estimated sigma^2; scalar. |
MLinit |
Min log marginal likelihood value at initial group penalties; scalar. |
MLfinal |
Min log marginal likelihood value at estimated group penalties; scalar. |
alpha |
Value used for the elastic net mixing parameter alpha; scalar. |
glmnet.fit |
Fit of the ‘glmnet’ function to obtain the regression coefficients. |
If ‘compareMR’=TRUE, multi-ridge model is returned as well:
betaMR |
Estimated regression coefficients of the multi-ridge model; p-dimensional vector. |
a0MR |
Estimated intercept of the multi-ridge model; scalar. |
If independent test set ‘X2’ is given, predictions and MSE are returned:
YpredApprox |
Predictions for the test set of the estimated group-adaptive elastic net model. |
MSEApprox |
Mean squared error on the test set of the estimated group-adaptive elastic net model. |
YpredMR |
Predictions for the test set of the estimated group-adaptive multi-ridge model. |
MSEMR |
Mean squared error on the test set of the estimated group-adaptive multi-ridge model. |
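As a sketch of how a test-set MSE relates to the predictions, the snippet below computes the mean squared difference between observed and predicted responses. The numbers are toy values standing in for `Y2` and a prediction vector such as `YpredApprox`; that the package computes its MSE in exactly this way is an assumption.

```r
# Toy numbers standing in for Y2 and a prediction vector such as YpredApprox
Y2    <- c(1.0, 2.0, 3.0, 4.0)
Ypred <- c(1.5, 1.5, 3.0, 5.0)

# Mean squared error: average squared difference between observed and predicted
mse <- mean((Y2 - Ypred)^2)
mse
```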
If ‘selectAIC’=TRUE, the multi-group or single-group model with the best AIC is selected. ‘betaApprox’, ‘a0Approx’ and ‘lambdaApprox’ then contain the results of the selected model. Summary results of both models are included as well:
AICmodels |
List with elements “multigroup” and “onegroup”. Each element is a list with results of the multi-group or single-group model, containing the group penalties (‘lambdas’), sigma^2 (‘sigmahat’, linear model only), and AIC (‘AIC’). If, besides ‘selectAIC’=TRUE, also ‘resultsAICboth’=TRUE, the fits of both the single-group model and the multi-group model as obtained with squeezy are returned (‘fit’). |
modelbestAIC |
Either “onegroup” or “multigroup” for the selected model. |
Mirrelijn M. van Nee, Tim van de Brug, Mark A. van de Wiel
Mirrelijn M. van Nee, Tim van de Brug, Mark A. van de Wiel, "Fast marginal likelihood estimation of penalties for group-adaptive elastic net", arXiv preprint, arXiv:2101.03875 (2021).
#####################
# Simulate toy data #
#####################
p<-100 #number of covariates
n<-50 #sample size training data set
n2<-100 #sample size test data set
G<- 5 #number of groups

taugrp <- rep(c(0.05,0.1,0.2,0.5,1),each=p/G) #ridge prior variance
groupIndex <- rep(1:G,each=p/G) #groups for co-data
groupset <- lapply(1:G,function(x){which(groupIndex==x)}) #group set with each element one group
sigmasq <- 2 #linear regression noise
lambda1 <- sqrt(taugrp/2) #corresponding lasso penalty
#A Laplace(0,b) variate can also be generated as the difference of two i.i.d.
#Exponential(1/b) random variables
betas <- rexp(p, 1/lambda1) - rexp(p, 1/lambda1) #regression coefficients
X <- matrix(rnorm(n*p),n,p) #simulate training data
Y <- rnorm(n,X%*%betas,sd=sqrt(sigmasq))
X2 <- matrix(rnorm(n2*p),n2,p) #simulate test data (n2 samples)
Y2 <- rnorm(n2,X2%*%betas,sd=sqrt(sigmasq))

###############
# Fit squeezy #
###############
#may be fit directly..
res.squeezy <- squeezy(Y,X,groupset=groupset,Y2=Y2,X2=X2,
                       model="linear",alpha=0.5)

#..or with ecpc-fit as initialisation
if(requireNamespace("ecpc")){
  res.ecpc <- ecpc::ecpc(Y,X, #observed data and response to train model
                         groupsets=list(groupset), #informative co-data group set
                         Y2=Y2,X2=X2, #test data
                         model="linear",
                         hypershrinkage="none",postselection = FALSE)
  res.squeezy <- squeezy(Y,X, #observed data and response to train model
                         groupset=groupset, #informative co-data group set
                         Y2=Y2,X2=X2, #test data
                         fit.ecpc = res.ecpc, #ecpc-fit for initial values
                         model="linear", #type of model for the response
                         alpha=0.5) #elastic net mixing parameter
}

summary(res.squeezy$betaApprox) #estimated elastic net regression coefficients
summary(res.squeezy$betaMR) #estimated multi-ridge regression coefficients
res.squeezy$lambdaApprox #estimated group elastic net penalties
res.squeezy$tauMR #multi-ridge group variances
res.squeezy$MSEApprox #MSE group-elastic net model
res.squeezy$MSEMR #MSE group-ridge model

#once fit, quickly find model fit for different values of alpha:
res.squeezy2 <- squeezy(Y,X, #observed data and response to train model
                        groupset=groupset, #informative co-data group set
                        Y2=Y2,X2=X2, #test data
                        lambdas = res.squeezy$lambdaMR, #fix lambdas at multi-ridge estimate
                        model="linear", #type of model for the response
                        alpha=0.9) #elastic net mixing parameter

#Select single-group model or multi-group model based on best mAIC
res.squeezy <- squeezy(Y,X, #observed data and response to train model
                       groupset=groupset, #informative co-data group set
                       Y2=Y2,X2=X2, #test data
                       fit.ecpc = res.ecpc, #ecpc-fit for initial values
                       model="linear", #type of model for the response
                       alpha=0.5, #elastic net mixing parameter
                       selectAIC = TRUE,resultsAICboth = TRUE)

res.squeezy$modelbestAIC #selected model
res.squeezy$AICmodels$multigroup$fit$MSEApprox #MSE on test set of multi-group model
res.squeezy$AICmodels$onegroup$fit$MSEApprox #MSE on test set of single-group model
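The comment in the toy simulation about generating Laplace variates can be checked empirically. The sketch below draws the difference of two independent exponentials with rate 1/b and compares the sample mean and variance with the Laplace(0, b) values 0 and 2*b^2. The scale `b`, the seed and the number of draws `n_draws` are arbitrary choices made for illustration.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
b <- 0.5         # Laplace scale parameter (illustrative value)
n_draws <- 1e5   # number of simulated variates

# Difference of two i.i.d. Exponential(rate = 1/b) variables
x <- rexp(n_draws, rate = 1 / b) - rexp(n_draws, rate = 1 / b)

mean(x)   # should be close to 0
var(x)    # should be close to 2 * b^2
2 * b^2   # theoretical Laplace(0, b) variance
```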