cv.cglasso: cross-validation for conditional group lasso method
In smog: Structural Modeling by using Overlapped Group Penalty


This function uses cross-validation to search for the optimal group tuning parameter λ_1, conditional on fixing λ_2 and λ_3 at a small value.

cv.cglasso(x, y, g, v, label, family = "gaussian", lambda.max = NULL,
  nlambda.max = 50, delta = 0.9, nfolds = 10, ratio = 0.01,
  parallel = FALSE, ncores = NULL, ...)

## S3 method for class 'cv.cglasso'
print(x, ...)

`x`	a model matrix, or a data frame of dimensions n by p, in which the columns represent the predictor variables.
`y`	response variable, corresponding to the family description. When family is "gaussian" or "binomial", `y` should be a numeric vector of observations of length n; when family is "coxph", `y` is a survival object containing the survival time and the censoring status. See `Surv`.
`g`	a vector of group labels for the predictor variables.
`v`	a vector of binary values indicating whether or not each predictor variable is penalized: 1 means penalized and 0 means not penalized.
`label`	a character vector giving the type of each predictor in terms of treatment, prognostic, and predictive effects, coded as "t", "prog", and "pred", respectively.
`family`	a description of the distribution family for the response variable: "gaussian" for a continuous response, "binomial" for a multinomial or binary response, and "coxph" for a survival response.
`lambda.max`	the maximum value of λ. If `NULL`, `lambda.max` defaults to 1/λ_{min}(x'x), where λ_{min}(x'x) is the smallest eigenvalue of x'x.
`nlambda.max`	the maximum number of λ values generated by shrinking down from `lambda.max`. Default is 50.
`delta`	the damping rate for the λ sequence, such that λ_k = δ^k λ_0. Default is 0.9.
`nfolds`	number of folds. In each round, one fold of the observations is held out for testing and the remaining folds are used to train the model. Default is 10.
`ratio`	the ratio of λ_2 and λ_3 to `lambda.max`. A smaller value means less penalty on the coefficients of interactions. Default is 0.01.
`parallel`	whether to run the `nfolds` cross-validation fits in parallel. If `TRUE`, `foreach` is used to run the folds in parallel. Default is `FALSE`.
`ncores`	number of CPU cores for parallel computing. See `makeCluster` and `registerDoParallel`. Default is `NULL`.
`...`	other arguments that can be supplied to `smog`.
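
The `lambda.max`, `nlambda.max`, and `delta` arguments together describe a geometric sequence of candidate penalties. The base-R sketch below shows one way such a sequence could be built. It assumes that λ_0 equals `lambda.max`, that k starts at 0, and that the default `lambda.max` is the reciprocal of the smallest eigenvalue of x'x. The helper `lambda_grid` and the data `x_demo` are hypothetical names used only for illustration; they are not part of smog, and the package's internal construction may differ.

```r
# Hypothetical helper (not part of smog): builds a damped lambda sequence.
lambda_grid <- function(x, lambda.max = NULL, nlambda.max = 50, delta = 0.9) {
  x <- as.matrix(x)
  if (is.null(lambda.max)) {
    # Assumed default: 1 / smallest eigenvalue of x'x
    ev <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
    lambda.max <- 1 / min(ev)
  }
  # lambda_k = delta^k * lambda_0 for k = 0, ..., nlambda.max - 1
  lambda.max * delta^(seq_len(nlambda.max) - 1)
}

# Illustrative data only
set.seed(1)
x_demo <- matrix(rnorm(100 * 5), 100, 5)
lams <- lambda_grid(x_demo, nlambda.max = 20)
```

Each value in `lams` is 0.9 times the previous one, so a smaller `delta` shrinks the penalty faster and covers a wider range with the same number of candidates.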

The idea of this conditional group lasso function is to reduce computing time by searching only for the optimal group penalty, rather than searching a two-dimensional grid of penalties. By holding the ridge and interaction penalties at a small value, it still honors the hierarchy structure and also mitigates multicollinearity problems.
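
The saving can be made concrete by counting model fits. The sketch below is a rough illustration in base R, not smog's internal code. It assumes that λ_2 and λ_3 are both held at `ratio * lambda.max`, that `lambda.max` is 1 (an arbitrary value chosen for the example), and that the two-dimensional alternative would cross the same candidate sequence over two penalties. All object names are hypothetical.

```r
# Hypothetical illustration (not smog's internal code)
lambda.max  <- 1      # assumed value, for illustration only
nlambda.max <- 50
delta       <- 0.9
ratio       <- 0.01
nfolds      <- 10

# Candidate group penalties: lambda_k = delta^k * lambda.max
lambda1 <- lambda.max * delta^(0:(nlambda.max - 1))

# Conditional search: lambda_2 and lambda_3 are held fixed (assumption)
cond_grid <- cbind(lambda1 = lambda1,
                   lambda2 = ratio * lambda.max,
                   lambda3 = ratio * lambda.max)

# Assumed two-dimensional alternative: cross the sequence over two penalties
full_grid <- expand.grid(lambda1 = lambda1, lambda2 = lambda1)

# Number of model fits needed across all folds
fits_conditional <- nrow(cond_grid) * nfolds   # 50 * 10   = 500
fits_full        <- nrow(full_grid) * nfolds   # 2500 * 10 = 25000
```

Under these assumptions the conditional search needs 500 fits against 25,000 for the full grid, a 50-fold reduction.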

Chong Ma, chongma8903@gmail.com.

\insertRef{ma2019structuralsmog}{smog}

smog.default, smog.formula, cv.smog

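library(smog)

# generate design matrix x: column 1 is a binary treatment indicator,
# even columns are covariates, odd columns are treatment-by-covariate interactions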
set.seed(2018)
n=50;p=20
s=10
x=matrix(0,n,1+2*p)
x[,1]=sample(c(0,1),n,replace = TRUE)
x[,seq(2,1+2*p,2)]=matrix(rnorm(n*p),n,p)
x[,seq(3,1+2*p,2)]=x[,seq(2,1+2*p,2)]*x[,1]

g=c(p+1,rep(1:p,rep(2,p)))  # groups 
v=c(0,rep(1,2*p))           # penalization status
label=c("t",rep(c("prog","pred"),p))  # type of predictor variables

# generate beta
beta=c(rnorm(13,0,2),rep(0,ncol(x)-13))
beta[c(2,4,7,9)]=0

# generate y; the noise is rescaled so that var(signal)/var(scaled noise) equals s
data=x%*%beta
noise=rnorm(n)
snr=as.numeric(sqrt(var(data)/(s*var(noise))))
y=data+snr*noise

# cross-validate over the group penalty, using at most 20 candidate values
cvfit=cv.cglasso(x,y,g,v,label,family="gaussian",nlambda.max = 20)