cv.cglasso: cross-validation for conditional group lasso method


View source: R/cv_cglasso.R

Description

This function uses cross-validation to search for the optimal group tuning parameter λ_1, conditional on fixing λ_2 and λ_3 at a small value.

Usage

cv.cglasso(x, y, g, v, label, family = "gaussian", lambda.max = NULL,
  nlambda.max = 50, delta = 0.9, nfolds = 10, ratio = 0.01,
  parallel = FALSE, ncores = NULL, ...)

## S3 method for class 'cv.cglasso'
print(x, ...)

Arguments

x

a model matrix or a data frame of dimensions n by p, in which the columns represent the predictor variables.

y

the response variable, matching the family description. When family is "gaussian" or "binomial", y should be a numeric vector of n observations; when family is "coxph", y is a survival object containing the survival time and the censoring status. See Surv.

g

a vector of group labels for the predictor variables.

v

a vector of binary values indicating whether each predictor variable is penalized: 1 means penalized and 0 means not penalized.

label

a character vector giving the type of each predictor: "t" for treatment, "prog" for prognostic, and "pred" for predictive effects.

family

a description of the distribution family for the response variable. For a continuous response, family is "gaussian"; for a multinomial or binary response, family is "binomial"; for a survival response, family is "coxph".

lambda.max

the maximum value of the lambdas. If NULL, the default lambda.max is 1/λ_{min}(x'x), where λ_{min}(x'x) is the smallest eigenvalue of x'x.

nlambda.max

the maximum number of lambdas shrunk down from the maximum lambda lambda.max. Default is 50.

delta

the damping rate for the lambdas, such that λ_k = δ^k λ_0. Default is 0.9.

nfolds

number of folds. In each round, one fold of the observations is held out for testing and the remaining folds are used for model training. Default is 10.

ratio

the ratio of λ_2 and λ_3 to lambda.max. A smaller value means less penalty on the coefficients of interactions. Default is 0.01.

parallel

whether to run the nfolds cross-validation fits in parallel. If TRUE, foreach is used to run each cross-validation fit in parallel. Default is FALSE.

ncores

number of CPU cores for parallel computing. See makeCluster and registerDoParallel. Default is NULL.

...

other arguments that can be supplied to smog.
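The roles of lambda.max, delta, nlambda.max, and ratio can be illustrated with a small base R sketch. This is not the package's internal code: the toy matrix, the starting index k = 0, the choice λ_0 = lambda.max, and the reading of ratio as a multiplier on lambda.max are all assumptions made for illustration.

```r
# Illustrative sketch only; not taken from the smog source code.
set.seed(1)
x <- matrix(rnorm(50 * 5), 50, 5)   # toy design matrix (assumed)

# Default described for lambda.max: 1 / (smallest eigenvalue of x'x)
eig <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
lambda_max <- 1 / min(eig)

# Geometric damping lambda_k = delta^k * lambda_0,
# assuming lambda_0 = lambda.max and k starting at 0
delta <- 0.9
nlambda_max <- 50
k <- 0:(nlambda_max - 1)
lambda_grid <- lambda_max * delta^k

# Small fixed penalty, assuming it is set to ratio * lambda.max
ratio <- 0.01
lambda_fixed <- ratio * lambda_max
```

Under these assumptions the grid is strictly decreasing, and each candidate is 90% of the previous one when delta = 0.9.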

Details

The idea of this conditional group lasso function is to reduce computing time by searching only for the optimal group penalty rather than over a two-dimensional grid of penalties. Controlling the ridge and interaction penalties at a small value still honors the hierarchy structure and also alleviates multicollinearity problems.
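The one-dimensional search described above can be sketched as a K-fold cross-validation loop over a single penalty grid. In the sketch below, closed-form ridge regression is a stand-in for the penalized fit; it is not the smog or conditional group lasso estimator, and the helper fit_ridge, the toy data, and the grid values are hypothetical.

```r
# Sketch: K-fold CV over a one-dimensional penalty grid.
# Ridge regression is a stand-in here, NOT the smog/cglasso estimator.
set.seed(2018)
n <- 60; p <- 8
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% c(2, -1, rep(0, p - 2)) + rnorm(n))

# Hypothetical helper: closed-form ridge solution without intercept
fit_ridge <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

nfolds <- 10
lambda_grid <- 10 * 0.9^(0:49)                       # assumed grid
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# Mean held-out squared error for each candidate penalty
cv_error <- sapply(lambda_grid, function(lam) {
  fold_mse <- sapply(seq_len(nfolds), function(f) {
    test <- fold_id == f
    b <- fit_ridge(x[!test, , drop = FALSE], y[!test], lam)
    mean((y[test] - x[test, , drop = FALSE] %*% b)^2)
  })
  mean(fold_mse)
})

# Penalty with the smallest cross-validated error
lambda_opt <- lambda_grid[which.min(cv_error)]
```

Searching one grid of 50 values costs 50 × nfolds fits, whereas a full two-dimensional grid of the same resolution would cost 50 × 50 × nfolds, which is the saving the conditional approach aims for.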

Author(s)

Chong Ma, chongma8903@gmail.com.

References

\insertRef{ma2019structuralsmog}{smog}

See Also

smog.default, smog.formula, cv.smog

Examples

library(smog)

# generate design matrix x
set.seed(2018)
n=50;p=20
s=10
x=matrix(0,n,1+2*p)
x[,1]=sample(c(0,1),n,replace = TRUE)
x[,seq(2,1+2*p,2)]=matrix(rnorm(n*p),n,p)
x[,seq(3,1+2*p,2)]=x[,seq(2,1+2*p,2)]*x[,1]

g=c(p+1,rep(1:p,rep(2,p)))  # groups 
v=c(0,rep(1,2*p))           # penalization status
label=c("t",rep(c("prog","pred"),p))  # type of predictor variables

# generate beta
beta=c(rnorm(13,0,2),rep(0,ncol(x)-13))
beta[c(2,4,7,9)]=0

# generate y
data=x%*%beta
noise=rnorm(n)
snr=as.numeric(sqrt(var(data)/(s*var(noise))))
y=data+snr*noise

cvfit=cv.cglasso(x,y,g,v,label,family="gaussian",nlambda.max = 20)
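The way x, g, v, and label line up in the example may be easier to see in a small table. The following base R sketch rebuilds the same inputs and checks their layout; it does not call cv.cglasso, and the layout data frame is only a hypothetical inspection aid.

```r
# Sketch: inspect how columns, groups, penalization flags, and labels align.
set.seed(2018)
n <- 50; p <- 20
x <- matrix(0, n, 1 + 2 * p)
x[, 1] <- sample(c(0, 1), n, replace = TRUE)                    # treatment indicator
x[, seq(2, 1 + 2 * p, 2)] <- matrix(rnorm(n * p), n, p)         # prognostic columns
x[, seq(3, 1 + 2 * p, 2)] <- x[, seq(2, 1 + 2 * p, 2)] * x[, 1] # predictive = prognostic * treatment

g <- c(p + 1, rep(1:p, rep(2, p)))         # treatment gets its own group, p + 1
v <- c(0, rep(1, 2 * p))                   # treatment is left unpenalized
label <- c("t", rep(c("prog", "pred"), p))

# Hypothetical helper table: one row per column of x
layout <- data.frame(column = seq_len(ncol(x)), group = g,
                     penalized = v, type = label)
head(layout, 5)

# Each of the groups 1..p pairs one prognostic with one predictive column
pairs <- tapply(label[-1], g[-1], function(z) paste(sort(z), collapse = "+"))
stopifnot(length(g) == ncol(x), length(v) == ncol(x), length(label) == ncol(x))
stopifnot(all(pairs == "pred+prog"))
```

Each biomarker's main effect and its interaction with treatment share a group label, so a group penalty acts on the pair jointly.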

smog documentation built on Aug. 10, 2020, 5:07 p.m.