compCL: Fit regularization path for log-contrast model of...
In jiji6454/compReg: Regression with Compositional Covariates


Fit regression with compositional predictors via the penalized log-contrast model proposed by Lin et al. (2014) <doi:10.1093/biomet/asu031>. The model is estimated by minimizing a linearly constrained lasso criterion. The regularization path is computed over a grid of values of the tuning parameter lambda.

compCL(y, Z, Zc = NULL, intercept = TRUE,
       lam = NULL, nlam = 100, lambda.factor = ifelse(n < p, 0.05, 0.001),
       pf = rep(1, times = p), dfmax = p, pfmax = min(dfmax * 1.5, p),
       u = 1, mu_ratio = 1.01, tol = 1e-10,
       inner_maxiter = 1e+4, inner_eps = 1e-6,
       outer_maxiter = 1e+08, outer_eps = 1e-8)

`y`	a response vector with length n.
`Z`	an n*p design matrix of compositional data or categorical data. If `Z` is categorical data, i.e., the row sums of `Z` differ from 1, the program automatically transforms `Z` into compositional data by dividing each row by its sum. `Z` must NOT contain any zero entries.
`Zc`	an n*p_c design matrix of control variables (not penalized). Default is `NULL`.
`intercept`	Boolean, specifying whether to include an intercept. Default is `TRUE`.
`lam`	a user-supplied lambda sequence. If `lam` is provided as a scalar and `nlam`>1, a `lam` sequence is created starting from `lam`. To run a single value of `lam`, set `nlam`=1. The program sorts a user-defined `lam` sequence in decreasing order.
`nlam`	the length of the `lam` sequence. Default is 100. No effect if `lam` is provided as a sequence.
`lambda.factor`	the factor for getting the minimal lambda in the `lam` sequence, where `min(lam)` = `lambda.factor` * `max(lam)`. `max(lam)` is the smallest value of `lam` for which all penalized coefficients become zero. If n >= p, the default is `0.001`. If n < p, the default is `0.05`.
`pf`	penalty factor, a vector of length p. Zero implies no shrinkage. Default value for each entry is 1.
`dfmax`	limit the maximum number of groups in the model. Useful for handling very large p, if a partial path is desired. Default is p.
`pfmax`	limit the maximum number of groups ever to be nonzero. For example once a group enters the model along the path, no matter how many times it re-enters the model through the path, it will be counted only once. Default is `min(dfmax*1.5, p)`.
`u`	the initial value of the penalty parameter of the augmented Lagrange method adopted in the outer loop. Default value is 1.
`mu_ratio`	the increasing ratio for `u`, with value at least 1. Default value is 1.01. Initial values for the scaled Lagrange multipliers are set to 0. If `mu_ratio` < 1, the program automatically sets `u` to 0 and `outer_maxiter` to 1, indicating that no linear constraints are included.
`tol`	tolerance for the estimated coefficients to be considered as non-zero, i.e., if abs(β_j) < `tol`, set β_j as 0. Default value is 1e-10.
`inner_maxiter, inner_eps`	`inner_maxiter` is the maximum number of loops allowed in the coordinate descent; and `inner_eps` is the corresponding convergence tolerance.
`outer_maxiter, outer_eps`	`outer_maxiter` is the maximum number of loops allowed in the augmented Lagrange method; and `outer_eps` is the corresponding convergence tolerance.
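
The descriptions of `Z` and `lambda.factor` above can be illustrated with a small base-R sketch. It does not call the package. The toy matrix `Z_counts` and the value `lam_max` are hypothetical, and the log-spaced grid is an assumption borrowed from common lasso implementations; the source only states that `min(lam)` = `lambda.factor` * `max(lam)`.

```r
# Hypothetical toy matrix (3 samples, 4 components); not from the package.
Z_counts <- matrix(c(10, 20, 30, 40,
                      5,  5, 10, 30,
                      2,  8, 40, 50), nrow = 3, byrow = TRUE)

# Z must not contain zeros, so check before any further processing.
stopifnot(all(Z_counts > 0))

# Divide each row by its sum so that every row sums to 1.
Z_comp <- Z_counts / rowSums(Z_counts)

# Build a decreasing lambda grid whose endpoints satisfy
# min(lam) = lambda.factor * max(lam).
# ASSUMPTION: log-spaced values; lam_max is a made-up number here.
lam_max <- 2
lambda.factor <- 0.001
nlam <- 100
lam <- exp(seq(log(lam_max), log(lambda.factor * lam_max), length.out = nlam))

range(lam)
```

In an actual fit, `max(lam)` is determined from the data as the smallest value at which all penalized coefficients are zero, so `lam_max` would not be chosen by hand.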

The log-contrast regression model with compositional predictors is expressed as

y = Zβ + e, s.t. ∑_{j=1}^{p}β_j=0,

where Z is the n-by-p design matrix of log-transformed compositional data, β is the p-vector of regression coefficients, and e is an n-vector of random errors. If zeros exist in the original compositional data, the user should pre-process them before fitting.
To enable variable selection, we conduct model estimation via linearly constrained lasso

argmin_{β}(\frac{1}{2n}\|y-Zβ\|_2^2 + λ\|β\|_1), s.t. ∑_{j=1}^{p}β_j= 0.
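
As a rough illustration of the criterion and of why the zero-sum constraint matters, the base-R sketch below evaluates the penalized objective and checks that a coefficient vector summing to zero gives the same fitted values whether or not the rows of the data are rescaled to sum to 1. The helper `cl_objective`, the simulated data, and the chosen `beta` are all hypothetical and are not part of the package.

```r
set.seed(1)
n <- 5
p <- 4

# Hypothetical strictly positive data and its row-normalized version.
X <- matrix(runif(n * p, min = 0.1, max = 1), nrow = n, ncol = p)
X_comp <- X / rowSums(X)
y <- rnorm(n)

# A coefficient vector that satisfies the constraint sum(beta) = 0.
beta <- c(1, -0.5, -0.5, 0)

# Penalized criterion: (1 / (2n)) * ||y - Z beta||_2^2 + lambda * ||beta||_1
cl_objective <- function(y, Z, beta, lambda) {
  resid <- y - Z %*% beta
  sum(resid^2) / (2 * length(y)) + lambda * sum(abs(beta))
}

# Rescaling row i by a constant c_i shifts log(X)[i, ] by log(c_i), which
# changes the fit by log(c_i) * sum(beta) = 0 under the constraint.
fit_raw  <- log(X) %*% beta
fit_comp <- log(X_comp) %*% beta
all.equal(fit_raw, fit_comp)

cl_objective(y, log(X_comp), beta, lambda = 0.1)
```

The two fitted vectors should agree up to floating-point error; without the constraint, the fit would depend on the arbitrary scale of each row.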

An object with S3 class "compCL" is a list containing:

`beta`	a matrix of coefficients with p+p_c+1 rows. If `intercept=FALSE`, then the last row of `beta` is set to 0's.
`lam`	the sequence of `lam` values used.
`df`	the number of non-zero β_j's in the estimated coefficients for `Z` at each value of `lam`.
`npass`	total iterations.
`error`	error messages. If 0, no error occurs.
`call`	the call that produces this object.
`dim`	dimension of the coefficient matrix `beta`.
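
To make the layout of `beta` concrete, here is a small sketch that uses a hand-built stand-in list rather than a real fit, so it runs without the package. The numbers are invented, and the row ordering (coefficients for `Z` first, then `Zc`, then the intercept) is an assumption; the source only states that the last row holds the intercept.

```r
# Hypothetical stand-in for a fitted object: p = 3, p_c = 1, four lam values.
p <- 3
p_c <- 1
nlam <- 4
fit <- list(
  beta = rbind(
    # Rows 1..p: coefficients for Z, one column per lam value (made up).
    matrix(c(0.0,  0.0,  0.0,
             0.5, -0.5,  0.0,
             0.8, -0.6, -0.2,
             1.0, -0.7, -0.3), nrow = p),
    # Rows p+1..p+p_c: coefficients for Zc (assumed position).
    matrix(c(0.1, 0.1, 0.2, 0.2), nrow = p_c),
    # Last row: intercept, all zeros as when intercept = FALSE.
    rep(0, nlam)
  ),
  lam = c(1, 0.5, 0.25, 0.125)
)

# Coefficients for the compositional predictors only.
beta_Z <- fit$beta[seq_len(p), , drop = FALSE]

# Each column should satisfy the zero-sum constraint.
colSums(beta_Z)

# Number of non-zero Z coefficients per lam value, analogous to `df`.
df <- colSums(abs(beta_Z) > 0)

# Intercept path from the last row.
b0 <- fit$beta[p + p_c + 1, ]
```

With a real "compCL" object, the same indexing could be applied to its `beta` component after confirming the row order with `coef`.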

Zhe Sun and Kun Chen

Lin, W., Shi, P., Feng, R. and Li, H. (2014) Variable selection in regression with compositional covariates, https://academic.oup.com/biomet/article/101/4/785/1775476. Biometrika 101 785-797

`coef`, `predict`, `print` and `plot` methods for "compCL" objects, and `cv.compCL` and `GIC.compCL`.

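# Simulate training data with p compositional predictors, 6 of them active
p = 30
n = 50
beta = c(1, -0.8, 0.6, 0, 0, -1.5, -0.5, 1.2)
beta = c(beta, rep(0, times = p - length(beta)))
Comp_data = comp_Model(n = n, p = p, beta = beta, intercept = FALSE)
# Fit the regularization path
m1 <- compCL(y = Comp_data$y, Z = Comp_data$X.comp,
             Zc = Comp_data$Zc, intercept = Comp_data$intercept)
print(m1)
plot(m1)
# Extract the estimated coefficients (this overwrites the true beta above)
beta = coef(m1)
# Simulate test data from the same model and predict along the path
Test_data = comp_Model(n = 30, p = p, beta = Comp_data$beta, intercept = FALSE)
predmat = predict(m1, Znew = Test_data$X.comp, Zcnew = Test_data$Zc)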