GIC.compCL: Compute information criteria for the 'compCL' model. In Compack: Regression with Compositional Covariates

Description

Tune the penalty parameter lam in the compCL model by GIC, BIC, or AIC. This function calculates the GIC, BIC, or AIC curve and returns the optimal value of lam.

Usage

GIC.compCL(y, Z, Zc = NULL, intercept = FALSE, lam = NULL, ...)

Arguments

y
    a response vector of length n.

Z
    an n*p design matrix of compositional data or categorical data. If Z is categorical data, i.e., the row sums of Z differ from 1, the program automatically transforms Z into compositional data by dividing each row by its sum. Z must NOT contain any zero entries.

Zc
    an n*p_c design matrix of control variables (not penalized). Default is NULL.

intercept
    Boolean, specifying whether to include an intercept. Default is FALSE.

lam
    a user-supplied lambda sequence. If lam is provided as a scalar and nlam > 1, the lam sequence is created starting from lam. To run a single value of lam, set nlam = 1. The program sorts a user-defined lambda sequence in decreasing order.

...
    other arguments that can be passed to compCL.
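The row normalization and the ordering of lam described above can be sketched in base R. This is only an illustration of the documented behavior, not the package's internal code; the helper to_composition and the objects counts and lam_user are hypothetical names with made-up values.

```r
# Hypothetical helper (not part of Compack): turn a matrix of positive
# counts into compositions by dividing each row by its row sum.
to_composition <- function(Z) {
  Z <- as.matrix(Z)
  # Assumption: entries must be strictly positive, since zeros are not allowed
  if (any(Z <= 0)) stop("Z must contain strictly positive entries")
  Z / rowSums(Z)  # the length-n vector is recycled down columns, so row i is divided by its own sum
}

# Made-up count data: 3 samples, 4 components
counts <- matrix(c(10, 20, 30, 40,
                    5,  5,  5,  5,
                    1,  2,  3,  4), nrow = 3, byrow = TRUE)
Z_comp <- to_composition(counts)
rowSums(Z_comp)  # every row now sums to 1

# A user-supplied lambda sequence is used in decreasing order
lam_user <- c(0.01, 0.5, 0.1)
lam_sorted <- sort(lam_user, decreasing = TRUE)
```

Passing data that are already compositional leaves them unchanged, because each row sum is then 1.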

Details

The model estimation is conducted through minimizing the following criterion:

\frac{1}{2n}\|y-Zβ\|_2^2 + λ\|β\|_1, s.t. ∑_{j=1}^{p} β_j = 0.
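To make the criterion concrete, the following base R sketch evaluates it for a given coefficient vector and rejects vectors that violate the zero-sum constraint. The function compcl_objective and its tol argument are hypothetical and not part of Compack. It takes Z exactly as written in the criterion, ignores the intercept and Zc, and does not model any transformation the package may apply internally.

```r
# Hypothetical helper: evaluate
#   (1 / (2n)) * ||y - Z beta||_2^2 + lam * ||beta||_1
# for a beta that satisfies sum(beta) = 0 (up to a tolerance).
compcl_objective <- function(y, Z, beta, lam, tol = 1e-8) {
  if (abs(sum(beta)) > tol) stop("beta violates the zero-sum constraint")
  n <- length(y)
  resid <- y - as.vector(Z %*% beta)
  sum(resid^2) / (2 * n) + lam * sum(abs(beta))
}

# Made-up data for illustration
set.seed(1)
n <- 20; p <- 5
Z <- matrix(runif(n * p), n, p)
Z <- Z / rowSums(Z)            # rows sum to 1
beta <- c(1, -1, 0, 0, 0)      # sums to zero
y <- as.vector(Z %*% beta)     # noiseless response, so the residual is zero

# With zero residual only the penalty term remains: lam * ||beta||_1
obj <- compcl_objective(y, Z, beta, lam = 0.1)
```

In this noiseless setup the value reduces to 0.1 * 2, which gives a quick sanity check of the penalty term.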

The GIC is defined as:

GIC(λ) = \log{\hat{σ}^2(λ)} + (s(λ) -1) \log{(max(p, n))} * \log{(\log{n})} / n,

where \hat{σ}^2(λ) = \|y - Z\hat{β}(λ)\|_{2}^{2}/n, \hat{β}(λ) is the regularized estimator, and s(λ) is the number of nonzero coefficients in \hat{β}(λ). Because of the zero-sum constraint, the effective number of free parameters is s(λ) - 1 for s(λ) ≥ 2. The optimal λ is selected by minimizing GIC(λ).
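The GIC formula above can be transcribed into base R as a rough sketch. The helper gic_value is hypothetical, and the candidate coefficient vectors below are made up rather than produced by compCL; the code only shows how the formula and the minimization over λ fit together, with intercept and Zc left out as in the displayed formula.

```r
# Hypothetical helper: GIC for one candidate estimate, following the
# formula as stated (intended for s >= 2 nonzero coefficients).
gic_value <- function(y, Z, beta_hat) {
  n <- length(y)
  p <- ncol(Z)
  sigma2 <- sum((y - as.vector(Z %*% beta_hat))^2) / n  # residual variance
  s <- sum(beta_hat != 0)                               # number of nonzero coefficients
  log(sigma2) + (s - 1) * log(max(p, n)) * log(log(n)) / n
}

# Made-up data and made-up candidate estimates along a lambda sequence
set.seed(3)
n <- 40; p <- 6
Z <- matrix(runif(n * p), n, p)
Z <- Z / rowSums(Z)
y <- as.vector(Z %*% c(2, -2, 0, 0, 0, 0)) + rnorm(n, sd = 0.1)

lam <- c(0.5, 0.1)                               # decreasing order
beta_path <- list(c(1.5, -1.5, 0, 0, 0, 0),      # sparser candidate
                  c(2, -2, 0.3, -0.3, 0, 0))     # denser candidate
gic <- sapply(beta_path, function(b) gic_value(y, Z, b))
lam_min <- lam[which.min(gic)]                   # lambda minimizing GIC
```

The penalty term grows with the number of nonzero coefficients, so a denser fit is preferred only when it lowers the residual variance enough to offset it.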

Value

An object of S3 class GIC.compCL is returned, which is a list:

compCL.fit
    a fitted compCL object.

lam
    the sequence of lam.

GIC
    a vector of GIC value(s).

lam.min
    the lam value that minimizes GIC(λ).

References

Lin, W., Shi, P., Feng, R. and Li, H. (2014) Variable selection in regression with compositional covariates. Biometrika 101, 785-797. https://academic.oup.com/biomet/article/101/4/785/1775476

Fan, Y. and Tang, C. Y. (2013) Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B 75, 531-552. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12001

See Also

compCL and cv.compCL, and the coef, predict and plot methods for "GIC.compCL" objects.

Examples

library(Compack)

# Simulate training data with a sparse, zero-sum coefficient vector
p = 30
n = 50
beta = c(1, -0.8, 0.6, 0, 0, -1.5, -0.5, 1.2)
beta = c(beta, rep(0, times = p - length(beta)))
Comp_data = comp_Model(n = n, p = p, beta = beta, intercept = FALSE)

# Fit the model and select lam by GIC
GICm1 <- GIC.compCL(y = Comp_data$y, Z = Comp_data$X.comp,
                    Zc = Comp_data$Zc, intercept = Comp_data$intercept)
coef(GICm1)
plot(GICm1)

# Predict on new simulated data and compare with the observed response
test_data = comp_Model(n = 100, p = p, beta = Comp_data$beta, intercept = FALSE)
y_hat = predict(GICm1, Znew = test_data$X.comp, Zcnew = test_data$Zc)
plot(test_data$y, y_hat, xlab = "Observed value", ylab = "Predicted value")
abline(a = 0, b = 1, col = "red")