# GIC.FuncompCGL: Compute information criteria for the 'FuncompCGL' model. In Compack: Regression with Compositional Covariates

## Description

Tune the penalty parameter lam and the degrees of freedom of the basis function k in the FuncompCGL model over a grid of values, using GIC, BIC, or AIC. This function computes the GIC, BIC, or AIC curve and returns the optimal values of lam and k.

## Usage

    GIC.FuncompCGL(y, X, Zc = NULL, lam = NULL, nlam = 100, k = 4:10, ref = NULL,
                   intercept = TRUE, W = rep(1, times = p - length(ref)),
                   type = c("GIC", "BIC", "AIC"),
                   mu_ratio = 1.01, outer_maxiter = 1e+6, ...)

## Arguments

- `y`: response vector of length n.
- `X`: data frame or matrix.
  - If nrow(X) > n, X should be a data frame or matrix of the functional compositional predictors, with p columns for the values of the composition components, a column indicating subject ID, and a column of observation times. The order of subject IDs should be the SAME as that of y. Zero entries are not allowed.
  - If nrow(X) = n, X is treated as the design matrix obtained after integration, an n*(k*p - length(ref)) matrix.
- `Zc`: an n*p_c design matrix of unpenalized variables. Default is NULL.
- `lam`: a user-supplied lambda sequence. If lam is provided as a scalar and nlam > 1, a lam sequence is created starting from lam. To run a single value of lam, set nlam = 1. The program sorts a user-defined lambda sequence in decreasing order.
- `nlam`: the length of the lam sequence. Default is 100. No effect if lam is provided.
- `k`: an integer vector specifying the degrees of freedom of the basis function.
- `ref`: reference level (baseline), either an integer in [1, p] or NULL. Default is NULL. If ref is an integer in [1, p], the group lasso penalized log-contrast model (with log-ratios) is fitted with the ref-th component chosen as baseline. If ref is NULL, the linearly constrained group lasso penalized log-contrast model is fitted.
- `intercept`: Boolean, specifying whether to include an intercept. Default is TRUE.
- `W`: a vector of length p (the total number of groups), a matrix of dimension p1*p1, where p1 = (p - length(ref)) * k, or a character string or function used to calculate the weight matrix for each group.
  - If a vector: penalization weights for the groups of coefficients. A zero weight implies no shrinkage.
  - If a matrix: a diagonal matrix with positive diagonal elements.
  - If a character string naming a function, or an object of type function: the function used to compute the weights.
- `type`: a character string specifying which criterion to use. The choices are "GIC" (default), "BIC", and "AIC".
- `mu_ratio`: the increasing ratio of the penalty parameter u. Default is 1.01. Initial values for the scaled Lagrange multipliers are set to 0. If mu_ratio < 1, the program automatically sets the initial penalty parameter u to 0 and outer_maxiter to 1, indicating that there is no linear constraint.
- `outer_maxiter`: maximum number of loops allowed for the augmented Lagrange method.
- `...`: other arguments that could be passed to FuncompCGL.
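The role of `ref` can be pictured with a small base-R sketch of log-ratios against a baseline component. The toy matrix `comp` and the choice `ref <- 2` are hypothetical, and the sketch handles a single set of compositions rather than functional data; it is not the package's internal preprocessing. It also suggests why zero entries are a problem when logarithms are involved.

```r
# Hypothetical toy composition: 3 samples, 4 components, each row sums to 1.
comp <- matrix(c(0.10, 0.20, 0.30, 0.40,
                 0.25, 0.25, 0.25, 0.25,
                 0.70, 0.10, 0.10, 0.10),
               nrow = 3, byrow = TRUE)
ref <- 2  # baseline component (illustrative choice)

# Log-ratio of every non-baseline component to the baseline, row by row.
# Dividing a matrix by a vector of length nrow recycles down the columns,
# so entry [i, j] is divided by comp[i, ref].
log_ratio <- log(comp[, -ref, drop = FALSE] / comp[, ref])

dim(log_ratio)   # one column fewer than comp
log_ratio
```

A zero entry in `comp` would produce `-Inf` or `NaN` here, which is consistent with the requirement that X contain no zeros.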

## Details

The FuncompCGL model is estimated by minimizing the linearly constrained group lasso criterion

$$\frac{1}{2n}\|y - 1_n\beta_0 - Z_c\beta_c - Z\beta\|_2^2 + \lambda \sum_{j=1}^{p} \|\beta_j\|_2, \quad \text{s.t. } \sum_{j=1}^{p} \beta_j = 0_k.$$
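To make the criterion concrete, here is a minimal base-R sketch that evaluates it for a given coefficient matrix and checks the zero-sum constraint. The helper names `cgl_objective` and `satisfies_zero_sum` are hypothetical, not part of Compack. The sketch assumes the coefficients are stored as a p-by-k matrix with one row per component, and that the columns of Z are grouped by component, matching the `as.vector(t(beta_C_true))` ordering used in the examples.

```r
# Hypothetical helper: value of the penalized criterion for given coefficients.
# beta is a p x k matrix; row j holds the coefficient group beta_j.
cgl_objective <- function(y, Z, beta, lam, beta0 = 0, Zc = NULL, beta_c = NULL) {
  n <- length(y)
  fitted <- beta0 + as.vector(Z %*% as.vector(t(beta)))
  if (!is.null(Zc)) fitted <- fitted + as.vector(Zc %*% beta_c)
  loss <- sum((y - fitted)^2) / (2 * n)          # (1/2n) * squared error
  penalty <- lam * sum(sqrt(rowSums(beta^2)))    # lam * sum_j ||beta_j||_2
  loss + penalty
}

# Hypothetical helper: the constraint sum_j beta_j = 0_k means every column
# of the p x k coefficient matrix sums to zero.
satisfies_zero_sum <- function(beta, tol = 1e-8) all(abs(colSums(beta)) < tol)

set.seed(1)
n <- 20; p <- 3; k <- 2
Z <- matrix(rnorm(n * p * k), n, p * k)   # simulated integrated design matrix
y <- rnorm(n)
beta <- rbind(c(1, -0.5), c(-1, 0.5), c(0, 0))

obj <- cgl_objective(y, Z, beta, lam = 0.1)
ok <- satisfies_zero_sum(beta)
```

A row of zeros in `beta` contributes nothing to the penalty, which is how the group lasso term drops a whole component from the model.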

The tuning parameters can be selected by the generalized information criterion (GIC),

$$\mathrm{GIC}(\lambda,k) = \log(\hat{\sigma}^2(\lambda,k)) + (s(\lambda,k) - 1)\,k\,\log(\max(pk + p_c + 1,\, n))\,\log(\log n)/n,$$

where $\hat{\sigma}^2(\lambda,k) = \|y - 1_n\hat{\beta}_0(\lambda,k) - Z_c\hat{\beta}_c(\lambda,k) - Z\hat{\beta}(\lambda,k)\|_2^2/n$, with $\hat{\beta}_0(\lambda,k)$, $\hat{\beta}_c(\lambda,k)$ and $\hat{\beta}(\lambda,k)$ being the regularized estimators of the regression coefficients, and $s(\lambda,k)$ is the number of nonzero coefficient groups in $\hat{\beta}(\lambda,k)$.
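The GIC formula can be transcribed directly into base R. The function name `gic_value` and the input values below are hypothetical; this is a sketch of the formula as written here, not the package's internal computation.

```r
# Hypothetical helper: GIC for one (lam, k) pair.
#   sigma2: residual sum of squares divided by n
#   s:      number of nonzero coefficient groups
#   k:      degrees of freedom of the basis
#   p, p_c: number of components and of unpenalized covariates
#   n:      sample size
gic_value <- function(sigma2, s, k, p, p_c, n) {
  log(sigma2) + (s - 1) * k * log(max(p * k + p_c + 1, n)) * log(log(n)) / n
}

# Illustrative inputs only.
gic_example <- gic_value(sigma2 = 1.5, s = 4, k = 5, p = 30, p_c = 0, n = 50)
```

For a fixed residual variance, the value increases with the number of selected groups `s`, so the second term acts as the model-complexity penalty.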

## Value

An object of S3 class "GIC.FuncompCGL" is returned, which is a list containing:

- `FuncompCGL.fit`: a list of length length(k), containing the fitted FuncompCGL objects for the different degrees of freedom of the basis function.
- `lam`: the sequence of the penalty parameter lam.
- `GIC`: a length(k) by length(lam) matrix of GIC values.
- `lam.min`: the optimal values of the degrees of freedom k and the penalty parameter lam.
- `MSE`: a length(k) by length(lam) matrix of mean squared errors.
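Given a matrix of criterion values with one row per value of k and one column per value of lam, the optimal pair is the position of the smallest entry. The sketch below uses made-up numbers and hypothetical object names (`gic_mat`, `best`); it illustrates the selection rule only and makes no claim about how the returned `lam.min` element is stored.

```r
# Hypothetical grid and criterion values (rows: k, columns: lam).
k_list <- c(4, 5)
lam <- c(1, 0.5, 0.1)
gic_mat <- matrix(c(2.1, 1.4, 1.9,
                    2.0, 1.2, 1.7),
                  nrow = length(k_list), byrow = TRUE,
                  dimnames = list(paste0("k=", k_list), NULL))

# which.min() returns a column-major linear index; arrayInd() converts it
# back to a (row, column) pair.
idx <- arrayInd(which.min(gic_mat), dim(gic_mat))
best <- c(k = k_list[idx[1]], lam = lam[idx[2]])
best
```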

## References

Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome. Annals of Applied Statistics. https://arxiv.org/abs/1808.02403

Fan, Y. and Tang, C. Y. (2013) Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society. Series B 75, 531-552. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12001

## See Also

FuncompCGL and cv.FuncompCGL, and the predict, coef and plot methods for "GIC.FuncompCGL" objects.
## Examples

    library(Compack)

    ## True coefficient matrix: only the first four components are active
    df_beta = 5
    p = 30
    beta_C_true = matrix(0, nrow = p, ncol = df_beta)
    beta_C_true[1, ] <- c(-0.5, -0.5, -0.5, -1, -1)
    beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)
    beta_C_true[3, ] <- c(-0.8, -0.8, 0.4, 1, 1)
    beta_C_true[4, ] <- c(0.5, 0.5, -0.6, -0.6, -0.6)

    n_train = 50
    n_test = 30
    k_list <- c(4, 5)

    ## Simulate training data, then test data with the same settings
    Data <- Fcomp_Model(n = n_train, p = p, m = 0, intercept = TRUE,
                        SNR = 4, sigma = 3, rho_X = 0.2, rho_T = 0.5,
                        df_beta = df_beta, n_T = 20, obs_spar = 1, theta.add = FALSE,
                        beta_C = as.vector(t(beta_C_true)))
    arg_list <- as.list(Data$call)[-1]
    arg_list$n <- n_test
    Test <- do.call(Fcomp_Model, arg_list)

    ## GIC_cgl: constrained group lasso
    GIC_cgl <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                              Zc = Data$data$Zc, intercept = Data$data$intercept,
                              k = k_list)
    coef(GIC_cgl)
    plot(GIC_cgl)
    y_hat <- predict(GIC_cgl, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
    plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

    ## GIC_naive: ignoring the zero-sum constraints
    ## set mu_ratio = 0 to fit without linear constraints;
    ## there is then no outer loop for the augmented Lagrange multiplier
    GIC_naive <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                                Zc = Data$data$Zc, intercept = Data$data$intercept,
                                k = k_list, mu_ratio = 0)
    coef(GIC_naive)
    plot(GIC_naive)
    y_hat <- predict(GIC_naive, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
    plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

    ## GIC_base: randomly select a component as the reference
    ## mu_ratio is set to 0 automatically once ref is set to an integer
    ref <- sample(1:p, 1)
    GIC_base <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                               Zc = Data$data$Zc, intercept = Data$data$intercept,
                               k = k_list, ref = ref)
    coef(GIC_base)
    plot(GIC_base)
    y_hat <- predict(GIC_base, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
    plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")