GIC.FuncompCGL: Compute information criteria for the 'FuncompCGL' model.
In jiji6454/compReg: Regression with Compositional Covariates


Tunes the penalty parameter `lam` and the degrees of freedom `k` of the basis functions in the FuncompCGL model over a grid of values, using GIC, BIC, or AIC. The function computes the chosen criterion over the grid and returns the optimal values of `lam` and `k`.

GIC.FuncompCGL(y, X, Zc = NULL, lam = NULL, nlam = 100, k = 4:10, ref = NULL,
              intercept = TRUE, W = rep(1,times = p - length(ref)),
              type = c("GIC", "BIC", "AIC"),
              mu_ratio = 1.01, outer_maxiter = 1e+6, ...)

`y`	response vector with length n.
`X`	data frame or matrix. If `nrow(X)` > n, `X` should be a data frame or matrix of the functional compositional predictors, with p columns for the values of the composition components, one column indicating the subject ID and one column of observation times. Subject IDs must appear in the SAME order as in `y`. Zero entries are not allowed. If `nrow(X)` = n, `X` is treated as already integrated, i.e. an n × k(p - length(ref)) matrix.
`Zc`	an n × p_c design matrix of unpenalized variables. Default is NULL.
`lam`	a user-supplied lambda sequence. If `lam` is provided as a scalar and `nlam` > 1, a `lam` sequence is generated starting from `lam`. To run a single value of `lam`, set `nlam` = 1. The program sorts a user-defined `lam` sequence in decreasing order.
`nlam`	the length of the `lam` sequence. Default is 100. No effect if a full `lam` sequence is supplied.
`k`	an integer vector of candidate degrees of freedom for the basis functions.
`ref`	reference level (baseline), either an integer in [1, p] or `NULL`. Default is `NULL`. If `ref` is an integer in `[1, p]`, the group lasso penalized log-contrast model (with log-ratios) is fitted with the `ref`-th component chosen as the baseline. If `ref` is `NULL`, the linearly constrained group lasso penalized log-contrast model is fitted.
`intercept`	Boolean, specifying whether to include an intercept. Default is TRUE.
`W`	penalty weights for the coefficient groups, in one of three forms: a vector of length p (the total number of groups), where a zero weight means the group is not shrunk; a p1 × p1 diagonal matrix with positive diagonal elements, where `p1 = (p - length(ref)) * k`; or a character string naming a function, or an object of type `function`, used to compute the weight matrix for each group.
`type`	a character string specifying which criterion to use. The choices are `"GIC"` (default), `"BIC"`, and `"AIC"`.
`mu_ratio`	the ratio by which the penalty parameter `u` of the augmented Lagrangian method is increased. Default is 1.01. Initial values of the scaled Lagrange multipliers are set to 0. If `mu_ratio` < 1, the program automatically sets the initial penalty parameter `u` to 0 and `outer_maxiter` to 1, meaning that no linear constraint is imposed.
`outer_maxiter`	maximum number of outer loops allowed for the augmented Lagrangian method.
`...`	other arguments passed to `FuncompCGL`.
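The roles of `mu_ratio` and `outer_maxiter` are easiest to see on a toy problem. The sketch below is a generic augmented Lagrangian loop, not the package's internal solver: it solves min ½‖b − a‖² subject to ∑ b = 0, starting the multiplier at 0, minimizing the augmented Lagrangian in closed form at each outer iteration, updating the multiplier, and multiplying the penalty `u` by `mu_ratio`. The helper name `augmented_lagrangian_zero_sum` and the starting penalty `u0` are illustrative assumptions.

```r
# Generic augmented Lagrangian sketch (not the package's solver):
# minimize 0.5 * ||b - a||^2 subject to sum(b) = 0.
augmented_lagrangian_zero_sum <- function(a, u0 = 1, mu_ratio = 1.01,
                                          outer_maxiter = 1e4, tol = 1e-10) {
  p <- length(a)
  lambda <- 0   # Lagrange multiplier starts at 0
  u <- u0       # penalty parameter
  for (iter in seq_len(outer_maxiter)) {
    # Closed-form minimizer of
    # 0.5 * ||b - a||^2 + lambda * sum(b) + (u / 2) * sum(b)^2
    s <- (sum(a) - p * lambda) / (1 + p * u)
    b <- a - (lambda + u * s)
    # Multiplier update driven by the constraint violation sum(b)
    lambda <- lambda + u * sum(b)
    if (abs(sum(b)) < tol) break
    u <- u * mu_ratio   # increase the penalty for the next outer loop
  }
  list(b = b, iterations = iter)
}

a <- c(3, -1, 2, 0.5, 1)
fit <- augmented_lagrangian_zero_sum(a)
fit$b   # close to a - mean(a), the exact constrained solution
```

Calling it with `u0 = 0` and `outer_maxiter = 1` returns `a` unchanged, which mirrors the documented behaviour for `mu_ratio` < 1: no linear constraint is imposed.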

The FuncompCGL model is estimated by minimizing the linearly constrained group lasso criterion

\frac{1}{2n}\|y - 1_nβ_0 - Z_cβ_c - Zβ\|_2^2 + λ ∑_{j=1}^{p} \|β_j\|_2, s.t. ∑_{j=1}^{p} β_j = 0_k.
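A minimal base-R sketch of the criterion above, assuming β (and the columns of Z) stack the k basis coefficients of each component consecutively, so that β_j is the j-th block of length k. `cgl_objective` and `zero_sum_gap` are hypothetical helpers: they evaluate the objective and the constraint ∑_j β_j = 0_k for given coefficients, but do not perform the minimization.

```r
# Evaluate (not minimize) the constrained group lasso criterion.
# Assumes the k basis coefficients of each component are adjacent in beta and Z.
cgl_objective <- function(y, Z, Zc, beta0, beta_c, beta, lambda, k) {
  n <- length(y)
  resid <- y - beta0 - drop(Zc %*% beta_c) - drop(Z %*% beta)
  groups <- matrix(beta, nrow = k)          # column j holds beta_j
  penalty <- sum(sqrt(colSums(groups^2)))   # sum_j ||beta_j||_2
  sum(resid^2) / (2 * n) + lambda * penalty
}

# Zero-sum constraint check: sum_j beta_j should equal the zero vector 0_k
zero_sum_gap <- function(beta, k) rowSums(matrix(beta, nrow = k))

set.seed(3)
n <- 20; p <- 3; k <- 2
Z <- matrix(rnorm(n * p * k), nrow = n)
Zc <- matrix(rnorm(n), nrow = n)
beta <- c(1, -1, -1, 2, 0, -1)   # groups (1, -1), (-1, 2), (0, -1)
y <- 0.5 + drop(Zc %*% 0.3) + drop(Z %*% beta)   # noiseless response

obj <- cgl_objective(y, Z, Zc, beta0 = 0.5, beta_c = 0.3, beta = beta,
                     lambda = 0.1, k = k)
gap <- zero_sum_gap(beta, k)
```

In this noiseless toy the residuals are zero, so the objective reduces to λ times the sum of group norms, and the chosen groups satisfy the zero-sum constraint exactly.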

The tuning parameters can be selected by the generalized information criterion (GIC),

GIC(λ,k) = \log(\hat{σ}^2(λ,k)) + (s(λ,k) - 1)\,k\,\log(\max(pk + p_c + 1, n))\,\log(\log n)/n,

where \hat{σ}^2(λ,k) = \|y - 1_n\hat{β_0}(λ, k) - Z_c\hat{β_c}(λ, k) - Z\hat{β}(λ, k) \|_{2}^{2}/n with \hat{β_0}(λ, k), \hat{β_c}(λ, k) and \hat{β}(λ, k) being the regularized estimators of the regression coefficients, and s(λ, k) is the number of nonzero coefficient groups in \hat{β}(λ, k).
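The GIC formula translates directly into code. A minimal sketch, assuming the residual vector of a fitted model and its number of nonzero groups s are already available; `gic_value` is a hypothetical helper, not a package function, and the BIC and AIC variants are not reproduced because their exact forms are not given on this page.

```r
# GIC(lam, k) from the residuals of one fitted model (hypothetical helper).
gic_value <- function(resid, s, k, p, p_c) {
  n <- length(resid)
  sigma2_hat <- sum(resid^2) / n
  log(sigma2_hat) +
    (s - 1) * k * log(max(p * k + p_c + 1, n)) * log(log(n)) / n
}

resid <- rep(c(1, -1), times = 25)   # n = 50, so sigma2_hat = 1
g4 <- gic_value(resid, s = 4, k = 5, p = 30, p_c = 0)
g6 <- gic_value(resid, s = 6, k = 5, p = 30, p_c = 0)
```

With the same residuals, the larger model (`s = 6`) receives the larger penalty, so GIC favours the sparser fit unless the extra groups reduce \hat{σ}^2 enough to compensate.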

An object of S3 class "GIC.FuncompCGL" is returned, which is a list containing:

`FuncompCGL.fit`	a list of length `length(k)`, with fitted `FuncompCGL` objects of different degrees of freedom of the basis function.
`lam`	the sequence of the penalty parameter `lam`.
`GIC`	a `length(k)` by `length(lam)` matrix of GIC values.
`lam.min`	the optimal values of the degrees of freedom `k` and the penalty parameter `lam`.
`MSE`	a `length(k)` by `length(lam)` matrix of mean squared errors.
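Because `GIC` is a `length(k)` by `length(lam)` matrix, the optimal pair reported in `lam.min` corresponds to its smallest entry. A base-R sketch of that lookup on a made-up matrix (the values are illustrative, not package output); ties are broken here by taking the first minimum in column-major order, which is an assumption and may differ from the package's rule.

```r
# Illustrative GIC matrix: rows = candidate k, columns = lam sequence
k <- 4:6
lam <- c(0.5, 0.2, 0.1, 0.05)
GIC <- matrix(c(3.1, 2.9, 3.0,    # lam = 0.5
                2.7, 2.5, 2.8,    # lam = 0.2
                2.6, 2.4, 2.9,    # lam = 0.1
                2.8, 2.6, 3.0),   # lam = 0.05
              nrow = length(k))
# Row/column position of the smallest GIC value
idx <- arrayInd(which.min(GIC), dim(GIC))
best <- c(k = k[idx[1, 1]], lam = lam[idx[1, 2]])
best
```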

Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020). Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome. Annals of Applied Statistics. https://arxiv.org/abs/1808.02403

Fan, Y. and Tang, C. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B, 75, 531-552. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12001

FuncompCGL and cv.FuncompCGL, and the predict, coef and plot methods for "GIC.FuncompCGL" objects.

library(compReg)

df_beta <- 5
p <- 30
beta_C_true <- matrix(0, nrow = p, ncol = df_beta)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5, -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)
beta_C_true[3, ] <- c(-0.8, -0.8, 0.4, 1, 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6, -0.6, -0.6)
n_train <- 50
n_test <- 30
k_list <- c(4,5)
Data <- Fcomp_Model(n = n_train, p = p, m = 0, intercept = TRUE,
                    SNR = 4, sigma = 3, rho_X = 0.2, rho_T = 0.5,
                    df_beta = df_beta, n_T = 20, obs_spar = 1, theta.add = FALSE,
                    beta_C = as.vector(t(beta_C_true)))
arg_list <- as.list(Data$call)[-1]
arg_list$n <- n_test
Test <- do.call(Fcomp_Model, arg_list)

## GIC_cgl: Constrained group lasso
GIC_cgl <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                          Zc = Data$data$Zc, intercept = Data$data$intercept,
                          k = k_list)
coef(GIC_cgl)
plot(GIC_cgl)
y_hat <- predict(GIC_cgl, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

## GIC_naive: ignoring the zero-sum constraint
## set mu_ratio = 0 to fit without the linear constraint
## (no outer loop of the augmented Lagrangian method)
GIC_naive <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                            Zc = Data$data$Zc, intercept = Data$data$intercept,
                            k = k_list, mu_ratio = 0)
coef(GIC_naive)
plot(GIC_naive)
y_hat <- predict(GIC_naive, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

## GIC_base: randomly select a component as the reference
## mu_ratio is set to 0 automatically once ref is set to an integer
ref <- sample(1:p, 1)
GIC_base <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                            Zc = Data$data$Zc, intercept = Data$data$intercept,
                            k = k_list, ref = ref)
coef(GIC_base)
plot(GIC_base)
y_hat <- predict(GIC_base, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")
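The `GIC_base` fit replaces the zero-sum constraint with log-ratios against a baseline component. For scalar coefficients (a simplification of the functional model, where the same identity holds coordinate-wise for each of the k basis coefficients), if ∑_j β_j = 0 then ∑_j β_j log x_j = ∑_{j≠r} β_j log(x_j / x_r). A quick numeric check in base R, with the baseline index `r` chosen arbitrarily and toy names (`p0`, `x0`, `b0`) picked so they do not overwrite the example's variables:

```r
set.seed(2)
p0 <- 5
r <- 3                                   # baseline index (plays the role of `ref`)
x0 <- rexp(p0); x0 <- x0 / sum(x0)       # one composition without zeros
b0 <- rnorm(p0); b0 <- b0 - mean(b0)     # impose sum(b0) = 0

# Constrained log-contrast: sum_j b_j * log(x_j) with sum(b) = 0
constrained <- sum(b0 * log(x0))
# Baseline log-ratio form: drop component r, use log(x_j / x_r)
log_ratio <- sum(b0[-r] * log(x0[-r] / x0[r]))
all.equal(constrained, log_ratio)
```

The identity follows because ∑_{j≠r} β_j = -β_r, so the -log x_r terms collect into β_r log x_r.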