GIC.FuncompCGL: Compute information criteria for the 'FuncompCGL' model.


Description

Tune the penalty parameter lam and the degrees of freedom k of the basis functions in the FuncompCGL model over a grid of values, using GIC, BIC, or AIC. This function computes the GIC, BIC, or AIC curve and returns the optimal values of lam and k.

Usage

GIC.FuncompCGL(y, X, Zc = NULL, lam = NULL, nlam = 100, k = 4:10, ref = NULL,
              intercept = TRUE, W = rep(1,times = p - length(ref)),
              type = c("GIC", "BIC", "AIC"),
              mu_ratio = 1.01, outer_maxiter = 1e+6, ...)

Arguments

y

response vector with length n.

X

data frame or matrix.

  • If nrow(X) > n, X should be a data frame or matrix of the functional compositional predictors, with p columns holding the values of the composition components, one column of subject IDs and one column of observation times. Subject IDs must be in the same order as y. Zero entries are not allowed.

  • If nrow(X) = n, X is treated as the already integrated design matrix, an n*(k*(p - length(ref))) matrix.
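To make the "already integrated" form of X concrete, here is a minimal sketch of one way an n*(k*p) matrix (the ref = NULL case) could be built from raw trajectories. It assumes, beyond what this page states, that every subject is observed on a shared time grid, that the basis is a B-spline basis from splines::bs with k degrees of freedom, and that each integral of a log-composition trajectory against a basis function is approximated by the trapezoidal rule. integrate_logcomp is a hypothetical helper, not the routine Compack uses internally.

```r
library(splines)

# Hypothetical helper (not part of Compack): integrate log-composition
# trajectories against a B-spline basis with k degrees of freedom.
# comp:   n x p x m array of positive compositions on a shared grid
# t_grid: the m observation times (assumed common to all subjects)
integrate_logcomp <- function(comp, t_grid, k) {
  stopifnot(all(comp > 0))                        # zero entries are not allowed
  n <- dim(comp)[1]
  p <- dim(comp)[2]
  basis <- bs(t_grid, df = k, intercept = TRUE)   # m x k basis matrix
  # Trapezoidal-rule weights for a possibly uneven grid
  w <- c(diff(t_grid), 0) / 2 + c(0, diff(t_grid)) / 2
  Z <- matrix(0, nrow = n, ncol = p * k)
  for (i in seq_len(n)) {
    for (j in seq_len(p)) {
      cols <- ((j - 1) * k + 1):(j * k)           # group j uses k adjacent columns
      Z[i, cols] <- colSums(w * log(comp[i, j, ]) * basis)
    }
  }
  Z
}

# Usage: 3 subjects, 2 constant components observed at 21 time points
t_grid <- seq(0, 1, length.out = 21)
comp <- array(0, dim = c(3, 2, 21))
comp[, 1, ] <- 0.25
comp[, 2, ] <- 0.75
Z <- integrate_logcomp(comp, t_grid, k = 4)
dim(Z)  # 3 8
```

Because a B-spline basis with an intercept sums to one at every time point, the k entries of a group add up to the integral of that component's log-trajectory, which gives a quick sanity check.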

Zc

an n*p_c design matrix of unpenalized variables. Default is NULL.

lam

a user-supplied lambda sequence. If lam is provided as a scalar and nlam > 1, a lam sequence is generated starting from lam. To fit a single value of lam, set nlam = 1. The program sorts a user-supplied lambda sequence in decreasing order.

nlam

the length of the lam sequence. Default is 100. Ignored if a full lam sequence is supplied.

k

an integer vector specifying the degrees of freedom of the basis function.

ref

reference level (baseline), either an integer in [1, p] or NULL. Default value is NULL.

  • If ref is set to an integer in [1, p], the group lasso penalized log-contrast model (with log-ratios) is fitted with the ref-th component chosen as the baseline.

  • If ref is set to be NULL, the linearly constrained group lasso penalized log-contrast model is fitted.
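The two settings of ref are closely related. Under the zero-sum constraint the ref-th coefficient equals minus the sum of the others, so sum_j beta_j log x_j = sum_{j != ref} beta_j log(x_j / x_ref); in the functional model the same identity holds coefficient by coefficient. The sketch below checks it for a plain n*p composition matrix observed at a single time point. log_ratio_baseline is a hypothetical helper for illustration, not a Compack function.

```r
# Hypothetical helper (not part of Compack): log-ratios of an n x p
# composition matrix X against the baseline column `ref`.
log_ratio_baseline <- function(X, ref) {
  stopifnot(all(X > 0), ref >= 1, ref <= ncol(X))  # zero entries are not allowed
  lX <- log(X)
  lX[, -ref, drop = FALSE] - lX[, ref]             # log(x_j / x_ref) for j != ref
}

X <- matrix(c(0.2, 0.3, 0.5,
              0.1, 0.6, 0.3), nrow = 2, byrow = TRUE)
Z <- log_ratio_baseline(X, ref = 3)

# A zero-sum coefficient vector gives the same linear predictor either way
b <- c(1, -3, 2)
all.equal(drop(log(X) %*% b), drop(Z %*% b[-3]))  # TRUE
```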

intercept

Boolean, specifying whether to include an intercept. Default is TRUE.

W

a vector of length p (the total number of groups), a matrix with dimension p1*p1, where p1 = (p - length(ref)) * k, or a character string specifying the function used to compute the weight matrix for each group.

  • a vector of penalization weights for the groups of coefficients. A zero weight implies no shrinkage.

  • a diagonal matrix with positive diagonal elements.

  • a character string naming a function, or a function object, used to compute the weights.

type

a character string specifying which criterion to use. The choices are "GIC" (default), "BIC", and "AIC".

mu_ratio

the factor by which the penalty parameter u is increased. Default value is 1.01. Initial values of the scaled Lagrange multipliers are set to 0. If mu_ratio < 1, the program automatically sets the initial penalty parameter u to 0 and outer_maxiter to 1, indicating that there is no linear constraint.
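To illustrate how mu_ratio and the zero-initialised scaled multipliers interact, here is a minimal sketch of a generic scaled augmented Lagrangian loop on a toy problem: unpenalized least squares under a single sum-to-zero constraint. The group lasso penalty is deliberately dropped so the inner step has a closed form. al_sum_zero, its starting value mu = 1 and its stopping rule are assumptions for illustration and should not be read as Compack's implementation; only the mu_ratio < 1 branch mirrors the behaviour documented above.

```r
# Hypothetical sketch (not Compack's code): minimise (1/2n)||y - X b||^2
# subject to sum(b) = 0 with a scaled augmented Lagrangian method.
al_sum_zero <- function(X, y, mu = 1, mu_ratio = 1.01,
                        outer_maxiter = 1e+6, tol = 1e-8) {
  if (mu_ratio < 1) {          # documented behaviour: drop the constraint
    mu <- 0
    outer_maxiter <- 1
  }
  n <- nrow(X)
  p <- ncol(X)
  C <- matrix(1, nrow = 1, ncol = p)     # constraint C b = sum(b) = 0
  u <- 0                                 # scaled Lagrange multiplier, starts at 0
  XtX <- crossprod(X) / n
  Xty <- crossprod(X, y) / n
  for (it in seq_len(outer_maxiter)) {
    # Inner step: exact minimiser of (1/2n)||y - X b||^2 + (mu/2)(C b + u)^2
    b <- solve(XtX + mu * crossprod(C), Xty - mu * t(C) * u)
    r <- drop(C %*% b)                   # constraint violation
    u <- u + r                           # multiplier update
    if (abs(r) < tol) break
    if (mu > 0) {
      mu <- mu * mu_ratio                # increase the penalty parameter
      u <- u / mu_ratio                  # keep the unscaled multiplier mu * u fixed
    }
  }
  drop(b)
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
y <- rnorm(50)
b <- al_sum_zero(X, y)                   # sum(b) is numerically zero
b_free <- al_sum_zero(X, y, mu_ratio = 0)  # plain least squares, no constraint
```

Rescaling u whenever mu grows keeps the unscaled multiplier unchanged, so increasing the penalty only tightens the constraint rather than discarding the dual progress made so far.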

outer_maxiter

maximum number of outer loops allowed for the augmented Lagrangian method.

...

other arguments that can be passed to FuncompCGL.

Details

The FuncompCGL model is estimated by minimizing the linearly constrained group lasso criterion

\frac{1}{2n}\|y - 1_n\beta_0 - Z_c\beta_c - Z\beta\|_2^2 + \lambda \sum_{j=1}^{p} \|\beta_j\|_2, \quad \text{s.t.} \quad \sum_{j=1}^{p} \beta_j = 0_k.
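As a reading aid for the criterion above, the following minimal sketch evaluates it for given coefficients and checks the zero-sum constraint. It assumes, consistent with how the Examples section passes beta_C as as.vector(t(beta_C_true)), that the columns of Z are grouped by component, so component j occupies k adjacent columns and its coefficients form row j of a p*k matrix B. cgl_objective and is_zero_sum are hypothetical helpers, not Compack functions.

```r
# Hypothetical helper (not part of Compack): value of the group lasso
# criterion. B is a p x k matrix whose row j is beta_j.
cgl_objective <- function(y, Z, B, lam, beta0 = 0, Zc = NULL, beta_c = NULL) {
  n <- length(y)
  fit <- beta0 + drop(Z %*% as.vector(t(B)))  # Z beta, groups ordered by component
  if (!is.null(Zc)) fit <- fit + drop(Zc %*% beta_c)
  loss <- sum((y - fit)^2) / (2 * n)
  penalty <- lam * sum(sqrt(rowSums(B^2)))    # lambda * sum_j ||beta_j||_2
  loss + penalty
}

# Hypothetical helper: does sum_j beta_j equal the zero vector 0_k?
is_zero_sum <- function(B, tol = 1e-10) all(abs(colSums(B)) < tol)

# Usage: n = 2, p = 2, k = 1
y <- c(1, 2)
Z <- diag(2)
B <- matrix(c(0.5, -0.5), nrow = 2)
cgl_objective(y, Z, B, lam = 1)  # 2.625 = 6.5 / 4 + 1
is_zero_sum(B)                   # TRUE
```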

The tuning parameters can be selected by the generalized information criterion (GIC),

GIC(\lambda,k) = \log\hat{\sigma}^2(\lambda,k) + (s(\lambda,k) - 1)\,k\,\log\{\max(pk + p_c + 1, n)\}\,\log(\log n)/n,

where \hat{\sigma}^2(\lambda,k) = \|y - 1_n\hat{\beta}_0(\lambda,k) - Z_c\hat{\beta}_c(\lambda,k) - Z\hat{\beta}(\lambda,k)\|_2^2/n, with \hat{\beta}_0(\lambda,k), \hat{\beta}_c(\lambda,k) and \hat{\beta}(\lambda,k) being the regularized estimators of the regression coefficients, and s(\lambda,k) is the number of nonzero coefficient groups in \hat{\beta}(\lambda,k).
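The GIC formula translates directly into a few lines of R. This sketch takes the response, the fitted values and the number of nonzero groups s as inputs; gic_value is a hypothetical helper, and it covers only the GIC, since the BIC and AIC variants offered by type are not spelled out on this page.

```r
# Hypothetical helper (not part of Compack): GIC for one (lambda, k) pair.
# s = number of nonzero coefficient groups, p_c = number of unpenalized columns.
gic_value <- function(y, y_hat, s, k, p, p_c = 0) {
  n <- length(y)
  sigma2 <- sum((y - y_hat)^2) / n                   # residual variance estimate
  log(sigma2) + (s - 1) * k * log(max(p * k + p_c + 1, n)) * log(log(n)) / n
}

y <- rep(0, 100)
y_hat <- rep(c(-1, 1), 50)                   # residual variance exactly 1
gic_value(y, y_hat, s = 1, k = 4, p = 30)    # 0: log(1) plus a zero penalty
gic_value(y, y_hat, s = 3, k = 4, p = 30)    # larger: two extra groups of k coefficients
```

The factor (s - 1) rather than s is consistent with the zero-sum constraint removing k free coefficients from the s selected groups.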

Value

An object of S3 class "GIC.FuncompCGL" is returned, which is a list containing:

FuncompCGL.fit

a list of length length(k) containing the fitted FuncompCGL objects, one for each value of the degrees of freedom in k.

lam

the sequence of the penalty parameter lam.

GIC

a length(k) by length(lam) matrix of GIC values.

lam.min

the optimal values of the degrees of freedom k and the penalty parameter lam.

MSE

a length(k) by length(lam) matrix of mean squared errors.
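Given the GIC matrix, lam.min corresponds to the cell with the smallest value. A minimal sketch of that lookup, assuming rows index the values in k and columns index lam, and breaking ties by taking the first minimum reported by which(); select_min is hypothetical and may not match Compack's exact tie-breaking.

```r
# Hypothetical helper (not part of Compack): optimal (k, lam) from a
# length(k) x length(lam) matrix of criterion values; NA cells are skipped.
select_min <- function(GIC, k, lam) {
  idx <- which(GIC == min(GIC, na.rm = TRUE), arr.ind = TRUE)[1, ]
  c(k = k[idx[1]], lam = lam[idx[2]])
}

GIC <- matrix(c(3, 2,
                1, 4,
                5, 6), nrow = 2)   # filled column by column
select_min(GIC, k = c(4, 5), lam = c(0.3, 0.2, 0.1))  # k = 4, lam = 0.2
```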

References

Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome. Annals of Applied Statistics. https://arxiv.org/abs/1808.02403

Fan, Y. and Tang, C. Y. (2013) Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society. Series B 75, 531-552. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12001

See Also

FuncompCGL and cv.FuncompCGL, and the predict, coef and plot methods for "GIC.FuncompCGL" objects.

Examples

df_beta = 5
p = 30
beta_C_true = matrix(0, nrow = p, ncol = df_beta)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5, -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)
beta_C_true[3, ] <- c(-0.8, -0.8, 0.4, 1, 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6, -0.6, -0.6)
n_train = 50
n_test = 30
k_list <- c(4,5)
Data <- Fcomp_Model(n = n_train, p = p, m = 0, intercept = TRUE,
                    SNR = 4, sigma = 3, rho_X = 0.2, rho_T = 0.5,
                    df_beta = df_beta, n_T = 20, obs_spar = 1, theta.add = FALSE,
                    beta_C = as.vector(t(beta_C_true)))
arg_list <- as.list(Data$call)[-1]
arg_list$n <- n_test
Test <- do.call(Fcomp_Model, arg_list)

## GIC_cgl: Constrained group lasso
GIC_cgl <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                          Zc = Data$data$Zc, intercept = Data$data$intercept,
                          k = k_list)
coef(GIC_cgl)
plot(GIC_cgl)
y_hat <- predict(GIC_cgl, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

## GIC_naive: ignoring the zero-sum constraints
## set mu_ratio = 0 to fit without the linear constraints:
## no outer loop for the augmented Lagrangian multipliers
GIC_naive <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                            Zc = Data$data$Zc, intercept = Data$data$intercept,
                            k = k_list, mu_ratio = 0)
coef(GIC_naive)
plot(GIC_naive)
y_hat <- predict(GIC_naive, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

## GIC_base: randomly select a component as the reference
## mu_ratio is set to 0 automatically once ref is set to an integer
ref <- sample(1:p, 1)
GIC_base <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                            Zc = Data$data$Zc, intercept = Data$data$intercept,
                            k = k_list, ref = ref)
coef(GIC_base)
plot(GIC_base)
y_hat <- predict(GIC_base, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

Compack documentation built on July 1, 2020, 10:26 p.m.