FuncompCGL: Fit regularization paths of sparse log-contrast regression...
In Compack: Regression with Compositional Covariates


Fit the penalized log-contrast regression with functional compositional predictors proposed by Sun et al. (2020) <arXiv:1808.02403>. The model is estimated by minimizing a linearly constrained group lasso criterion. The regularization paths are computed for the group lasso penalty at grid values of the regularization parameter `lam` and the degrees of freedom `k` of the basis function.

FuncompCGL(y, X, Zc = NULL, intercept = TRUE, ref = NULL,
           k, degree = 3, basis_fun = c("bs", "OBasis", "fourier"),
           insert = c("FALSE", "X", "basis"), method = c("trapezoidal", "step"),
           interval = c("Original", "Standard"), Trange,
           T.name = "TIME", ID.name = "Subject_ID",
           W = rep(1,times = p - length(ref)),
           dfmax = p - length(ref), pfmax = min(dfmax * 1.5, p - length(ref)),
           lam = NULL, nlam = 100, lambda.factor = ifelse(n < p1, 0.05, 0.001),
           tol = 1e-8, mu_ratio = 1.01,
           outer_maxiter = 1e+6, outer_eps = 1e-8,
           inner_maxiter = 1e+4, inner_eps = 1e-8)

`y`	response vector with length n.
`X`	data frame or matrix. If `nrow(X)` > n, `X` should be a data frame or matrix of the functional compositional predictors with p columns for the values of the composition components, a column indicating subject ID and a column of observation times. The order of the subject IDs should be the same as that of `y`. Zero entries are not allowed. If `nrow(X)` = n, `X` is treated as the design matrix obtained after integration, with dimension n-by-(kp - length(ref)).
`Zc`	an n-by-p_c design matrix of unpenalized variables. Default is NULL.
`intercept`	Boolean, specifying whether to include an intercept. Default is TRUE.
`ref`	reference level (baseline), either an integer in `[1,p]` or `NULL`. Default value is `NULL`. If `ref` is set to an integer in `[1,p]`, the group lasso penalized log-contrast model (with log-ratios) is fitted with the `ref`-th component chosen as baseline. If `ref` is set to `NULL`, the linearly constrained group lasso penalized log-contrast model is fitted.
`k`	an integer, degrees of freedom of the basis function.
`degree`	degree of the piecewise polynomial used for the spline basis. Default value is 3 (cubic splines).
`basis_fun`	method to generate the basis: `"bs"` (default) B-splines, see function `bs`. `"OBasis"` orthogonal B-splines, see function `OBasis` and package orthogonalsplinebasis. `"fourier"` Fourier basis, see function `create.fourier.basis` and package fda.
`insert`	a character string specifying the method used for functional interpolation. `"FALSE"` (default) no interpolation. `"X"` linear interpolation of the functional compositional data along the time grid. `"basis"` the functional compositional data is interpolated as a step function along the time grid. If `insert` = `"X"` or `"basis"`, interpolation is conducted on `sseq`, where `sseq` is the sorted sequence of all the observed time points.
`method`	a character string specifying the method used to approximate the integral. `"trapezoidal"` (default) sum up the areas under the trapezoids. `"step"` sum up the areas under the rectangles.
`interval`	a character string specifying the domain of the integral. `"Original"` (default) on the original time scale, `interval` = `range(Time)`. `"Standard"` time points are mapped onto [0,1], `interval` = `c(0,1)`.
`Trange`	range of time points
`T.name, ID.name`	character strings specifying the names of the time variable and the subject ID variable in `X`. These are only needed when `X` is a data frame or matrix of the functional compositional predictors. Defaults are `"TIME"` and `"Subject_ID"`.
`W`	penalization weights for the groups of coefficients. `W` can be a vector of length p (the total number of groups), where a zero weight implies no shrinkage; a diagonal p1-by-p1 matrix with positive diagonal elements, where `p1 = (p - length(ref)) * k`; or a character string naming a function, or an object of type `function`, used to compute the weight matrix for each group.
`dfmax`	limit on the maximum number of groups in the model. Useful for handling very large p, if a partial path is desired. Default is `p - length(ref)`.
`pfmax`	limit on the maximum number of groups ever to be nonzero. For example, once a group enters the model along the path, it is counted only once, no matter how many times it re-enters the model. Default is `min(dfmax * 1.5, p - length(ref))`.
`lam`	a user-supplied lambda sequence. If `lam` is provided as a scalar and `nlam` > 1, the `lam` sequence is created starting from `lam`. To run a single value of `lam`, set `nlam` = 1. The program sorts a user-defined `lam` sequence in decreasing order.
`nlam`	the length of the `lam` sequence. Default is 100. No effect if `lam` is provided.
`lambda.factor`	the factor for getting the minimal lambda in the `lam` sequence, where `min(lam)` = `lambda.factor` * `max(lam)`. `max(lam)` is the smallest value of `lam` for which all penalized groups are zero. If n >= p1, the default is `0.001`. If n < p1, the default is `0.05`.
`tol`	tolerance for a coefficient to be considered non-zero. Once the convergence criterion is satisfied, each element β_j of the coefficient vector β is set to 0 if |β_j| < tol.
`mu_ratio`	the increasing ratio of the penalty parameter `u`. Default value is 1.01. Initial values for the scaled Lagrange multipliers are set to 0. If `mu_ratio` < 1, the program automatically sets the initial penalty parameter `u` to 0 and `outer_maxiter` to 1, indicating that there is no linear constraint.
`outer_maxiter, outer_eps`	`outer_maxiter` is the maximum number of loops allowed for the augmented Lagrangian method; and `outer_eps` is the corresponding convergence tolerance.
`inner_maxiter, inner_eps`	`inner_maxiter` is the maximum number of loops allowed for blockwise-GMD; and `inner_eps` is the corresponding convergence tolerance.
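
The relation `min(lam)` = `lambda.factor` * `max(lam)` described for `lambda.factor` can be sketched in base R as follows. The value of `lam_max` is hypothetical, and the log-spaced grid is an assumption borrowed from common group lasso solvers; the help page only fixes the two endpoints and the length of the sequence.

```r
# Hypothetical problem size: fewer observations than penalized coefficients
n <- 50
p1 <- 150
lambda.factor <- ifelse(n < p1, 0.05, 0.001)  # default rule from the usage

lam_max <- 2.5   # hypothetical: smallest lam with all penalized groups zero
nlam <- 100

# Assumption: values are equally spaced on the log scale, in decreasing order
lam <- exp(seq(log(lam_max), log(lambda.factor * lam_max), length.out = nlam))

range(lam)
```

With these settings the ratio `min(lam) / max(lam)` equals `lambda.factor`, and the sequence is decreasing, which matches the ordering applied to a user-supplied `lam`.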

The functional log-contrast regression model for compositional predictors is defined as

y = 1_nβ_0 + Z_cβ_c + \int_T Z(t)β(t)dt + e, s.t. (1_p)^T β(t)=0 \forall t \in T,

where β_0 is the intercept, β_c is the regression coefficient vector of length p_c corresponding to the control variables, β(t) is the functional regression coefficient vector of length p as a function of t, and e is the zero-mean random error vector of length n. Moreover, Z(t) is the log-transformed functional compositional data. If zeros exist in the original functional compositional data, the user should pre-process them. For example, if count data are provided, the user could replace 0's with 0.5.
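
The zero pre-processing suggested above can be sketched in base R. The count matrix is hypothetical (one subject, three time points, four components), and the final log step is shown only to illustrate what Z(t) looks like; it is not a statement about what `FuncompCGL` expects as input.

```r
# Hypothetical counts: rows are time points, columns are components
counts <- matrix(c(10,  0,  5, 85,
                    0, 20, 30, 50,
                   12,  8,  0, 80), nrow = 3, byrow = TRUE)

# Replace zeros with 0.5, as suggested for count data
counts[counts == 0] <- 0.5

# Close each row to a composition (rows sum to 1)
comp <- counts / rowSums(counts)

# Log-transformed compositional data at each time point
Zt <- log(comp)
Zt
```

Without the replacement step, `log(0)` would produce `-Inf` entries, which is why zero entries are not allowed in `X`.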
After adopting a truncated basis expansion approach to re-express β(t)

β(t) = B Φ(t),

where B is a p-by-k unknown but fixed coefficient matrix, and Φ(t) consists of basis functions with k degrees of freedom, the functional log-contrast regression model can be written as

y = 1_nβ_0 + Z_cβ_c + Zβ + e, s.t. ∑_{j=1}^{p}β_j=0_k,

where Z is an n-by-pk matrix corresponding to the integral, and β=vec(B^T) is a pk-vector in which the j-th k-subvector β_j is the coefficient vector for the j-th compositional component.
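
A minimal base-R sketch of how such an integral matrix Z could be assembled is given below. It assumes a common time grid on [0,1], a B-spline basis from `splines::bs` with `intercept = TRUE`, and trapezoidal weights; the simulated compositions and all variable names are illustrative, and the exact basis setup used inside `FuncompCGL` may differ.

```r
library(splines)

set.seed(1)
n <- 4; p <- 3; k <- 5
tt <- seq(0, 1, length.out = 21)                      # common observation times
Phi <- bs(tt, df = k, degree = 3, intercept = TRUE)   # 21-by-k basis matrix

# Trapezoidal quadrature weights on the grid tt
w <- c(diff(tt), 0) / 2 + c(0, diff(tt)) / 2

Z <- matrix(0, n, p * k)
for (i in seq_len(n)) {
  # Simulated positive compositions: each row (time point) sums to 1
  raw <- matrix(rexp(length(tt) * p), nrow = length(tt), ncol = p)
  logx <- log(raw / rowSums(raw))                     # 21-by-p
  # Row j of Zi approximates the integral of log x_ij(t) * Phi(t) dt
  Zi <- t(logx * w) %*% Phi                           # p-by-k
  # Component-major layout, matching beta = vec(B^T)
  Z[i, ] <- as.vector(t(Zi))
}
dim(Z)
```

Each block of k consecutive columns of Z belongs to one compositional component, which is the grouping that the group lasso penalty acts on.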
To enable variable selection, the FuncompCGL model is estimated via the linearly constrained group lasso,

argmin_{β_0, β_c, β}(\frac{1}{2n}\|y - 1_nβ_0 - Z_cβ_c - Zβ\|_2^2 + λ ∑_{j=1}^{p} \|β_j\|_2), s.t. ∑_{j=1}^{p} β_j = 0_k.
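
The criterion and the role of the zero-sum constraint can be illustrated with a small base-R sketch. The data are random, the control term Z_cβ_c is omitted for brevity, and `cgl_objective` is a hypothetical helper written for this illustration, not a function of the package.

```r
set.seed(2)
n <- 6; p <- 4; k <- 3

# Coefficient matrix B (p-by-k) with columns centered so that sum_j beta_j = 0_k
B <- matrix(rnorm(p * k), p, k)
B <- sweep(B, 2, colMeans(B))
beta <- as.vector(t(B))                 # beta = vec(B^T)

Z <- matrix(rnorm(n * p * k), n, p * k)
y <- rnorm(n)

# Hypothetical helper: squared-error loss plus group lasso penalty
cgl_objective <- function(y, Z, beta, beta0, lambda, p) {
  resid <- y - beta0 - Z %*% beta
  groups <- matrix(beta, nrow = p, byrow = TRUE)   # row j is beta_j
  sum(resid^2) / (2 * length(y)) + lambda * sum(sqrt(rowSums(groups^2)))
}
obj <- cgl_objective(y, Z, beta, beta0 = 0, lambda = 0.1, p = p)

# Adding the same k-vector to every component block of a row of Z
# leaves Z %*% beta unchanged when the constraint holds
shift <- matrix(rnorm(n * k), n, k)
Z_shift <- Z + shift[, rep(seq_len(k), times = p)]
gap <- max(abs(Z %*% beta - Z_shift %*% beta))
gap
```

The shift mimics rescaling a subject's composition by a common factor at each time point, which adds the same term to every log-component; under the constraint the fitted values do not depend on that scaling.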

An object with S3 class "FuncompCGL", which is a list containing:

`Z`	the integral matrix for the functional compositional predictors, with dimension n-by-(pk).
`lam`	the sequence of `lam` values.
`df`	the number of non-zero groups in the estimated coefficients for the functional compositional predictors at each value of `lam`.
`beta`	a matrix of coefficients with `length(lam)` columns and p_1+p_c+1 rows, where `p_1=pk`. The first p_1 rows are the estimated coefficients for the functional compositional predictors, and the last row is for the intercept. If `intercept = FALSE`, the last row is all zeros.
`dim`	dimension of the coefficient matrix.
`sseq`	sequence of the time points.
`call`	the call that produces this object.
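
The layout of a single column of `beta` can be unpacked as in the sketch below. The numbers are made up, and placing the p_c control coefficients between the first p_1 rows and the intercept row is an inference from the row count stated above rather than something the help page spells out.

```r
p <- 3; k <- 2; p_c <- 1
p1 <- p * k

# Hypothetical coefficient column for one value of lam
beta_col <- c(0.5, -0.2,  0, 0,  -0.5, 0.2,   # p1 rows: beta_1, beta_2, beta_3
              1.3,                            # p_c rows: control variables
              0.7)                            # last row: intercept

B <- matrix(beta_col[1:p1], nrow = p, byrow = TRUE)   # row j is beta_j
beta_c <- beta_col[p1 + seq_len(p_c)]
beta_0 <- beta_col[p1 + p_c + 1]

# Non-zero groups; their count corresponds to the `df` entry
active <- which(apply(B, 1, function(b) any(b != 0)))
active
colSums(B)   # zero-sum constraint across components
```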

Zhe Sun and Kun Chen

Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics. https://arxiv.org/abs/1808.02403

Yang, Y. and Zou, H. (2015) A fast unified algorithm for computing group-lasso penalized learning problems, Statistics and Computing 25(6), 1129-1141. https://link.springer.com/article/10.1007/s11222-014-9498-5

Aitchison, J. and Bacon-Shone, J. (1984) Log-contrast models for experiments with mixtures, Biometrika 71, 323-330.

cv.FuncompCGL and GIC.FuncompCGL, and the predict, coef, plot and print methods for "FuncompCGL" objects.

library(Compack)

## Simulate data: the first 4 of the p = 30 components have non-zero coefficients
df_beta <- 5
p <- 30
beta_C_true <- matrix(0, nrow = p, ncol = df_beta)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5, -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)
beta_C_true[3, ] <- c(-0.8, -0.8, 0.4, 1, 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6, -0.6, -0.6)
Data <- Fcomp_Model(n = 50, p = p, m = 0, intercept = TRUE,
                    SNR = 4, sigma = 3, rho_X = 0, rho_T = 0.6, df_beta = df_beta,
                    n_T = 20, obs_spar = 1, theta.add = FALSE,
                    beta_C = as.vector(t(beta_C_true)))

## Fit the regularization path
m1 <- FuncompCGL(y = Data$data$y, X = Data$data$Comp, Zc = Data$data$Zc,
                 intercept = Data$data$intercept, k = df_beta, tol = 1e-10)
print(m1)
plot(m1)
beta <- coef(m1)

## Simulate a test set with the same settings but n = 30, then predict
arg_list <- as.list(Data$call)[-1]
arg_list$n <- 30
TEST <- do.call(Fcomp_Model, arg_list)
y_hat <- predict(m1, Znew = TEST$data$Comp, Zcnew = TEST$data$Zc)
## Compare predictions at the middle of the lam sequence with the observed response
plot(y_hat[, floor(length(m1$lam)/2)], TEST$data$y,
     ylab = "Observed Response", xlab = "Predicted Response")

## Coefficients at the 20th value of lam, reshaped to a p-by-k matrix
beta <- coef(m1, s = m1$lam[20])
beta_C <- matrix(beta[1:(p*df_beta)], nrow = p, byrow = TRUE)
## Column sums should be approximately zero under the zero-sum constraint
colSums(beta_C)
## Indices of the selected (non-zero) components
Non.zero <- (1:p)[apply(beta_C, 1, function(x) max(abs(x)) > 0)]
Non.zero
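
A coefficient matrix such as `beta_C` can be turned back into coefficient curves through β(t) = B Φ(t). The sketch below uses the four non-zero rows of `beta_C_true` from the example in place of fitted values, and it assumes a B-spline basis on [0,1] built with `splines::bs` and `intercept = TRUE`; the basis actually used by a fitted object depends on `basis_fun`, `degree` and the time range, so this is an illustration rather than a reconstruction of the package internals.

```r
library(splines)

k <- 5
# Rows 1 to 4 of beta_C_true from the example above
B <- rbind(c(-0.5, -0.5, -0.5, -1.0, -1.0),
           c( 0.8,  0.8,  0.7,  0.6,  0.6),
           c(-0.8, -0.8,  0.4,  1.0,  1.0),
           c( 0.5,  0.5, -0.6, -0.6, -0.6))

tgrid <- seq(0, 1, length.out = 101)
Phi <- bs(tgrid, df = k, degree = 3, intercept = TRUE)   # 101-by-k basis matrix

# Row j holds beta_j(t) evaluated on tgrid
curves <- B %*% t(Phi)                                   # 4-by-101

# The curves sum to zero at every time point because colSums(B) = 0
range(colSums(curves))

# To visualize: matplot(tgrid, t(curves), type = "l", xlab = "t", ylab = "beta_j(t)")
```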