FuncompCGL: Fits regularization paths for longitudinal compositional data...
In Zhe-Research/compReg: Compositional Data Regression


Fits regularization paths for longitudinal compositional data with a group-lasso penalty, over a sequence of regularization parameters lambda and with fixed degrees of freedom for the basis.

FuncompCGL(y, X, Zc = NULL, intercept = TRUE, ref = NULL, k, degree = 3,
  basis_fun = c("bs", "OBasis", "fourier"), insert = c("FALSE", "X",
  "basis"), method = c("trapezoidal", "step"), interval = c("Original",
  "Standard"), Trange, T.name = "TIME", ID.name = "Subject_ID", W = rep(1,
  times = p - length(ref)), dfmax = p - length(ref), pfmax = min(dfmax *
  1.5, p - length(ref)), lam = NULL, nlam = 100, lambda.factor = ifelse(n
  < p1, 0.05, 0.001), tol = 0, mu_ratio = 1.01, outer_maxiter = 1e+08,
  outer_eps = 1e-08, inner_maxiter = 10000, inner_eps = 1e-08)

`y`	the response vector.
`X`	a data frame or matrix. If `dim(X)[1]` > n, where n is the sample size, `X` should be a data frame of p longitudinal compositional predictors that also contains the subject ID and time variables. The order of the subject IDs should match that of `y`. If `dim(X)[1]` = n, `X` is treated as the already-integrated design matrix, of dimension n × (pk).
`Zc`	a design matrix of control variables; may be omitted. Default is `NULL`. No penalty is imposed on these variables.
`intercept`	whether to include an intercept. Default is `TRUE`.
`ref`	reference variable. If `ref` is set to a scalar in `[1,p]`, the log-contrast method is applied with variable `ref` as the baseline. If `ref` = `NULL` (the default), the constrained group lasso method is applied.
`k`	a scalar, the degrees of freedom of the basis.
`degree`	degree of the basis. Default is 3.
`basis_fun`	the type of basis function. Currently one of three: `"bs"`, B-splines, see `bs`; `"OBasis"`, orthogonal B-splines, see `orthogonalsplinebasis`; `"fourier"`, Fourier basis, see `fda`. Default is `"bs"`.
`insert`	the interpolation scheme. If `insert` = `"X"` or `"basis"`, a dense time sequence is generated, equally spaced by `min(diff(sseq))/20`, where `sseq` is the sorted set of all observed time points. `"FALSE"`: no interpolation. `"X"`: linear interpolation of the compositional data. `"basis"`: the compositional data are treated as a step function, and the basis is imposed on the unobserved time points for each subject. Default is `"FALSE"`.
`method`	the method used to approximate the integral. `"trapezoidal"`: sums the areas of the trapezoids formed by the function values at each pair of adjacent observed time points; see `ITG_trap`. `"step"`: sums the areas of the rectangles formed by a step function at the observed time points; see `ITG_step`. Default is `"trapezoidal"`.
`interval`	a character string specifying the domain of the integral. `"Original"`: the original time scale, interval = range(Time). `"Standard"`: time points are mapped onto [0,1], interval = (0,1). Default is `"Original"`.
`Trange`	range of time points
`T.name, ID.name`	character strings giving the names of the time variable and the subject ID in `X`; only needed when `X` is a data frame of longitudinal compositional variables. Defaults are `"TIME"` and `"Subject_ID"`.
`W`	a vector of length p (the total number of groups), a matrix of dimension `p1*p1`, or a character string naming the function used to calculate the inverted weight matrix for each group. If a vector, it acts as a penalty factor: separate penalty weights can be applied to each group of coefficients to allow differential shrinkage. A weight can be 0 for some groups, which implies no shrinkage, so that group is always included in the model. If a matrix, it must be block diagonal, with the inverted weight matrix of each group as a diagonal block. If a character string, the user should provide the function for the inverted weight matrices. Default is `rep(1, times = p - length(ref))`, that is, `rep(1, times = p)` when `ref` = `NULL`.
`dfmax`	the maximum number of groups allowed in the model. Useful for very large p, when only a partial path is desired. Default is `p - length(ref)`, that is, p when `ref` = `NULL`.
`pfmax`	the maximum number of groups ever allowed to be nonzero. Once a group enters the model along the path, it is counted only once, no matter how many times it later exits or re-enters. Default is `min(dfmax*1.5, p - length(ref))`.
`lam`	a user-supplied lambda sequence. Typically, users leave this option unspecified and let the program compute its own `lam` sequence from `nlam` and `lambda.factor`. If `lam` is supplied as a scalar, a `lam` sequence is still created, starting from that value. Supplying a sequence overrides the automatic computation. It is better to supply a decreasing sequence of lambda values; otherwise the program sorts the user-defined sequence in decreasing order automatically.
`nlam`	the length of `lam` sequence - default is 100.
`lambda.factor`	the factor used to obtain the minimal lambda in the `lam` sequence, where `min(lam)` = `lambda.factor` * `max(lam)`. `max(lam)` is the smallest value of `lam` for which all penalized groups are zero. The default depends on the relationship between n and p1: if n >= p1, the default is `0.001`, close to zero; if n < p1, the default is `0.05`. A very small value of `lambda.factor` leads to a saturated fit. It has no effect if a user-defined lambda sequence is supplied.
`tol`	tolerance below which a group's coefficient vector is treated as zero. For the coefficient vector β_j of group j, if max(abs(β_j)) < `tol`, β_j is set to 0. Default is 0.
`mu_ratio`	the increasing ratio for `u`. Default is 1.01. Initial values of the scaled Lagrange multipliers are set to 0. If `mu_ratio` < 1, no linear constraint is imposed and group lasso coefficients are estimated.
`outer_maxiter`	the maximum number of iterations allowed for the augmented Lagrangian method.
`outer_eps`	the convergence tolerance for the augmented Lagrangian method.
`inner_maxiter`	the maximum number of iterations allowed for blockwise-GMD.
`inner_eps`	the convergence tolerance for blockwise-GMD.
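
The two `method` options described above can be illustrated with a minimal sketch. The helper names `itg_trap_sketch` and `itg_step_sketch` are hypothetical and are not the package's `ITG_trap` and `ITG_step`. The step version assumes that the value at each observed time point is held until the next observation, which may differ from the package's convention.

```r
# Hypothetical helpers for illustration only (not the package's ITG_trap / ITG_step).

# Trapezoidal rule: sum the trapezoid areas between adjacent time points.
itg_trap_sketch <- function(t, f) {
  sum(diff(t) * (head(f, -1) + tail(f, -1)) / 2)
}

# Step rule (assumed left-endpoint): f is held constant until the next time point.
itg_step_sketch <- function(t, f) {
  sum(diff(t) * head(f, -1))
}

t_obs <- c(0, 0.5, 1, 2)   # observed time points, sorted
f_obs <- c(1, 2, 2, 4)     # function values at those time points

itg_trap_sketch(t_obs, f_obs)  # 0.75 + 1 + 3 = 4.75
itg_step_sketch(t_obs, f_obs)  # 0.5 + 1 + 2 = 3.5
```

With unequally spaced time points the two rules can differ noticeably, as they do here.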

An object of S3 class `FuncompCGL`.

`Z`	the integral matrix for the longitudinal compositional predictors, of dimension n × (pk).
`lam`	the actual sequence of `lam` values used.
`df`	the number of non-zero groups in the estimated coefficients for `Z` at each value of `lam`.
`beta`	a matrix of coefficients for `cbind(Z, Zc, 1_n)`, with `nlam` rows.
`dim`	dimension of coefficient matrix
`call`	the call that produced this object.
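
As a rough illustration of how `df` relates to `tol`, the sketch below counts non-zero groups in a single coefficient vector. The function `count_nonzero_groups` is hypothetical and not part of the package. It assumes the first `p * k` coefficients are stored group by group, `k` per group, which is the layout the example on this page uses when it reshapes the output of `coef`.

```r
# Hypothetical helper for illustration only; not part of the package.
# Assumes the first p * k entries of `beta` hold k basis coefficients per group,
# group by group, followed by control-variable and intercept coefficients.
count_nonzero_groups <- function(beta, p, k, tol = 0) {
  B <- matrix(beta[seq_len(p * k)], nrow = p, byrow = TRUE)  # one row per group
  group_max <- apply(abs(B), 1, max)
  B[group_max < tol, ] <- 0             # groups below the tolerance are zeroed
  sum(apply(abs(B), 1, max) > 0)        # number of groups with any non-zero entry
}

# Three groups with k = 2, then one control coefficient and an intercept
beta_demo <- c(0.5, -0.5,  1e-8, -1e-8,  0, 0,  1.2, 0.3)

count_nonzero_groups(beta_demo, p = 3, k = 2, tol = 0)     # 2
count_nonzero_groups(beta_demo, p = 3, k = 2, tol = 1e-6)  # 1
```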

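library(compReg)

df_beta <- 5   # degrees of freedom of the basis
p <- 30        # number of compositional components
# True basis coefficients, one row per component; only components 1-4 are non-zero
beta_C_true <- matrix(0, nrow = p, ncol = df_beta)
beta_C_true[3, ] <- c(-0.8, -0.8, 0.4, 1, 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6, -0.6, -0.6)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5, -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)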
Data <- Model(n = 50, p = p, m = 2, intercept = TRUE,
              SNR = 2, sigma = 2,
              rho_X = 0, rho_W = 0,
              df_W = 5, df_beta = df_beta,
              ns = 100, obs_spar = 0.2, theta.add = c(3,4,5),
              beta_C = as.vector(t(beta_C_true)))
y <- Data$data$y
X <- Data$data$Comp
Zc <- Data$data$Zc
intercept <- Data$data$intercept

k_use <- df_beta
m1 <- FuncompCGL(y = y, X = X, Zc = Zc, intercept = intercept,
                 k = k_use, basis_fun = "bs",
                 insert = "FALSE", method = "trapezoidal",
                 dfmax = p, tol = 1e-6)

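# Coefficients at the 20th value of the lambda sequence
beta <- coef(m1, s = m1$lam[20])
#beta <- coef(m1)

# Reshape the first p * k_use coefficients: one row per compositional component
beta_C <- matrix(beta[1:(p*k_use)], nrow = p, byrow = TRUE)
# Column sums of the basis coefficients across components
colSums(beta_C)
# Indices of the components whose coefficient group is not all zero
Non.zero <- which(apply(beta_C, 1, function(x) max(abs(x)) > 0))
Non.zero
plot(m1, ylab = "L2", p = p, k = k_use)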
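
The `colSums(beta_C)` check above can be read as follows, sketched here without the package. Each row of the coefficient matrix is taken to hold the basis coefficients of one component's coefficient curve. If each column sums to zero across components, the curves sum to zero at every time point. The basis settings used here (`splines::bs` with `intercept = TRUE` on a grid over [0, 1]) are assumptions for illustration and may not match what `FuncompCGL` builds internally. The four coefficient rows are the non-zero rows of `beta_C_true` from the example.

```r
library(splines)

k <- 5
# Non-zero rows of beta_C_true from the example (components 1 to 4)
beta_true <- rbind(
  c(-0.5, -0.5, -0.5, -1.0, -1.0),
  c( 0.8,  0.8,  0.7,  0.6,  0.6),
  c(-0.8, -0.8,  0.4,  1.0,  1.0),
  c( 0.5,  0.5, -0.6, -0.6, -0.6)
)
colSums(beta_true)            # each column sums to zero (up to rounding)

# Assumed basis: cubic B-splines with k degrees of freedom on a grid over [0, 1]
t_grid <- seq(0, 1, length.out = 50)
B <- bs(t_grid, df = k, degree = 3, intercept = TRUE)

# Coefficient curves: one column per component, one row per time point
curves <- B %*% t(beta_true)
max(abs(rowSums(curves)))     # numerically zero: the curves sum to zero at each t
```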