FuncompCGL: Fits regularization paths for longitudinal compositional data...


Description

Fits regularization paths for longitudinal compositional data with a group-lasso penalty, over a sequence of regularization parameters lambda and for a fixed number of basis degrees of freedom.

Usage

FuncompCGL(y, X, Zc = NULL, intercept = TRUE, ref = NULL, k, degree = 3,
  basis_fun = c("bs", "OBasis", "fourier"), insert = c("FALSE", "X",
  "basis"), method = c("trapezoidal", "step"), interval = c("Original",
  "Standard"), Trange, T.name = "TIME", ID.name = "Subject_ID", W = rep(1,
  times = p - length(ref)), dfmax = p - length(ref), pfmax = min(dfmax *
  1.5, p - length(ref)), lam = NULL, nlam = 100, lambda.factor = ifelse(n
  < p1, 0.05, 0.001), tol = 0, mu_ratio = 1.01, outer_maxiter = 1e+08,
  outer_eps = 1e-08, inner_maxiter = 10000, inner_eps = 1e-08)

Arguments

y

a vector of response values.

X

a data frame or matrix.

  • If dim(X)[1] > n, where n is the sample size, X should be a data frame of p longitudinal compositional predictors that also contains the subject ID and time variables. The order of subject IDs should match that of y.

  • If dim(X)[1] = n, X is treated as the already-integrated design matrix, an n*(p*k) matrix.

Zc

a design matrix of control variables; may be omitted. Default is NULL. No penalty is imposed on these variables.

intercept

whether to include an intercept. Default is TRUE.

ref

reference variable. If ref is set to an integer in [1, p], the log-contrast method is applied with variable ref as the baseline. If ref = NULL (the default), the constrained group lasso method is applied.
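
To illustrate the baseline idea behind ref, here is a minimal base-R sketch of a log-ratio transform against a reference component. The matrix comp and the helper log_ratio are hypothetical and not part of the package; how FuncompCGL handles ref internally may differ.

```r
# Hypothetical composition: 3 observations of p = 4 parts, each row sums to 1.
comp <- matrix(c(0.10, 0.20, 0.30, 0.40,
                 0.25, 0.25, 0.25, 0.25,
                 0.40, 0.30, 0.20, 0.10),
               nrow = 3, byrow = TRUE)

# Illustrative helper (an assumption, not a package function):
# log of each part relative to the reference part; the reference column is dropped.
log_ratio <- function(x, ref) {
  log(x[, -ref, drop = FALSE] / x[, ref])
}

lr <- log_ratio(comp, ref = 4)
dim(lr)  # 3 rows and p - 1 = 3 columns
```

With a baseline chosen this way only p - 1 groups remain, which is consistent with the p - length(ref) defaults shown in Usage.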

k

a scalar, the degrees of freedom of the basis.

degree

degree of basis - default value is 3.

basis_fun

the type of basis function. Currently one of the following three types:

  • bs B-splines, see bs.

  • OBasis Orthogonal B-splines, see orthogonalsplinebasis.

  • fourier Fourier basis, see fda.

Default is "bs".
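
As a rough illustration of how k and degree relate for the "bs" option, the sketch below builds a cubic B-spline basis with splines::bs. The time grid tt is hypothetical, and bs() is called with its own defaults (for example intercept = FALSE); the package may configure the basis differently.

```r
library(splines)

k  <- 5                              # degrees of freedom of the basis
tt <- seq(0, 1, length.out = 50)     # hypothetical time grid
B  <- bs(tt, df = k, degree = 3)     # cubic B-spline basis matrix
dim(B)                               # one row per time point, k columns
```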

insert

method of interpolation. If insert = "X" or "basis", a dense time sequence is generated, equally spaced by min(diff(sseq))/20, where sseq is the sorted set of all observed time points.

  • "FALSE" no interpolation.

  • "X" linear interpolation of compositional data.

  • "basis" compositional data are treated as a step function, with the basis imposed on unobserved time points for each subject.

Default is "FALSE"
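
The dense time sequence described above can be sketched in base R as follows. All data here are hypothetical, and approx() stands in for the linear interpolation of insert = "X"; the package's own interpolation code may differ in detail.

```r
# Hypothetical observed time points pooled over all subjects.
obs_time <- c(0, 0.5, 0.2, 1, 0.5)
sseq  <- sort(unique(obs_time))            # sorted set of observed times
step  <- min(diff(sseq)) / 20              # spacing described for `insert`
dense <- seq(min(sseq), max(sseq), by = step)

# Linear interpolation of one subject's component onto the dense grid.
subj_time  <- c(0, 0.5, 1)
subj_value <- c(0.2, 0.4, 0.1)
interp <- approx(subj_time, subj_value, xout = dense)$y
length(interp) == length(dense)
```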

method

method used to approximate integral.

  • "trapezoidal" sums the areas of the trapezoids formed by the function values at each pair of adjacent observed time points. See ITG_trap.

  • "step" sums the areas of the rectangles formed by a step function at the observed time points. See ITG_step.

Default is "trapezoidal"
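
The two integral approximations can be compared with a small base-R sketch. The helpers trap_rule and step_rule are illustrative only and are not the package's ITG_trap or ITG_step; in particular, using the left endpoint for the step rule is an assumption.

```r
# Illustrative helpers (hypothetical names, not package functions).
trap_rule <- function(t, f) {
  # area of the trapezoid between each pair of adjacent time points
  sum(diff(t) * (head(f, -1) + tail(f, -1)) / 2)
}
step_rule <- function(t, f) {
  # left-endpoint rectangles: f is held constant until the next time point
  sum(diff(t) * head(f, -1))
}

tt <- c(0, 0.5, 1, 2)   # hypothetical, unevenly spaced time points
ff <- tt^2              # function values at those time points

trap_rule(tt, ff)       # 2.875
step_rule(tt, ff)       # 1.125
```

On sparse, irregular grids the two rules can differ noticeably, as they do here, so the choice of method matters most when each subject has few observations.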

interval

a character string specifying the domain of the integral:

  • "Original" On original time scale, interval = range(Time).

  • "Standard" Time points are mapped onto [0,1], interval = (0,1).

Default is "Original"
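
One way to picture the "Standard" option is a min-max rescaling of the time points onto [0, 1]. The sketch below assumes that form of mapping and uses hypothetical time points; the exact rescaling used by the package (for instance, whether it is based on Trange) is not stated here.

```r
time_obs <- c(2, 4, 7, 10)                           # hypothetical time points
rng      <- range(time_obs)
time_std <- (time_obs - rng[1]) / (rng[2] - rng[1])  # assumed min-max mapping
range(time_std)                                      # 0 and 1
```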

Trange

range of time points

T.name, ID.name

character strings specifying the names of the time variable and the subject ID in X, respectively. Only needed when X is a data frame of longitudinal compositional variables. Defaults are "TIME" and "Subject_ID".

W

a vector of length p (the total number of groups), a matrix of dimension p1*p1, or a character string naming the function used to calculate the inverted weight matrix for each group.

  • If a vector, it works as a penalty factor. Separate penalty weights can be applied to each group of betas to allow differential shrinkage. A weight can be 0 for some groups, which implies no shrinkage and results in those groups always being included in the model.

  • If a matrix, it should be block diagonal, with the inverted weight matrix of each group as a diagonal block.

  • If a character string, the user should provide the function that computes the inverted weight matrices.

Default value is rep(1, times = p - length(ref)).

dfmax

limits the maximum number of groups in the model. Useful for very large p when only a partial path is desired. Default is p - length(ref).

pfmax

limits the maximum number of groups ever to be nonzero. For example, once a group enters the model along the path, it is counted only once, no matter how many times it exits or re-enters the model. Default is min(dfmax * 1.5, p - length(ref)).

lam

a user-supplied lambda sequence. Typically this option is left unspecified, and the program computes its own lam sequence based on nlam and lambda.factor; supplying lam overrides this. If lam is provided as a scalar, a lam sequence is created starting from that value. It is better to supply a decreasing sequence of lambda values; if not, the program sorts the user-defined sequence in decreasing order automatically.

nlam

the length of lam sequence - default is 100.

lambda.factor

the factor for getting the minimal lambda in the lam sequence, where min(lam) = lambda.factor * max(lam). max(lam) is the smallest value of lam for which all penalized groups are zero. The default depends on the relationship between n and p1: if n >= p1 the default is 0.001, close to zero; if n < p1, the default is 0.05. A very small value of lambda.factor will lead to a saturated fit. It has no effect if a user-defined lambda sequence is supplied.
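
The relationship min(lam) = lambda.factor * max(lam) can be sketched as below. The value lam_max is hypothetical, and the log-scale spacing is an assumption borrowed from common practice in path algorithms; the documentation does not state how the intermediate values are spaced.

```r
lam_max       <- 2       # hypothetical value of max(lam)
nlam          <- 100
lambda.factor <- 0.001

# Decreasing sequence from lam_max down to lambda.factor * lam_max,
# equally spaced on the log scale (assumed spacing).
lam <- exp(seq(log(lam_max), log(lambda.factor * lam_max), length.out = nlam))
range(lam)
```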

tol

tolerance for a coefficient vector to be considered nonzero. For example, for the coefficient vector β_j of group j, if max(abs(β_j)) < tol, β_j is set to 0. Default value is 0.
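
The thresholding rule for tol can be written out in a few lines of base R. The coefficient values and the group layout (k consecutive coefficients per group) are hypothetical and only illustrate the rule max(abs(β_j)) < tol.

```r
p <- 3; k <- 2
beta <- c(0.5, -0.2,  1e-8, -1e-9,  0, 0.3)  # hypothetical coefficients, k per group
tol  <- 1e-6

B <- matrix(beta, nrow = p, byrow = TRUE)    # one row per group
small <- apply(abs(B), 1, max) < tol         # groups whose largest entry is below tol
B[small, ] <- 0                              # set those groups to zero
B
```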

mu_ratio

the increasing ratio for u. Default value is 1.01. Initial values for the scaled Lagrange multipliers are set to 0. If mu_ratio < 1, no linear constraints are included and ordinary group lasso coefficients are estimated.

outer_maxiter

the maximum number of loops allowed for the augmented Lagrangian method.

outer_eps

the convergence termination tolerance for the augmented Lagrangian method.

inner_maxiter

the maximum number of loops allowed for blockwise-GMD.

inner_eps

the convergence termination tolerance for blockwise-GMD.

Value

An object with S3 class FuncompCGL.

Z

integral matrix for the longitudinal compositional predictors, with dimension n*(p*k).

lam

the actual sequence of lam values used.

df

the number of non-zero groups in estimated coefficients for Z at each value of lam

beta

a matrix of coefficients for cbind(Z, Zc, 1_n), with nlam rows.

dim

dimension of coefficient matrix

call

the call that produced this object.

Examples

library(compReg)  # package that provides Model() and FuncompCGL()

# True coefficients: p groups, each with df_beta basis coefficients.
# Only the first four groups are non-zero.
df_beta <- 5
p <- 30
beta_C_true <- matrix(0, nrow = p, ncol = df_beta)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5, -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)
beta_C_true[3, ] <- c(-0.8, -0.8, 0.4, 1, 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6, -0.6, -0.6)

# Simulate longitudinal compositional data.
Data <- Model(n = 50, p = p, m = 2, intercept = TRUE,
              SNR = 2, sigma = 2,
              rho_X = 0, rho_W = 0,
              df_W = 5, df_beta = df_beta,
              ns = 100, obs_spar = 0.2, theta.add = c(3, 4, 5),
              beta_C = as.vector(t(beta_C_true)))
y <- Data$data$y
X <- Data$data$Comp
Zc <- Data$data$Zc
intercept <- Data$data$intercept

# Fit the regularization path with a B-spline basis and trapezoidal integration.
k_use <- df_beta
m1 <- FuncompCGL(y = y, X = X, Zc = Zc, intercept = intercept,
                 k = k_use, basis_fun = "bs",
                 insert = "FALSE", method = "trapezoidal",
                 dfmax = p, tol = 1e-6)

# Coefficients at the 20th value of lam.
beta <- coef(m1, s = m1$lam[20])
#beta <- coef(m1)

# One row per group; column sums are expected to be close to 0 under the constrained fit.
beta_C <- matrix(beta[1:(p * k_use)], nrow = p, byrow = TRUE)
colSums(beta_C)

# Indices of the groups with at least one non-zero coefficient.
Non.zero <- which(apply(beta_C, 1, function(x) max(abs(x)) > 0))
Non.zero
plot(m1, ylab = "L2", p = p, k = k_use)

Zhe-Research/compReg documentation built on May 28, 2019, 8:38 a.m.