Fcomp_Model: Simulation of functional compositional data.

Simulate functional compositional data.


Fcomp_Model(n, p, m = 0, intercept = TRUE,
            interval = c(0, 1), n_T = 100, obs_spar = 0.6, discrete = FALSE,
            SNR = 1, sigma = 2, Nzero_group = 4,
            rho_X, Corr_X = c("CorrCS", "CorrAR"),
            rho_T, Corr_T = c("CorrAR", "CorrCS"),
            range_beta = c(0.5, 1), beta_c = 1, beta_C ,
            theta.add = c(1, 2, 5, 6), gamma = 0.5,
            basis_beta = c("bs", "OBasis", "fourier"), df_beta = 5, degree_beta = 3,
            insert = c("FALSE", "X", "basis"), method = c("trapezoidal", "step"))



sample size.


number of the components in the functional compositional data.


number of unpenalized variables. The first ceiling(m/2) are generated with independent bin(1,0.5) entries, while the last (m - ceiling(m/2)) are generated with independent norm(0, 1) entries. Default is 0.
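
As a rough illustration of the description above, the unpenalized design could be drawn in base R as follows. The helper name sim_Zc is hypothetical and not part of the package; the actual internal code may differ.

```r
# Hypothetical helper (not part of the package): draw an n x m matrix whose
# first ceiling(m/2) columns are Bernoulli(0.5) and whose remaining columns
# are standard normal.
sim_Zc <- function(n, m) {
  if (m == 0) return(matrix(numeric(0), nrow = n, ncol = 0))
  m_bin <- ceiling(m / 2)
  Z_bin <- matrix(rbinom(n * m_bin, size = 1, prob = 0.5), nrow = n)
  Z_norm <- matrix(rnorm(n * (m - m_bin)), nrow = n)
  cbind(Z_bin, Z_norm)
}

set.seed(1)
Zc <- sim_Zc(n = 10, m = 3)
dim(Zc)  # 10 rows, 3 columns
```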


whether to include an intercept. Default is TRUE.


a vector of length 2 indicating the time domain. Default is c(0, 1).


an integer specifying the length of the equally spaced time sequence on the domain interval.


a proportion used to obtain sparse observations. Each time point is observed with probability obs_spar, so different subjects may be observed at different time points. obs_spar * n_T > 5 is required.
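
The thinning described here can be sketched in base R. This is only an assumed mechanism (an independent Bernoulli draw per subject and time point); the variable names are illustrative.

```r
# Sketch (assumed mechanism): each of the n_T grid points is kept
# independently with probability obs_spar, separately for each subject.
set.seed(1)
n <- 5; n_T <- 20; obs_spar <- 0.6
interval <- c(0, 1)
stopifnot(obs_spar * n_T > 5)   # requirement stated in the documentation
time_grid <- seq(interval[1], interval[2], length.out = n_T)
observed <- lapply(seq_len(n), function(i) {
  time_grid[runif(n_T) < obs_spar]
})
lengths(observed)  # number of observed time points per subject (varies)
```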


logical (default is FALSE) specifying whether the functional compositional data X is generated at different time points. If discrete = TRUE, X is generated on a dense sequence created by max(ns_dense = 200 * diff(interval), 5 * n_T), and then n_T points are randomly sampled for each subject.
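
A minimal sketch of the subject-specific sampling, assuming the dense grid has max(200 * diff(interval), 5 * n_T) equally spaced points and that the n_T points are drawn without replacement (both are interpretations of the text above, not confirmed package behavior):

```r
# Sketch: build a dense grid, then sample n_T time points for one subject.
set.seed(1)
interval <- c(0, 1); n_T <- 20
ns_dense <- max(200 * diff(interval), 5 * n_T)   # assumed grid length
dense_grid <- seq(interval[1], interval[2], length.out = ns_dense)
subject_times <- sort(sample(dense_grid, n_T))   # without replacement
length(subject_times)
```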


signal to noise ratio.


standard deviation used to build the covariance matrix CovMIX = sigma^2 * kronecker(T.Sigma, X.Sigma). The "non-normalized" data w_i for each subject are generated from a multivariate normal distribution with covariance CovMIX. T.Sigma and X.Sigma are correlation matrices for time points and components, respectively.


an even integer; the first Nzero_group compositional predictors have non-zero effects. Default is 4.

rho_X, rho_T

parameters used to generate correlation matrices.

Corr_X, Corr_T

character string specifying the correlation structure between components and between time points, respectively.

  • "CorrCS" (default for Corr_X) compound symmetry.

  • "CorrAR" (default for Corr_T) autoregressive.
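
For reference, the two structures can be written down in a few lines of base R. The helper names corr_cs and corr_ar are hypothetical, and a first-order autoregressive form is assumed for "CorrAR".

```r
# Hypothetical helpers illustrating the two structures; the package's own
# implementation may differ in detail.
corr_cs <- function(d, rho) {
  # Compound symmetry: 1 on the diagonal, rho everywhere else.
  S <- matrix(rho, d, d)
  diag(S) <- 1
  S
}
corr_ar <- function(d, rho) {
  # AR(1), assumed: entry (i, j) equals rho^|i - j|.
  rho^abs(outer(seq_len(d), seq_len(d), "-"))
}

corr_cs(3, 0.2)
corr_ar(3, 0.5)
```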


a sorted vector of length 2 specifying the range of the entries of the coefficient matrix B of dimension p*k. Specifically, each column of B is filled with Nzero_group/2 values drawn from the uniform distribution over range_beta and their negative counterparts. Default is c(0.5, 1).
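
One way to picture this construction is the sketch below. The placement of positive and negative values within the first Nzero_group rows is an assumption made for illustration; the documentation only states that each column holds Nzero_group/2 uniform draws and their negatives.

```r
# Sketch under assumptions: rows 1..Nzero_group are non-zero; within each
# column the first Nzero_group/2 rows hold uniform draws and the next
# Nzero_group/2 rows hold their negatives. The real row placement may differ.
set.seed(1)
p <- 30; k <- 5; Nzero_group <- 4
range_beta <- c(0.5, 1)
half <- Nzero_group / 2
B <- matrix(0, nrow = p, ncol = k)
for (j in seq_len(k)) {
  u <- runif(half, min = range_beta[1], max = range_beta[2])
  B[seq_len(Nzero_group), j] <- c(u, -u)
}
colSums(B)  # each column sums to zero (up to rounding)
```

Pairing each draw with its negative makes every column of B sum to zero.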


value of coefficients for beta0 and beta_c (coefficients for intercept and time-invariant predictors). Default is 1.


vectorized coefficient matrix. If missing, the program will generate beta_C according to range_beta and Nzero_group.


logical or integer(s).

  • If integer(s), a vector with value(s) in [1,p], indicating which component(s) of the composition have a high-level mean curve.

  • If TRUE, the components 1:ceiling(Nzero_group/2) and Nzero_group + (1:ceiling(Nzero_group/2)) are given a high-level mean.

  • If FALSE, all mean curves are set to 0.


for the high-level mean groups, log(p * gamma) is added to the "non-normalized" data w_i before the data are converted to compositions.
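
The interaction between the set of high-level components and gamma can be sketched as follows. The vector name mu is illustrative, and the shift is assumed to be constant over time.

```r
# Sketch: build the p-vector of means added to w_i(t_v) at every time point.
p <- 30; gamma <- 0.5; Nzero_group <- 4

# Components selected by the TRUE rule described above.
high <- c(1:ceiling(Nzero_group / 2),
          Nzero_group + (1:ceiling(Nzero_group / 2)))
high  # 1 2 5 6, the same as the default integer vector c(1, 2, 5, 6)

mu <- rep(0, p)
mu[high] <- log(p * gamma)   # shift applied before normalization
mu[1:8]
```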

basis_beta, df_beta, degree_beta

correspond to basis_fun, k and degree in FuncompCGL, respectively.


a character string specifying the method used to perform functional interpolation.

  • "FALSE" (default) no interpolation.

  • "X" linear interpolation of functional compositional data along the time grid.

  • "basis" the functional compositional data are interpolated as a step function along the time grid.

If insert = "X" or "basis", interpolation is conducted on sseq, where sseq is the sorted sequence of all the observed time points.
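
To make the difference between linear and step interpolation concrete, here is a small base R sketch using approx() from the stats package. The data values are made up, and this is not claimed to be how the package performs the interpolation internally.

```r
# Observed times and one component's values for a single subject (made up).
t_obs <- c(0, 0.4, 1)
x_obs <- c(0.2, 0.5, 0.3)
sseq <- c(0, 0.2, 0.4, 0.7, 1)   # pooled, sorted time points (made up)

# Linear interpolation between neighboring observations.
x_linear <- approx(t_obs, x_obs, xout = sseq, method = "linear")$y
# Step function: carry the most recent observation forward.
x_step <- approx(t_obs, x_obs, xout = sseq, method = "constant")$y

rbind(sseq, x_linear, x_step)
```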


a character string specifying the method used to approximate the integral.

  • "trapezoidal" (default) sum up the areas under the trapezoids.

  • "step" sum up the areas under the rectangles.
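
The two rules can be compared on a function with a known integral. The helper names are hypothetical, and the use of left endpoints for the rectangles is an assumption.

```r
# Hypothetical helpers comparing the two rules on a grid of time points.
int_trapezoidal <- function(t, f) {
  # Average adjacent function values, weight by interval width.
  sum(diff(t) * (head(f, -1) + tail(f, -1)) / 2)
}
int_step <- function(t, f) {
  # Left-endpoint rectangles (the endpoint choice is an assumption).
  sum(diff(t) * head(f, -1))
}

t <- seq(0, 1, length.out = 101)
f <- t^2                      # exact integral over [0, 1] is 1/3
int_trapezoidal(t, f)
int_step(t, f)
```

On a fine grid both rules land close to 1/3, with the trapezoidal rule being the more accurate of the two for this smooth integrand.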


The setup of this simulation follows Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics, https://arxiv.org/abs/1808.02403.
Specifically, we first generate the correlation matrix X.Sigma for the components of a composition based on rho_X and Corr_X, and the correlation matrix T.Sigma for the time points based on rho_T and Corr_T. Then, the "non-normalized" data w_i=[w_i(t_1)^T,...,w_i(t_{n_T})^T] for each subject are generated from a multivariate normal distribution with covariance CovMIX = sigma^2 * kronecker(T.Sigma, X.Sigma) and a mean vector determined by theta.add and gamma. Each w_i(t_v) is a p-vector for each time point v=1,...,n_T. Finally, the compositional data are obtained as

x_{ij}(t_v) = exp(w_{ij}(t_v))/sum_{k=1}^{p} exp(w_{ik}(t_v)),

for each subject i=1,...,n, component of a composition j=1,...,p and time point v=1,...,n_T.
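
The generation steps above can be sketched for a single subject in base R. This is a toy illustration with small dimensions and a zero mean vector, not the package's implementation; the Cholesky-based sampler is one standard way to draw from a multivariate normal distribution and is chosen here only to keep the example self-contained.

```r
set.seed(1)
p <- 4; n_T <- 3; sigma <- 2
rho_X <- 0.2; rho_T <- 0.6
X.Sigma <- matrix(rho_X, p, p); diag(X.Sigma) <- 1            # CorrCS
T.Sigma <- rho_T^abs(outer(seq_len(n_T), seq_len(n_T), "-"))  # CorrAR (AR(1) assumed)
CovMIX <- sigma^2 * kronecker(T.Sigma, X.Sigma)               # (p * n_T) x (p * n_T)

# Draw one subject's w_i ~ N(0, CovMIX) via the Cholesky factor
# (zero mean here, i.e. no high-level mean components).
R <- chol(CovMIX)                         # CovMIX = t(R) %*% R
w <- drop(crossprod(R, rnorm(p * n_T)))   # t(R) %*% z
W <- matrix(w, nrow = p, ncol = n_T)      # column v holds w_i(t_v)

# Normalize each time point to a composition.
X <- apply(W, 2, function(w_v) exp(w_v) / sum(exp(w_v)))
colSums(X)  # each column sums to 1
```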


a list including


a list of observed data,

  • y a vector of response variable,

  • Comp a data frame of observed functional compositional data, a column of Subject_ID, and a column of TIME,

  • Zc a matrix of unpenalized variables with dimension n*m,

  • intercept whether an intercept is included.


a vector of coefficients of length p*df_beta + m + 1.


matrix of the basis functions used to generate the coefficient curves.


a list consisting of

  • Z_t.full the functional compositional data.

  • Z_ITG integrated functional compositional data.

  • Y.tru true response vector without noise.

  • X functional "non-normalized" data W.


a list of parameters used in the simulation.


Zhe Sun and Kun Chen


Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics, https://arxiv.org/abs/1808.02403.


Data <- Fcomp_Model(n = 50, p = 30, m = 0, intercept = TRUE, Nzero_group = 4,
                    n_T = 20, SNR = 3, rho_X = 0, rho_T = 0.6,
                    df_beta = 5, obs_spar = 1, theta.add = FALSE)

