Fcomp_Model: Simulation for functional composition data.


View source: R/simulation.R

Description

Simulate functional compositional data.

Usage

Fcomp_Model(n, p, m = 0, intercept = TRUE,
            interval = c(0, 1), n_T = 100, obs_spar = 0.6, discrete = FALSE,
            SNR = 1, sigma = 2, Nzero_group = 4,
            rho_X, Corr_X = c("CorrCS", "CorrAR"),
            rho_T, Corr_T = c("CorrAR", "CorrCS"),
            range_beta = c(0.5, 1), beta_c = 1, beta_C,
            theta.add = c(1, 2, 5, 6), gamma = 0.5,
            basis_beta = c("bs", "OBasis", "fourier"), df_beta = 5, degree_beta = 3,
            insert = c("FALSE", "X", "basis"), method = c("trapezoidal", "step"))

Arguments

n

sample size.

p

number of the components in the functional compositional data.

m

number of unpenalized variables. The first ceiling(m/2) are generated with independent bin(1, 0.5) entries, and the remaining m - ceiling(m/2) are generated with independent norm(0, 1) entries. Default is 0.
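The split between binary and Gaussian columns can be sketched as follows. This is a minimal illustration, not the package's own code; the helper name sim_Zc is hypothetical.

```r
# Hypothetical helper (not part of the package): build an n x m matrix of
# unpenalized covariates following the description of `m`.
sim_Zc <- function(n, m) {
  if (m == 0) return(matrix(numeric(0), nrow = n, ncol = 0))
  m_bin  <- ceiling(m / 2)   # number of binary columns
  m_norm <- m - m_bin        # number of standard normal columns
  Z_bin  <- matrix(rbinom(n * m_bin, size = 1, prob = 0.5), nrow = n, ncol = m_bin)
  Z_norm <- matrix(rnorm(n * m_norm), nrow = n, ncol = m_norm)
  cbind(Z_bin, Z_norm)
}

set.seed(1)
Zc <- sim_Zc(n = 50, m = 3)   # 2 binary columns followed by 1 normal column
dim(Zc)
```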

intercept

whether to include an intercept. Default is TRUE.

interval

a vector of length 2 indicating the time domain. Default is c(0, 1).

n_T

an integer specifying the length of the equally spaced time sequence on the domain interval. Default is 100.

obs_spar

a proportion used to generate sparse observations. Each time point is observed with probability obs_spar, which allows different subjects to be observed at different time points. obs_spar * n_T > 5 is required. Default is 0.6.
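A minimal sketch of this kind of thinning, assuming each (subject, time point) pair is kept independently with probability obs_spar. The exact sampling code used by the package is not shown here, so the variable names below are illustrative only.

```r
set.seed(1)
n <- 5; n_T <- 20; obs_spar <- 0.6
stopifnot(obs_spar * n_T > 5)            # requirement stated for obs_spar

t_grid <- seq(0, 1, length.out = n_T)    # equally spaced grid on interval = c(0, 1)

# n x n_T logical mask: TRUE means the time point is observed for that subject
observed <- matrix(runif(n * n_T) < obs_spar, nrow = n, ncol = n_T)

# Observed time points per subject; lengths generally differ across subjects
obs_times <- lapply(seq_len(n), function(i) t_grid[observed[i, ]])
lengths(obs_times)
```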

discrete

logical (default is FALSE) specifying whether the functional compositional data X is generated at different time points. If discrete = TRUE, X is generated on a dense sequence created by max(ns_dense = 200 * diff(interval), 5 * n_T), and then n_T points are randomly sampled for each subject.
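The dense-grid sampling can be pictured with the sketch below. It assumes the max(...) expression gives the length of the dense sequence and that points are sampled without replacement; both are assumptions, not confirmed package behavior.

```r
set.seed(1)
interval <- c(0, 1); n_T <- 20; n <- 3

# Assumed: the dense grid has max(200 * diff(interval), 5 * n_T) points
ns_dense   <- max(200 * diff(interval), 5 * n_T)
dense_grid <- seq(interval[1], interval[2], length.out = ns_dense)

# Each subject gets its own n_T time points drawn from the dense grid
subject_times <- lapply(seq_len(n), function(i) sort(sample(dense_grid, n_T)))
sapply(subject_times, length)
```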

SNR

signal-to-noise ratio. Default is 1.

sigma

marginal standard deviation used to generate the covariance matrix CovMIX = sigma^2 * kronecker(T.Sigma, X.Sigma). The "non-normalized" data w_i for each subject are generated from a multivariate normal distribution with covariance CovMIX. T.Sigma and X.Sigma are the correlation matrices for time points and components, respectively. Default is 2.

Nzero_group

an even integer specifying that the first Nzero_group compositional predictors have non-zero effects. Default is 4.

rho_X, rho_T

parameters used to generate correlation matrices.

Corr_X, Corr_T

character string specifying the correlation structure between components and between time points, respectively.

  • "CorrCS" (default for Corr_X): compound symmetry.

  • "CorrAR" (default for Corr_T): autoregressive.
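For orientation, the two structures can be written out as below. The sketch assumes the textbook definitions (a constant off-diagonal rho for compound symmetry, rho^|i - j| for a first-order autoregressive structure); the helper names corr_cs and corr_ar are hypothetical and are not the package's internal functions.

```r
# Compound symmetry: 1 on the diagonal, rho everywhere else
corr_cs <- function(d, rho) {
  S <- matrix(rho, nrow = d, ncol = d)
  diag(S) <- 1
  S
}

# AR(1): correlation decays geometrically with the lag |i - j|
corr_ar <- function(d, rho) {
  rho^abs(outer(seq_len(d), seq_len(d), "-"))
}

X.Sigma <- corr_cs(3, rho = 0.2)   # between components
T.Sigma <- corr_ar(4, rho = 0.6)   # between time points
T.Sigma
```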

range_beta

a sorted vector of length 2 specifying the range of the entries of the coefficient matrix B of dimension p*k. Specifically, each column of B is filled with Nzero_group/2 values drawn from the uniform distribution over range_beta and their negative counterparts. Default is c(0.5, 1).
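A minimal sketch of such a coefficient matrix follows. It assumes the non-zero entries occupy the first Nzero_group rows and that the positive values are listed before their negatives; the placement within a column and the helper name sim_B are assumptions. Pairing each value with its negative makes every column sum to zero.

```r
# Hypothetical helper (not part of the package)
sim_B <- function(p, k, Nzero_group = 4, range_beta = c(0.5, 1)) {
  stopifnot(Nzero_group %% 2 == 0, Nzero_group <= p)
  B <- matrix(0, nrow = p, ncol = k)
  half <- Nzero_group / 2
  for (j in seq_len(k)) {
    u <- runif(half, min = range_beta[1], max = range_beta[2])
    B[seq_len(Nzero_group), j] <- c(u, -u)   # assumed ordering within a column
  }
  B
}

set.seed(1)
B <- sim_B(p = 30, k = 5)
colSums(B)   # each column sums to zero up to floating-point error
```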

beta_c

value of coefficients for beta0 and beta_c (coefficients for intercept and time-invariant predictors). Default is 1.

beta_C

vectorized coefficient matrix. If missing, the program will generate beta_C according to range_beta and Nzero_group.

theta.add

logical or integer(s).

  • If integer(s), a vector with value(s) in [1,p], indicating which component(s) of the compositions have a high-level mean curve.

  • If TRUE, the components 1:ceiling(Nzero_group/2) and Nzero_group + (1:ceiling(Nzero_group/2)) are set to have a high-level mean.

  • If FALSE, all mean curves are set to 0.
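The three cases can be sketched with a small helper. The function name high_mean_index is hypothetical and only illustrates the index rule stated above; with Nzero_group = 4 the TRUE case reproduces the default theta.add = c(1, 2, 5, 6) shown under Usage.

```r
# Hypothetical helper: resolve theta.add to the indices of high-level-mean components
high_mean_index <- function(theta.add, Nzero_group = 4) {
  if (is.logical(theta.add)) {
    if (!theta.add) return(integer(0))            # FALSE: no high-level mean curves
    half <- ceiling(Nzero_group / 2)
    return(c(seq_len(half), Nzero_group + seq_len(half)))
  }
  as.integer(theta.add)                           # integer(s): use as given
}

high_mean_index(TRUE)        # components 1, 2, 5, 6 when Nzero_group = 4
high_mean_index(FALSE)       # empty
high_mean_index(c(3, 7))     # components 3 and 7
```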

gamma

for the high-level mean groups, log(p * gamma) is added to the "non-normalized" data w_i before the data are converted to compositions. Default is 0.5.

basis_beta, df_beta, degree_beta

basis_fun, k and degree in FuncompCGL respectively.

insert

a character string specifying the method used for functional interpolation.

  • "FALSE" (default): no interpolation.

  • "X": linear interpolation of the functional compositional data along the time grid.

  • "basis": the functional compositional data are interpolated as a step function along the time grid.

If insert = "X" or "basis", interpolation is conducted on sseq, where sseq is the sorted sequence of all the observed time points.
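The difference between the two interpolation modes can be illustrated with base R's approx(). This is only a sketch on toy data, assuming a left-continuous step function for the "basis" case; it is not the package's implementation, and t_obs, x_obs and interp are illustrative names.

```r
# One subject observed at three time points, with a 3-part composition per row
t_obs <- c(0, 0.4, 1)
x_obs <- rbind(c(0.2, 0.30, 0.50),
               c(0.5, 0.25, 0.25),
               c(0.1, 0.60, 0.30))

# sseq: sorted sequence of all observed time points across subjects (toy values)
sseq <- c(0, 0.2, 0.4, 0.7, 1)

# Interpolate each component separately along sseq
interp <- function(method) {
  sapply(seq_len(ncol(x_obs)), function(j) {
    approx(t_obs, x_obs[, j], xout = sseq, method = method)$y
  })
}

X_linear <- interp("linear")     # analogous to insert = "X"
X_step   <- interp("constant")   # analogous to insert = "basis" (assumed left-continuous)

rowSums(X_linear)   # linear interpolation keeps each row summing to 1
```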

method

a character string specifying the method used to approximate the integral.

  • "trapezoidal"(Default) Sum up areas under the trapezoids.

  • "step" Sum up area under the rectangles.
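The two quadrature rules can be compared on a known integral. The sketch below assumes left-endpoint rectangles for "step", which may differ from the package's convention; integrate_grid is a hypothetical helper.

```r
# Hypothetical helper: approximate the integral of f over the grid t
integrate_grid <- function(t, f, method = c("trapezoidal", "step")) {
  method <- match.arg(method)
  dt <- diff(t)
  n  <- length(t)
  if (method == "trapezoidal") {
    sum(dt * (f[-1] + f[-n]) / 2)   # average of the two endpoint heights
  } else {
    sum(dt * f[-n])                 # left-endpoint rectangle height (assumption)
  }
}

t <- seq(0, 1, length.out = 101)
trap <- integrate_grid(t, t^2, "trapezoidal")
step <- integrate_grid(t, t^2, "step")
c(trapezoidal = trap, step = step, exact = 1 / 3)
```

On a smooth integrand the trapezoidal rule is typically the more accurate of the two for the same grid, which is consistent with it being the default.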

Details

The setup of this simulation follows Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020), Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics, https://arxiv.org/abs/1808.02403.
Specifically, we first generate the correlation matrix X.Sigma for the components of a composition based on rho_X and Corr_X, and the correlation matrix T.Sigma for the time points based on rho_T and Corr_T. Then the "non-normalized" data w_i = [w_i(t_1)^T, ..., w_i(t_{n_T})^T] for each subject are generated from a multivariate normal distribution with covariance CovMIX = sigma^2 * kronecker(T.Sigma, X.Sigma) and a mean vector determined by theta.add and gamma. Each w_i(t_v) is a p-vector, for each time point v = 1, ..., n_T. Finally, the compositional data are obtained as

x_{ij}(t_v) = exp(w_{ij}(t_v))/sum_{k=1}^{p} exp(w_{ik}(t_v)),

for each subject i=1,...,n, component of a composition j=1,...,p and time point v=1,...,n_T.
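The steps above can be sketched for a single subject as follows. This is a minimal illustration under stated assumptions: compound symmetry for X.Sigma, an AR(1) form rho_T^|i - j| for T.Sigma, and a Cholesky-based multivariate normal draw. None of the variable names below are taken from the package source beyond those appearing in this page.

```r
set.seed(1)
p <- 4; n_T <- 6; sigma <- 2; gamma <- 0.5
rho_X <- 0.2; rho_T <- 0.6
theta.add <- c(1, 2)               # components with a high-level mean

# Correlation matrices (assumed textbook forms)
X.Sigma <- matrix(rho_X, p, p); diag(X.Sigma) <- 1
T.Sigma <- rho_T^abs(outer(seq_len(n_T), seq_len(n_T), "-"))
CovMIX  <- sigma^2 * kronecker(T.Sigma, X.Sigma)   # (p * n_T) x (p * n_T)

# Mean: log(p * gamma) for the high-level components, 0 otherwise
mu <- matrix(0, nrow = p, ncol = n_T)
mu[theta.add, ] <- log(p * gamma)

# Draw w_i ~ N(vec(mu), CovMIX); column v of W is the p-vector w_i(t_v)
z <- rnorm(p * n_T)
w <- as.vector(mu) + as.vector(t(chol(CovMIX)) %*% z)
W <- matrix(w, nrow = p, ncol = n_T)

# Convert to compositions: x_ij(t_v) = exp(w_ij(t_v)) / sum_k exp(w_ik(t_v))
X <- apply(W, 2, function(w_v) exp(w_v) / sum(exp(w_v)))
colSums(X)   # each time point's composition sums to 1
```

The ordering of kronecker(T.Sigma, X.Sigma) matches stacking the p-vectors time point by time point, which is why the draw can be reshaped column-wise into a p x n_T matrix.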

Value

a list including

data

a list of observed data,

  • y a vector of response variable,

  • Comp a data frame of observed functional compositional data, a column of Subject_ID, and a column of TIME,

  • Zc a matrix of unpenalized variables with dimension n*m,

  • intercept whether an intercept is included.

beta

a vector of coefficients of length p*df_beta + m + 1.

basis.info

matrix of the basis function to generate the coefficient curves

data.raw

a list consisting of

  • Z_t.full the functional compositional data.

  • Z_ITG integrated functional compositional data.

  • Y.tru true response vector without noise.

  • X functional "non-normalized" data W.

parameter

a list of parameters used in the simulation.

Author(s)

Zhe Sun and Kun Chen

References

Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020), Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics, https://arxiv.org/abs/1808.02403

Examples

Data <- Fcomp_Model(n = 50, p = 30, m = 0, intercept = TRUE, Nzero_group = 4,
                    n_T = 20, SNR = 3, rho_X = 0, rho_T = 0.6,
                    df_beta = 5, obs_spar = 1, theta.add = FALSE)

jiji6454/compReg documentation built on Feb. 5, 2021, 2:20 p.m.