Model2: Simulation model 2 for longitudinal compositional data

Description

Simulate sparse observations of longitudinal compositional data X.

Usage

Model2(n, p, m = 0, intercept = TRUE, interval = c(0, 1), ns = 100,
  obs_spar = 0.6, discrete = FALSE, SNR = 1, sigma = 2, rho_X,
  Corr_X = c("CorrAR", "CorrCS"), rho_W, Corr_W = c("CorrAR", "CorrCS"),
  Nzero_group = 4, range_beta = c(0.5, 1), range_beta_c = 1, beta_C,
  theta.add = c(1, 2, 5, 6), gamma = 0.5, basis_W = c("bs", "OBasis",
  "fourier"), df_W = 5, degree_W = 3, basis_beta = c("bs", "OBasis",
  "fourier"), df_beta = 5, degree_beta = 3, insert = c("FALSE", "basis"),
  method = c("trapezoidal", "step"))

Arguments

n

sample size

p

number of compositional predictors, which fall in S^p

m

number of time-invariant predictors. The first ceiling(m/2) columns are generated independently from bin(1, 0.5); the remaining (m - ceiling(m/2)) columns are generated independently from norm(0, 1).
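
The column layout described above can be sketched in base R. This is only an illustration, assuming bin(1, 0.5) means a Bernoulli(0.5) draw and norm(0, 1) a standard normal draw; sim_Zc is a hypothetical helper and not a function of the package.

```r
# Hypothetical helper (not part of the package): builds an n x m matrix whose
# first ceiling(m/2) columns are binary and whose remaining columns are Gaussian.
sim_Zc <- function(n, m) {
  if (m == 0) return(matrix(numeric(0), nrow = n, ncol = 0))
  m_bin  <- ceiling(m / 2)   # number of Bernoulli(0.5) columns
  m_norm <- m - m_bin        # number of N(0, 1) columns
  Z_bin  <- matrix(rbinom(n * m_bin, size = 1, prob = 0.5), nrow = n)
  Z_norm <- matrix(rnorm(n * m_norm), nrow = n, ncol = m_norm)
  cbind(Z_bin, Z_norm)
}

set.seed(1)
Zc <- sim_Zc(n = 50, m = 3)
dim(Zc)
```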

intercept

logical; whether to include an intercept when generating the response variable. Default is TRUE.

interval

a length 2 vector indicating time domain.

ns

a scalar specifying the length of the equally spaced time sequence on the domain interval.

obs_spar

a proportion used to obtain sparse observations. Each time point has probability obs_spar of being observed, so different subjects may have different observed time points and different numbers of observations. obs_spar * ns > 5 is required.
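
A minimal sketch of this kind of thinning, assuming each time point is kept by an independent Bernoulli(obs_spar) draw per subject. The variable names are illustrative and the package's internal code may differ.

```r
set.seed(1)
ns <- 50
obs_spar <- 0.5
stopifnot(obs_spar * ns > 5)            # requirement stated above

t_full <- seq(0, 1, length.out = ns)    # full, equally spaced time grid

# For each of n subjects, keep each time point with probability obs_spar.
n <- 3
obs_list <- lapply(seq_len(n), function(i) {
  t_full[rbinom(ns, size = 1, prob = obs_spar) == 1]
})
lengths(obs_list)                       # number of observed points per subject
```

Because the draws are independent across subjects, both the observed time points and their number vary from subject to subject.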

discrete

logical; specifies whether X is generated at different time points. If discrete = TRUE, X is generated on a dense sequence of length max(ns_dense = 1000, 5*ns), and then ns points are randomly sampled for each subject. ns < 200 is recommended when discrete = TRUE.
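
The dense-grid sampling described above might look as follows for a single subject. This is a sketch under the assumption that the ns points are drawn without replacement from the dense grid; the names are illustrative only.

```r
set.seed(1)
ns <- 50
ns_dense <- max(1000, 5 * ns)                   # length of the dense grid
t_dense <- seq(0, 1, length.out = ns_dense)

# One subject: draw ns distinct time points from the dense grid (assumption).
t_subject <- sort(sample(t_dense, size = ns))
length(t_subject)
```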

SNR

signal to noise ratio.

sigma, rho_X, Corr_X, rho_W, Corr_W

the linear combination scalars W, of dimension df_W*p, are generated from a multivariate normal distribution with mean 0 and covariance matrix sigma^2 * kronecker(Sigma_X, Sigma_W). Corr_X is the correlation structure of Sigma_X with ρ = rho_X, which controls the canonical correlation between groups of W; Corr_W is the correlation structure of Sigma_W with ρ = rho_W, which controls the correlation within groups of W.
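
As an illustration of the Kronecker covariance above, the following base R sketch draws W row by row. It assumes "CorrAR" denotes an AR(1) structure (entries ρ^|i-j|) and "CorrCS" a compound-symmetry structure (all off-diagonal entries ρ); corr_AR and corr_CS are hypothetical helpers written for this example.

```r
# Hypothetical correlation builders (assumed meanings of "CorrAR" and "CorrCS").
corr_AR <- function(d, rho) rho^abs(outer(seq_len(d), seq_len(d), "-"))
corr_CS <- function(d, rho) {
  S <- matrix(rho, d, d)
  diag(S) <- 1
  S
}

set.seed(1)
n <- 50; p <- 4; df_W <- 5; sigma <- 2
Sigma_X <- corr_AR(p, rho = 0.1)       # between-group structure, p x p
Sigma_W <- corr_CS(df_W, rho = 0.2)    # within-group structure, df_W x df_W
Sigma <- sigma^2 * kronecker(Sigma_X, Sigma_W)

# Multivariate normal draws via the Cholesky factor: t(R) %*% R equals Sigma,
# so each row of Z %*% R has covariance Sigma.
R <- chol(Sigma)
W <- matrix(rnorm(n * p * df_W), nrow = n) %*% R
dim(W)                                 # n x (df_W * p)
```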

Nzero_group

an even scalar. The first Nzero_group compositional predictors are considered to have nonzero effects, while the others have zero coefficients.

range_beta

a sorted vector of length 2 used to generate the coefficient matrix B for the compositional predictors, which has dimension p*k. For each column of B, Nzero_group/2 values are generated from a uniform distribution with range range_beta; these values, together with their negatives, are randomly assigned to the first Nzero_group rows.
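
The construction of B described above can be sketched as follows. sim_B is a hypothetical helper, and k is simply taken as the number of columns; the package's own implementation may differ in detail.

```r
# Hypothetical helper: fills the first Nzero_group rows of each column with
# Nzero_group/2 uniform draws and their negatives, in random order.
sim_B <- function(p, k, Nzero_group = 4, range_beta = c(0.5, 1)) {
  stopifnot(Nzero_group %% 2 == 0, Nzero_group <= p)
  B <- matrix(0, nrow = p, ncol = k)
  half <- Nzero_group / 2
  for (j in seq_len(k)) {
    u <- runif(half, min = range_beta[1], max = range_beta[2])
    B[seq_len(Nzero_group), j] <- sample(c(u, -u))   # random row assignment
  }
  B
}

set.seed(1)
B <- sim_B(p = 20, k = 5)
colSums(B)   # each column sums to zero up to rounding
```

Pairing every draw with its negative makes each column of B sum to zero by construction.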

range_beta_c

value of coefficients for beta0 and beta_c (coefficients for time-invariant predictors)

beta_C

vectorized coefficient matrix for the compositional predictors. May be missing.

theta.add

logical or numerical. If numerical, indicates which compositional predictors have a high-level mean. If logical, predictors c(1:ceiling(Nzero_group/2), Nzero_group + (1:ceiling(Nzero_group/2))) are set to have a high-level mean.

gamma

the high-level mean groups have log(p * gamma) added before conversion into compositional data; the others have 0 added.
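
A small sketch of the mean shift at a single time point, assuming the conversion into compositional data is the usual exponentiate-and-normalize step. The latent values x are simulated here only for illustration.

```r
set.seed(1)
p <- 20
gamma <- 0.5
theta.add <- c(1, 2, 5, 6)            # indices of the high-level mean groups

theta <- numeric(p)                   # 0 for all other predictors
theta[theta.add] <- log(p * gamma)    # shift for the high-level mean groups

x <- rnorm(p) + theta                 # illustrative latent values (assumption)
z <- exp(x) / sum(exp(x))             # convert to a composition
sum(z)                                # components sum to 1 up to rounding
```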

basis_W, df_W, degree_W

the longitudinal compositional data are generated from a linear combination of the basis Ψ(t), which is then exponentiated and converted into compositional data.

  • basis_W is the basis function for Ψ(t) - default is "bs". Other choices are "OBasis" and "fourier";

  • df_W is the degrees of freedom for the basis Ψ(t) - default is 5;

  • degree_W is the degree for Ψ(t) - default is 3.
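
The generating mechanism above can be illustrated for one subject with a B-spline basis from the splines package. This is a sketch only: the use of splines::bs with intercept = TRUE and the standard normal scalars W_i are assumptions made for the example.

```r
library(splines)

set.seed(1)
p <- 4; df_W <- 5; degree_W <- 3
t_grid <- seq(0, 1, length.out = 50)

# Basis matrix Psi(t): one row per time point, df_W columns.
Psi <- bs(t_grid, df = df_W, degree = degree_W, intercept = TRUE)

W_i <- matrix(rnorm(df_W * p), nrow = df_W)   # illustrative scalars, df_W x p
X_i <- Psi %*% W_i                            # latent curves, 50 x p
Z_i <- exp(X_i) / rowSums(exp(X_i))           # composition at each time point
dim(Z_i)
```

Each row of Z_i is a composition: its entries are positive and sum to one.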

basis_beta, df_beta, degree_beta

the coefficient curve is generated by a linear combination of the basis Φ(t).

  • basis_beta is the basis function for Φ(t) - default is "bs". Other choices are "OBasis" and "fourier";

  • df_beta is the degree of freedom for basis Φ(t) - default is 5;

  • degree_beta is the degree for Φ(t) - default is 3.
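
To make the linear combination concrete, the sketch below evaluates coefficient curves on a grid from a small coefficient matrix B. The values in B and the use of splines::bs with intercept = TRUE are assumptions chosen for illustration.

```r
library(splines)

p <- 3; df_beta <- 5; degree_beta <- 3
t_grid <- seq(0, 1, length.out = 101)

# Basis matrix Phi(t): 101 x df_beta.
Phi <- bs(t_grid, df = df_beta, degree = degree_beta, intercept = TRUE)

# Illustrative p x df_beta coefficient matrix with zero-sum columns.
b <- c(0.6, 0.7, 0.8, 0.9, 1.0)
B <- rbind(b, -b, 0)

beta_t <- Phi %*% t(B)     # column j holds the curve beta_j(t) on t_grid
range(rowSums(beta_t))     # the curves sum to zero at every t, up to rounding
```

When every column of B sums to zero, the coefficient curves also sum to zero at each time point, since the sum over predictors is Φ(t) applied to the column sums.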

insert

interpolation method.

  • "FALSE" no interpolation.

  • "basis" the compositional data are treated as a step function, and the basis is imposed on unobserved time points for each subject.

Default is "FALSE".

method

method used to approximate the integral.

  • "trapezoidal" Sums the areas of the trapezoids formed by the function values at each pair of adjacent observed time points. See ITG_trap.

  • "step" Sums the areas of the rectangles formed by a step function at the observed time points. See ITG_step.

Default is "trapezoidal".
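
The two approximation rules can be sketched for a function observed at irregular time points. itg_trap_sketch and itg_step_sketch are hypothetical helpers written for this illustration and are not the package's ITG_trap or ITG_step; in particular, the step version assumes the function is held constant at its left-endpoint value.

```r
# Hypothetical helpers, not the package's ITG_trap / ITG_step.
itg_trap_sketch <- function(t, f) {
  # area of the trapezoid between each pair of adjacent observed points
  sum(diff(t) * (head(f, -1) + tail(f, -1)) / 2)
}
itg_step_sketch <- function(t, f) {
  # area of the rectangle with the left-endpoint value as height (assumption)
  sum(diff(t) * head(f, -1))
}

t_obs <- c(0, 0.1, 0.35, 0.6, 1)   # irregular observed time points
f_obs <- t_obs^2                   # function values at those points
itg_trap_sketch(t_obs, f_obs)
itg_step_sketch(t_obs, f_obs)
```

The trapezoidal rule is exact for functions that are linear between observed points, while the step rule is exact only for piecewise-constant functions.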

Value

a list

data

a list containing a vector y of the response variable, a data frame Comp of sparse observations of the longitudinal compositional data, a matrix Zc of time-invariant predictors, and a logical intercept.

beta

a length p*df_beta + m + 1 vector for coefficients

basis.info

basis matrix for beta, with the time sequence as the first column.

data.raw

a list containing Z_t.full, the full observation of the longitudinal compositional data; Z_ITG, the integral for the full observation of the longitudinal compositional data; Y.tru, the true response without noise; X, the longitudinal data before conversion into compositional data; and W, the matrix of linear combination scalars, of dimension n * (df_W * p).

parameter

a list of parameters.

Examples

library(compReg)

df_beta <- 5
p <- 20
# Simulate a training set
Data <- Model2(n = 50, p = p, m = 2, intercept = TRUE, ns = 50, SNR = 1,
               rho_X = 0.1, rho_W = 0.2, df_W = 10, df_beta = df_beta, obs_spar = 0.5)
names(Data$data)
# Simulate a test set that reuses the compositional coefficients of the training set
Data.test <- Model2(n = 50, p = p, m = 2, intercept = TRUE, ns = 50, SNR = 1,
                    rho_X = 0.1, rho_W = 0.2, df_W = 10, df_beta = df_beta, obs_spar = 0.5,
                    beta_C = Data$beta[1:(p * df_beta)])

Zhe-Research/compReg documentation built on May 28, 2019, 8:38 a.m.