In MEGB: Gradient Boosting for Longitudinal Data

View source: R/MEGB.R

simLong

R Documentation

Simulate a Low/High-Dimensional, Linear/Nonlinear Longitudinal Dataset

Description

Simulates data from a p-dimensional linear or nonlinear mixed-effects model given by:

Y_i(t)=f(X_i(t))+Z_i(t)\beta_i+\epsilon_i

where Y_i(t) is the outcome at time t for the ith individual; X_i(t) are the input predictors (fixed effects) at time t for the ith individual; Z_i(t) are the random-effects covariates at time t for the ith individual, with individual-specific coefficients \beta_i; and \epsilon_i is the residual error with variance \sigma^2. In the linear case, f(X_i(t)) = X_i(t)\theta, with every component of \theta equal to 1. In the nonlinear case, f is adapted from the approach of Capitaine et al. (2021).
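The linear case of this model can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes Z_i(t) = (1, t) (a random intercept plus a random slope in time), independent standard normal predictors, and uncorrelated random effects. The names `n_id`, `n_time`, `b0`, `b1`, and `sim` are hypothetical.

```r
set.seed(42)

n_id   <- 5   # number of individuals (hypothetical name)
n_time <- 4   # measurements per individual (hypothetical name)
p      <- 3   # number of predictors

id   <- rep(seq_len(n_id), each = n_time)
time <- rep(seq_len(n_time), times = n_id)

# Fixed-effect predictors X_i(t); assumed i.i.d. standard normal here
X <- matrix(rnorm(n_id * n_time * p), ncol = p)
colnames(X) <- paste0("X", seq_len(p))
theta <- rep(1, p)   # linear case: every coefficient equals 1

# Random coefficients beta_i = (intercept, slope); assumed independent here
b0 <- rnorm(n_id, sd = 2)
b1 <- rnorm(n_id, sd = 1)

# Residual error with standard deviation sigma = 1
eps <- rnorm(n_id * n_time, sd = 1)

# Y_i(t) = X_i(t) theta + Z_i(t) beta_i + eps, with Z_i(t) = (1, t)
Y <- as.vector(X %*% theta) + b0[id] + b1[id] * time + eps

sim <- data.frame(id = id, time = time, Y = Y, X)
head(sim)
```

Indexing `b0[id]` and `b1[id]` repeats each individual's coefficients across that individual's rows, which is what makes observations from the same individual correlated.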

Usage

simLong(
  n,
  p,
  rel_p = 6,
  time_points,
  rho_W = 0.5,
  rho_Z = 0.5,
  random_sd_intercept = 2,
  random_sd_slope = 1,
  noise_sd = 1,
  linear = TRUE
)

Arguments

`n`	[numeric]: Number of individuals.
`p`	[numeric]: Number of predictors.
`rel_p`	[numeric]: Number of relevant predictors (true predictors that are correlated with the outcome). The default value is `rel_p=6` if linear and `rel_p=2` if nonlinear.
`time_points`	[numeric]: Number of realizations (measurement occasions) per individual. The Usage signature above shows no default, so supply a value; the example below uses `time_points=10`.
`rho_W`	[numeric]: Within-subject correlation. The default value is `rho_W=0.5`.
`rho_Z`	[numeric]: Correlation between intercept and slope for the random effect coefficients. The default value is `rho_Z=0.5`.
`random_sd_intercept`	[numeric]: Standard deviation of the random intercept. The default value in the Usage signature is `random_sd_intercept=2`; the example below uses `sqrt(0.5)`.
`random_sd_slope`	[numeric]: Standard deviation of the random slope. The default value in the Usage signature is `random_sd_slope=1`; the example below uses `sqrt(3)`.
`noise_sd`	[numeric]: Standard deviation of the residual error \epsilon_i. The default value in the Usage signature is `noise_sd=1`; the example below uses `0.5`.
`linear`	[logical]: If TRUE (the default), a linear mixed-effects model is simulated; otherwise, a semi-parametric model similar to the one used in Capitaine et al. (2021) is simulated.
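The arguments `random_sd_intercept`, `random_sd_slope`, and `rho_Z` together imply a 2 x 2 covariance matrix for the random intercept and slope. The sketch below shows that standard construction in base R and draws correlated coefficients through a Cholesky factor. Whether simLong generates its draws this way internally is an assumption; `Sigma`, `z`, `b`, and `n_draws` are illustrative names.

```r
set.seed(1)

random_sd_intercept <- 2
random_sd_slope     <- 1
rho_Z               <- 0.5

# Covariance implied by the two standard deviations and their correlation
off_diag <- rho_Z * random_sd_intercept * random_sd_slope
Sigma <- matrix(c(random_sd_intercept^2, off_diag,
                  off_diag,              random_sd_slope^2),
                nrow = 2, byrow = TRUE)

# chol() returns an upper-triangular R with t(R) %*% R == Sigma,
# so rows of z %*% R have covariance Sigma when z is standard normal
n_draws <- 10000
z <- matrix(rnorm(2 * n_draws), ncol = 2)
b <- z %*% chol(Sigma)
colnames(b) <- c("RandomIntercept", "RandomSlope")

round(cov(b), 2)   # should be close to Sigma
cor(b)[1, 2]       # should be close to rho_Z
```

With many draws the empirical covariance of `b` approaches `Sigma`, which is a quick way to check that a chosen combination of standard deviations and correlation behaves as intended.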

Value

A data frame with n*time_points rows and p+5 columns, containing the following elements:

id: vector of the individual IDs.
time: vector of the time realizations.
Y: vector of the outcome variable.
RandomIntercept: vector of the random intercepts.
RandomSlope: vector of the random slopes.
Vars: the remaining p columns, corresponding to the fixed-effect variables.
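Code that consumes this layout usually needs to separate the five bookkeeping columns from the predictors. The sketch below uses a small stand-in data frame built by hand (it is not output of simLong). The predictor column names `V1` and `V2` are hypothetical, and holding the random effects constant within an individual is an assumption of the stand-in.

```r
set.seed(3)
n <- 3; time_points <- 4; p <- 2

# Stand-in with the documented column layout (hypothetical values and names)
dat <- data.frame(
  id              = rep(seq_len(n), each = time_points),
  time            = rep(seq_len(time_points), times = n),
  Y               = rnorm(n * time_points),
  RandomIntercept = rep(rnorm(n), each = time_points),
  RandomSlope     = rep(rnorm(n), each = time_points),
  V1              = rnorm(n * time_points),
  V2              = rnorm(n * time_points)
)

# Documented shape: (n * time_points) rows by (p + 5) columns
stopifnot(all(dim(dat) == c(n * time_points, p + 5)))

# Select predictors by name rather than by position
meta_cols <- c("id", "time", "Y", "RandomIntercept", "RandomSlope")
X <- dat[, setdiff(names(dat), meta_cols), drop = FALSE]

# One data frame per individual, e.g. for plotting trajectories
by_id <- split(dat, dat$id)
```

Selecting by name keeps the split correct even if the column order differs from what is listed above, whereas positional indexing such as `dat[, -(1:5)]` relies on the bookkeeping columns coming first.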

Examples

set.seed(1)
# Simulate 17 individuals with 6 predictors and 10 time points each (nonlinear model)
data <- simLong(n = 17, p = 6, rel_p = 6, time_points = 10,
                rho_W = 0.6, rho_Z = 0.6,
                random_sd_intercept = sqrt(0.5),
                random_sd_slope = sqrt(3),
                noise_sd = 0.5, linear = FALSE)
head(data)   # first six rows of the data
# Plot the outcome trajectory of every individual on a common y-axis
w <- which(data$id == 1)
plot(data$time[w], data$Y[w], type = "l", ylim = range(data$Y), col = "grey")
for (i in unique(data$id)) {
  w <- which(data$id == i)
  lines(data$time[w], data$Y[w], col = "grey")
}
# Plot the fixed-effect predictors, one panel per predictor
oldpar <- par(no.readonly = TRUE)   # save graphics settings to restore afterwards
oldopt <- options()                 # save global options to restore afterwards
par(mfrow = c(2, 3), mar = c(2, 3, 3, 2))
X <- data[, -(1:5)]                 # drop the five non-predictor columns
for (i in seq_len(ncol(X))) {
  w <- which(data$id == 1)
  plot(data$time[w], X[w, i], col = "grey", ylim = range(X[, i]),
       xlim = c(1, max(data$time)),
       main = latex2exp::TeX(paste0("$X^{(", i, ")}$")))
  for (k in unique(data$id)) {
    w <- which(data$id == k)
    lines(data$time[w], X[w, i], col = "grey")
  }
}
par(oldpar)
options(oldopt)

MEGB documentation built on April 4, 2025, 2:59 a.m.