generate.data.for.demonstration: Generate data for some demonstration examples
In funreg: Functional Regression for Irregularly Timed Data

View source: R/GenerateDataForDemonstration.r

Simulates a dataset with two functional covariates, four subject-level scalar covariates, and a binary outcome.

generate.data.for.demonstration(
  nsub = 400,
  b0.true = -5,
  b1.true = 0,
  b2.true = +1,
  b3.true = -1,
  b4.true = +1,
  nobs = 500,
  observe.rate = 0.1
)

`nsub`	The number of subjects in the simulated dataset.
`b0.true`	The true value of the intercept.
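`b1.true`	The true value of the regression coefficient for the first subject-level covariate (s1).
`b2.true`	The true value of the regression coefficient for the second subject-level covariate (s2).
`b3.true`	The true value of the regression coefficient for the third subject-level covariate (s3).
`b4.true`	The true value of the regression coefficient for the fourth subject-level covariate (s4).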
`nobs`	The total number of possible observation times.
`observe.rate`	The average proportion of those possible times at which any given subject is observed.

Returns a data.frame with nobs rows for each subject, one row per time point on a dense grid of nobs possible observation times. Each row contains an id variable, four subject-level covariates (s1, ..., s4), and one subject-level response (y); these subject-level values are replicated across all of the subject's rows. Each row also contains the observation time (time), the smooth latent values of the two time-varying covariates (true.x1 and true.x2), versions of those covariates observed with error (x1 and x2), and the local values of the functional regression coefficients (true.betafn1 and true.betafn2). Lastly, each row has a random value for include.in.subsample, which tells whether the row should be treated as an observed data point (versus an unobserved moment in the simulated subject's life). include.in.subsample is generated as a Bernoulli random variable with success probability observe.rate.
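The long layout described above, with subject-level values repeated on every row of a subject, can be illustrated with a small stand-in data.frame. The sketch below does not call funreg; the toy sizes (3 subjects, 4 time points), the evenly spaced time grid, and all simulated values are assumptions made only for illustration, and only a few of the documented columns are mimicked.

```r
# Stand-in for the documented layout: 3 subjects x 4 time points (toy sizes).
# All values are invented for illustration; they are not funreg output.
set.seed(1)
nsub <- 3
nobs <- 4

# One row per subject: id, one subject-level covariate, and the binary response.
subject <- data.frame(id = seq_len(nsub),
                      s1 = rnorm(nsub),
                      y  = rbinom(nsub, size = 1, prob = 0.5))

# Replicate each subject's row nobs times, mirroring the long format.
long <- subject[rep(seq_len(nsub), each = nobs), ]
long$time <- rep(seq_len(nobs) / nobs, times = nsub)  # assumed time grid
long$x1 <- rnorm(nsub * nobs)                         # varies from row to row
rownames(long) <- NULL

# Recover one row per subject for the subject-level columns.
per.subject <- long[!duplicated(long$id), c("id", "s1", "y")]
rownames(per.subject) <- NULL
print(per.subject)
```

Because s1, ..., s4 and y are constant within a subject, keeping the first row per id loses no information about them; the time-varying columns need all rows.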

nobs is the number of simulated data rows per simulated subject. It should be large because the x covariates are conceptually smooth functions of time. However, the simulated data analyses use only a small random subset of the generated time points, because this is more realistic for many behavioral and medical science datasets. Thus, the number of possible observation times per subject is nobs, and the mean number of actual observation times per subject is nobs times observe.rate (with the defaults, 500 × 0.1 = 50). The smaller 'observed' dataset can be obtained by deleting the rows having include.in.subsample==FALSE.
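The subsampling step can be sketched in base R as follows. This is a stand-in that mimics only the id, time, and include.in.subsample columns; it does not call funreg, and the evenly spaced time grid, the subject count, and the logical coding of include.in.subsample are assumptions. With the real function, the same subsetting line would be applied to its returned data.frame.

```r
# Stand-in sketch: Bernoulli subsampling of a dense grid of possible times.
set.seed(2024)
nsub <- 200
nobs <- 500
observe.rate <- 0.1

sim <- data.frame(id   = rep(seq_len(nsub), each = nobs),
                  time = rep(seq_len(nobs) / nobs, times = nsub))  # assumed grid

# Each row is independently flagged as observed with probability observe.rate.
sim$include.in.subsample <- rbinom(nrow(sim), size = 1, prob = observe.rate) == 1

# Keep only the observed rows; as.logical() handles TRUE/FALSE or 0/1 coding.
observed <- sim[as.logical(sim$include.in.subsample), ]

# Number of observed time points per subject; the average should be
# close to nobs * observe.rate.
n.per.subject <- table(factor(observed$id, levels = seq_len(nsub)))
mean(n.per.subject)
nobs * observe.rate
```

The per-subject counts vary around nobs times observe.rate, so the resulting 'observed' dataset is irregular: different subjects have different numbers of measurements at different times.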