Generate data for some demonstration examples

Description

Simulates a dataset with two functional covariates, four subject-level scalar covariates, and a binary outcome.

Usage

generate.data.for.demonstration(nsub = 400, b0.true = -5, b1.true = 0,
  b2.true = +1, b3.true = -1, b4.true = +1, nobs = 500,
  observe.rate = 0.1)

Arguments

nsub

The number of subjects in the simulated dataset.

b0.true

The true value of the intercept.

b1.true

The true value of the regression coefficient for the first subject-level covariate (s1).

b2.true

The true value of the regression coefficient for the second subject-level covariate (s2).

b3.true

The true value of the regression coefficient for the third subject-level covariate (s3).

b4.true

The true value of the regression coefficient for the fourth subject-level covariate (s4).

nobs

The total number of possible observation times.

observe.rate

The average proportion of those possible times at which any given subject is observed.

Value

Returns a data.frame with nobs rows for each subject, one row per time point on a dense grid of nobs observation times. Each row contains an id variable, four subject-level covariates (s1, ..., s4) and one subject-level response (y); the subject-level values are replicated on every row for that subject. Each row also contains its observation time (time), the smooth latent values of the two time-varying covariates (true.x1 and true.x2), the versions of those covariates observed with error (x1 and x2), and the local values of the functional regression coefficients (true.betafn1 and true.betafn2). Lastly, each row has a randomly generated indicator include.in.subsample, telling whether the row should be treated as an observed data point (versus an unobserved moment in the simulated subject's life). include.in.subsample is generated as a Bernoulli random variable with success probability observe.rate.
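The long layout described above can be illustrated with a small toy data.frame. This is only a sketch: the sizes, the integer time grid, and the way s1 and y are drawn here are assumptions made for illustration, not the simulation model used by the function. It shows how subject-level values repeat on every row, how a Bernoulli inclusion flag can be attached, and how to recover one row per subject.

```r
# Toy stand-in for the returned layout; sizes and values are illustrative
# assumptions, not the function's actual simulation model.
set.seed(1)
nsub <- 3
nobs <- 5
observe.rate <- 0.4

# One row per subject: id, one subject-level covariate, and a binary response.
subjects <- data.frame(id = seq_len(nsub),
                       s1 = rnorm(nsub),
                       y  = rbinom(nsub, size = 1, prob = 0.5))

# Expand to nobs rows per subject, so subject-level values repeat on each row.
long <- subjects[rep(seq_len(nsub), each = nobs), ]
rownames(long) <- NULL
long$time <- rep(seq_len(nobs), times = nsub)  # assumed integer time grid

# Bernoulli(observe.rate) inclusion flag, drawn independently for every row.
long$include.in.subsample <-
  rbinom(nrow(long), size = 1, prob = observe.rate) == 1

# Recover one row per subject for the subject-level variables.
subject.level <- long[!duplicated(long$id), c("id", "s1", "y")]
```

Because s1 and y are constant within a subject, taking the first row per id loses no information about the subject-level variables.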

Note

nobs is the number of simulated data rows per simulated subject. It should be large, because the x covariates are conceptually smooth functions of time. However, the simulated data analyses use only a small random subset of the generated time points, because this is more realistic for many behavioral and medical science datasets. Thus, the number of possible observation times per subject is nobs, and the mean number of actual observation times per subject is nobs times observe.rate (50 with the defaults nobs = 500 and observe.rate = 0.1). This smaller 'observed' dataset can be obtained by deleting the rows for which include.in.subsample==FALSE.
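The subsetting step described in this note might look like the following. To keep the sketch self-contained it builds a hypothetical stand-in data.frame (named dense here) with only the id, time, and include.in.subsample columns; with real output from the function, the same subsetting line would apply to the returned data.frame.

```r
set.seed(2)
nsub <- 200
nobs <- 50
observe.rate <- 0.1

# Hypothetical stand-in for the generated data: only id, time, and the flag.
dense <- data.frame(id   = rep(seq_len(nsub), each = nobs),
                    time = rep(seq_len(nobs), times = nsub))
dense$include.in.subsample <-
  rbinom(nrow(dense), size = 1, prob = observe.rate) == 1

# Keep only the rows flagged as observed.
observed <- dense[dense$include.in.subsample, ]

# Count observed times per subject, keeping subjects with zero observed rows.
n.per.subject <- table(factor(observed$id, levels = seq_len(nsub)))

# The average should be close to nobs * observe.rate.
mean(n.per.subject)
```

Counting through a factor with all subject ids as levels keeps subjects who happen to have no observed rows, which would otherwise bias the average upward.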