Simulate data including multiple outcomes from error-prone diagnostic tests or self-reports

Description

This function simulates a data set of N subjects with misclassified outcomes, assuming each subject receives a sequence of pre-scheduled tests for disease status ascertainment. Each test is subject to error, characterized by its sensitivity and specificity. The time to the event of interest is assumed to follow an exponential distribution. Three covariate settings can be generated: the one sample setting, the two group setting, and the continuous covariates setting, in which each covariate is sampled i.i.d. from N(0, 1). Two missingness mechanisms can be assumed, namely MCAR and NTFP. The MCAR setting assumes that each test is subject to a constant, independent probability of missingness. The NTFP mechanism combines two types of missingness: (1) a constant, independent probability of missingness for each test prior to the first positive test result; and (2) all test results after the first positive are missing. The simulated data are in longitudinal form, with one row per test time.
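The mechanism described above can be sketched for a single subject in base R. This is only an illustration under stated assumptions, not the source code of datasim: the helper name simulate_subject and the column names testtime and result are hypothetical, and the NTFP rule is approximated by applying independent missingness first and then dropping every test after the first observed positive.

```r
# Illustrative sketch only; simulate_subject() is a hypothetical helper,
# not part of the package and not the datasim() implementation.
simulate_subject <- function(lambda, testtimes, sensitivity, specificity,
                             pmiss = 0, design = c("MCAR", "NTFP")) {
  design <- match.arg(design)

  # True (unobserved) event time, exponential with rate lambda.
  event_time <- rexp(1, rate = lambda)

  # True disease status at each pre-scheduled test time.
  diseased <- testtimes >= event_time

  # A test is positive with probability `sensitivity` if the subject is
  # diseased, and with probability 1 - specificity otherwise.
  p_pos <- ifelse(diseased, sensitivity, 1 - specificity)
  result <- rbinom(length(testtimes), size = 1, prob = p_pos)

  # Each test is missing independently with probability pmiss
  # (pmiss may be a single value or one value per test time).
  observed <- runif(length(testtimes)) >= pmiss
  out <- data.frame(testtime = testtimes, result = result)
  out <- out[observed, , drop = FALSE]

  if (design == "NTFP") {
    # Assumption: keep tests up to and including the first observed
    # positive result, and treat everything after it as missing.
    pos <- which(out$result == 1)
    if (length(pos) > 0) out <- out[seq_len(pos[1]), , drop = FALSE]
  }
  out
}

set.seed(1)
simulate_subject(lambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
                 specificity = 0.98, pmiss = 0.3, design = "NTFP")
```

Under the NTFP design the returned rows contain at most one positive result, and it is always the last row; under MCAR, positives and negatives can alternate because the tests are error-prone.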

By default, covariate values are assumed to be constant over time. However, this function can also simulate a special case of time-varying covariates. Under the time-varying covariates setting, each subject is assumed to have a change time point, which is sampled from the visit times, and two sets of covariate values. Before the change time point, the covariates take their values from the first set; after it, they take their values from the second set. Thus, each subject's survival time follows a two-piece exponential distribution with a different hazard rate on each side of the change time point.
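A two-piece exponential survival time of this kind can be drawn as follows. This is a minimal sketch, not the package's implementation: the helper r_two_piece_exp is hypothetical, and the link between covariates and hazard (baseline hazard times exp of the linear predictor) is an assumption made for illustration.

```r
# Illustrative sketch only; r_two_piece_exp() is a hypothetical helper.
# Hazard is lambda1 before `change` and lambda2 from `change` onwards.
r_two_piece_exp <- function(n, lambda1, lambda2, change) {
  t1 <- rexp(n, rate = lambda1)
  # If the first-piece draw falls before the change point, the event
  # happens then. Otherwise the subject survives to `change`, and the
  # remaining time is exponential with the second rate.
  ifelse(t1 < change, t1, change + rexp(n, rate = lambda2))
}

set.seed(2)
blambda   <- 0.05
betas     <- c(0.5, 0.8, 1.0)
testtimes <- 1:8

x_before <- rnorm(length(betas))   # covariate values before the change point
x_after  <- rnorm(length(betas))   # covariate values after the change point
change   <- sample(testtimes, 1)   # change point sampled from the visit times

# Assumed proportional hazards form for each piece.
lambda1 <- blambda * exp(sum(betas * x_before))
lambda2 <- blambda * exp(sum(betas * x_after))

event_time <- r_two_piece_exp(1, lambda1, lambda2, change)
```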

Usage

datasim(N, blambda, testtimes, sensitivity, specificity, betas = NULL,
  twogroup = NULL, pmiss = 0, pcensor = 0, design = "MCAR",
  negpred = 1, time.varying = F)

Arguments

N

total number of subjects to be simulated

blambda

baseline hazard rate

testtimes

a vector of pre-scheduled test times

sensitivity

the sensitivity of test

specificity

the specificity of test

betas

a vector of regression coefficients of the same length as the covariate vector. If betas = NULL, then the simulated dataset corresponds to the one sample setting. If betas != NULL and twogroup != NULL, then the simulated dataset corresponds to the two group setting, and the first value of betas is used as the coefficient for the treatment group indicator. If betas != NULL and twogroup = NULL, then the covariates are i.i.d. N(0, 1), and the number of covariates is determined by the length of betas.

twogroup

the proportion of subjects allocated to the baseline (reference) group in the two group setting. For the two group setting, this value should be between 0 and 1. For the one sample setting and the continuous covariates setting, it should be set to NULL. That is, when betas != NULL, set twogroup to the proportion of subjects in the baseline group to obtain a simulated dataset corresponding to the two group setting. Otherwise, set twogroup = NULL to obtain either the one sample setting (betas = NULL) or the continuous covariates setting (betas != NULL).

pmiss

a value or a vector (must have same length as testtimes) of the probabilities of each test being randomly missing at each test time. If pmiss is a single value, then each test is assumed to have an identical probability of missingness.

pcensor

a value or a vector (must have the same length as testtimes) of the probability of censoring at each visit, assuming the censoring process is independent of the other missingness mechanisms.

design

missing mechanism: "MCAR" or "NTFP"

negpred

baseline negative predictive value, i.e. the probability of being truly disease free for those who were tested (reported) as disease free at baseline. If the baseline screening test is perfect, then negpred = 1.

time.varying

logical indicating whether to simulate data under the time-varying covariates setting

Details

To simulate data for the one sample setting, set betas to NULL. To simulate data for the two group setting, set twogroup to the proportion of subjects in the baseline group and set betas to the coefficient of the treatment group indicator (i.e. beta equals the log hazard ratio between the two groups). To simulate data with continuous i.i.d. N(0, 1) covariates, set twogroup to NULL and set betas to the vector of covariate coefficients.
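How betas and twogroup jointly select the setting can be sketched as below. The helper subject_hazards is hypothetical and is not part of the package. It assumes a proportional hazards form (baseline hazard times exp of the linear predictor), consistent with beta being a log hazard ratio, and it assumes random group assignment; datasim itself may allocate the groups differently.

```r
# Illustrative sketch only; subject_hazards() is a hypothetical helper.
# Returns one exponential hazard rate per subject for each setting.
subject_hazards <- function(N, blambda, betas = NULL, twogroup = NULL) {
  # One sample setting: every subject shares the baseline hazard.
  if (is.null(betas)) {
    return(rep(blambda, N))
  }

  # Two group setting: `twogroup` is the proportion in the baseline
  # group (indicator 0); the rest are in the treatment group (indicator 1).
  # Only the first element of betas is used.
  if (!is.null(twogroup)) {
    group <- rbinom(N, size = 1, prob = 1 - twogroup)
    return(blambda * exp(betas[1] * group))
  }

  # Continuous covariates setting: length(betas) covariates, i.i.d. N(0, 1).
  X <- matrix(rnorm(N * length(betas)), nrow = N)
  as.vector(blambda * exp(X %*% betas))
}

set.seed(3)
h1 <- subject_hazards(1000, blambda = 0.05)                              # one sample
h2 <- subject_hazards(1000, blambda = 0.05, betas = 0.7, twogroup = 0.5) # two groups
h3 <- subject_hazards(1000, blambda = 0.05, betas = c(0.5, 0.8, 1.0))    # continuous
```

In the two group sketch the ratio of the two distinct hazard values is exp(0.7), which is what "beta equals the log hazard ratio" means in practice.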

Value

a data frame of simulated data in longitudinal form, with one row per test time

Examples

## One sample setting
simdata1 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = NULL, twogroup = NULL, pmiss = 0.3, design = "MCAR")

## Two group setting, and the two groups have same sample sizes
simdata2 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = 0.7, twogroup = 0.5, pmiss = 0.3, design = "MCAR")

## Three covariates with coefficients 0.5, 0.8, and 1.0
simdata3 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "MCAR", negpred = 1)

## NTFP missing mechanism
simdata4 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "NTFP", negpred = 1)

## Baseline misclassification
simdata5 <- datasim(N = 2000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "MCAR", negpred = 0.97)

## Time varying covariates
simdata6 <- datasim(N = 1000, blambda = 0.05, testtimes = 1:8, sensitivity = 0.7,
  specificity = 0.98, betas = c(0.5, 0.8, 1.0), twogroup = NULL, pmiss = 0.3,
  design = "MCAR", negpred = 1, time.varying = TRUE)