ncc_sample: Generate a nested case-control study


Description

ncc_sample generates a nested case-control study dataset from a cohort study dataset. Given time of entry, time of exit, and exit status, risk sets are computed at each failure time. Controls are randomly sampled from these risk sets. If matching variables are specified, ncc_sample creates a matched or stratified nested case-control study, in which risk sets are computed separately within matching strata. ncc_sample is similar to ccwc from the 'Epi' package, but differs in two small but important ways (see Details).
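
To make the idea of a risk set concrete, the following base R sketch samples one control per case from a toy cohort. It is an illustration only, not the code used by ncc_sample: the data, the helper risk_set, and the interval convention (entry < t <= exit) are all assumptions of this sketch.

```r
# Toy cohort (hypothetical data): entry time, exit time, and failure indicator.
cohort <- data.frame(
  id    = 1:5,
  entry = c(0, 0, 1, 2, 0),
  exit  = c(4, 6, 5, 9, 3),
  fail  = c(1, 0, 0, 0, 1)
)

# Risk set at time t: subjects under follow-up at t, excluding the case itself.
# The convention entry < t <= exit is an assumption of this sketch.
risk_set <- function(t, case_id, data) {
  at_risk <- data$entry < t & data$exit >= t & data$id != case_id
  data$id[at_risk]
}

set.seed(1)
cases <- cohort[cohort$fail == 1, ]
sets <- lapply(seq_len(nrow(cases)), function(i) {
  eligible <- risk_set(cases$exit[i], cases$id[i], cohort)
  # Index into 'eligible' so that a single eligible id is never expanded
  # into 1:id by sample().
  picked <- eligible[sample.int(length(eligible), size = min(1, length(eligible)))]
  data.frame(set  = i,
             id   = c(cases$id[i], picked),
             fail = c(1, rep(0, length(picked))),
             time = cases$exit[i])
})
ncc <- do.call(rbind, sets)
ncc
```

Note that subject 1, who fails at time 4, is still eligible as a control for the failure at time 3. This is the usual behaviour of risk-set sampling: future cases remain in earlier risk sets.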

Usage

ncc_sample(entry = 0, exit, fail, origin = 0, controls = 1,
  match = list(), include = list(), data = NULL, keep_all = FALSE,
  silent = FALSE)

Arguments

entry

time of entry to follow-up

exit

time of exit from follow-up

fail

indicator of status on exit from follow-up (censored=0, fail=1)

origin

the origin of the analysis time-scale. For instance, date of birth, if age is the desired time-scale.

controls

the number of controls to sample for each failure

match

a list of categorical variables for matching cases and controls

include

a list of variables from the cohort dataset to be carried across to the nested case-control dataset

data

a data.frame which contains the follow-up, matching, and included variables.

keep_all

if TRUE, does not sample from the risk sets, but returns the probability of selection for each observation that is eligible to be selected for any case in data. Defaults to FALSE.

silent

if FALSE, provides entertainment by echoing a full stop to the console as each risk set is generated. If TRUE, output to the console is suppressed. Defaults to FALSE.
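
The arguments above might be combined as in the following sketch. The simulated cohort and all of its column names (dob, sex, bmi, t_entry, t_exit, event) are hypothetical. The call assumes that, as in ccwc, the follow-up, matching, and included variables are given as unquoted names evaluated within data; this has not been verified against the package source, so treat the call as illustrative. It is only run if the ncctools package is installed.

```r
# Simulated cohort; every column name here is hypothetical.
set.seed(2019)
n <- 200
cohort <- data.frame(
  dob = runif(n, 1940, 1960),                          # date of birth, decimal years
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  bmi = rnorm(n, mean = 26, sd = 4)
)
cohort$t_entry <- cohort$dob + runif(n, 40, 50)        # enters follow-up aged 40 to 50
cohort$t_exit  <- cohort$t_entry + rexp(n, rate = 0.1) # exits some time after entry
cohort$event   <- rbinom(n, size = 1, prob = 0.15)     # 1 = fail, 0 = censored

if (requireNamespace("ncctools", quietly = TRUE)) {
  # Assumption: arguments are evaluated inside 'data', as in Epi::ccwc.
  ncc <- ncctools::ncc_sample(
    entry    = t_entry,
    exit     = t_exit,
    fail     = event,
    origin   = dob,          # makes age the analysis time-scale
    controls = 2,            # two controls per case
    match    = list(sex),    # risk sets formed within strata of sex
    include  = list(bmi),    # carried across to the sampled dataset
    data     = cohort,
    silent   = TRUE
  )
  print(head(ncc))
}
```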

Details

Given follow-up information from a cohort study, ncc_sample generates risk sets at each observed failure time, and randomly samples controls from these risk sets without replacement. Functionality is much the same as ccwc from the 'Epi' package, with two differences. Firstly, ncc_sample also computes and returns the total number of eligible controls for each risk set, as well as the probability of selection into the sample for every selected case and control. The latter is calculated according to the formula given by Samuelsen (1997). Secondly, ncc_sample splits tied failure times at random, whereas ccwc preserves the ties and returns a multi-case case-control set.
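
As a rough illustration of the selection probability, the sketch below applies the Samuelsen (1997) formula as it is usually stated: a case is included with probability 1, and a non-case is included with probability one minus the product, over every failure time at which they were at risk, of (1 - m / number of eligible controls), where m is the number of controls sampled per case. The toy data and variable names are hypothetical, there is no matching, and the code is not taken from ncc_sample, whose internal calculation may differ in detail.

```r
# Toy cohort (hypothetical data).
cohort <- data.frame(
  id    = 1:5,
  entry = c(0, 0, 1, 2, 0),
  exit  = c(4, 6, 5, 9, 3),
  fail  = c(1, 0, 0, 0, 1)
)
m <- 1  # controls sampled per case

fail_times <- cohort$exit[cohort$fail == 1]

# Eligible controls at each failure time: everyone at risk, minus the case.
n_elig <- sapply(fail_times, function(t) {
  sum(cohort$entry < t & cohort$exit >= t) - 1
})

incl_prob <- sapply(seq_len(nrow(cohort)), function(i) {
  if (cohort$fail[i] == 1) return(1)  # cases are always included
  # Failure times that fall inside subject i's follow-up window.
  in_window <- cohort$entry[i] < fail_times & fail_times <= cohort$exit[i]
  # pmin() caps the per-set sampling fraction at 1 when few controls are eligible.
  1 - prod(1 - pmin(1, m / n_elig[in_window]))
})

data.frame(id = cohort$id, incl_prob = incl_prob)
```

In this toy cohort the failure times 4 and 3 have 3 and 4 eligible controls respectively, so each non-case at risk at both times has probability 1 - (2/3)(3/4) = 0.5 of ever being sampled.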

Random sampling of controls within risk sets is performed using R's pseudo-random number facilities. It is therefore important to set the seed (set.seed) to ensure reproducibility.
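
The effect of the seed can be seen with a minimal base R sketch. The helper draw_controls is hypothetical and merely stands in for the random sampling step; the same principle applies to a call to ncc_sample preceded by set.seed.

```r
# Hypothetical helper: draw up to 'controls' ids from an eligible set.
draw_controls <- function(eligible, controls = 1) {
  eligible[sample.int(length(eligible), size = min(controls, length(eligible)))]
}

eligible <- c(11, 12, 13, 14, 15)

set.seed(42)
first <- draw_controls(eligible, controls = 2)

set.seed(42)  # resetting the seed reproduces the same draw
second <- draw_controls(eligible, controls = 2)

identical(first, second)  # TRUE
```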

Value

A data.frame comprising:

ncc_set

a case-control set identifier

ncc_id

a unique individual identifier

ncc_fail

case identifier (0=control, 1=case)

ncc_elig_co

a count of the number of controls eligible for selection in the set

ncc_time

failure time of the case in the set

ncc_pr

the probability of being selected in the nested case-control sample

followed by the variables specified in the match and include lists.

Author(s)

David C Muller

References

Samuelsen S. O. (1997). A pseudolikelihood approach to analysis of nested case-control studies. Biometrika, 84(2), 379-394. doi:10.1093/biomet/84.2.379


dcmuller/ncctools documentation built on May 20, 2019, 2:20 p.m.