ncc_sample: Generate a nested case-control study
In dcmuller/ncctools: Tools for nested case-control data

ncc_sample generates a nested case-control study dataset from a cohort study dataset. Given time of entry, time of exit, and exit status, risk sets are computed at each failure time. Controls are randomly sampled from these risk sets. If matching variables are specified, ncc_sample creates a matched or stratified nested case-control study, in which risk sets are computed separately within matching strata. ncc_sample is similar to ccwc from the 'Epi' package, but differs in a few small but important ways (see Details).

ncc_sample(entry = 0, exit, fail, origin = 0, controls = 1,
  match = list(), include = list(), data = NULL, keep_all = FALSE,
  silent = FALSE)

`entry`	time of entry to follow-up
`exit`	time of exit from follow-up
`fail`	indicator of status on exit from follow-up (censored=0, fail=1)
`origin`	the origin of the analysis time-scale. For instance, date of birth, if age is the desired time-scale.
`controls`	the number of controls to sample for each failure
`match`	a list of categorical variables for matching cases and controls
`include`	a list of variables from the cohort dataset to be carried across to the nested case-control dataset
`data`	a `data.frame` which contains the follow-up, matching, and included variables.
`keep_all`	if `TRUE`, does not sample from the risk sets, but returns the probability of selection for each observation that is eligible to be selected for any case in `data`. Defaults to `FALSE`.
`silent`	if `FALSE`, provides entertainment by echoing a full stop to the screen as each risk set is generated. If `TRUE`, output to the console is suppressed. Defaults to `FALSE`.

Given follow-up information from a cohort study, ncc_sample generates risk sets at each observed failure time, and randomly samples controls from these risk sets without replacement. Functionality is much the same as ccwc from the 'Epi' package, with two differences. Firstly, ncc_sample also computes and returns the total number of eligible controls for each risk set, as well as the probability of selection into the sample for every selected case and control. The latter is calculated according to the formula given by Samuelsen (1997). Secondly, ncc_sample splits tied failure times at random, whereas ccwc preserves the ties and returns a multi-case case-control set.
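The procedure described above can be sketched in base R. This is an illustration only, not the package's implementation: the toy `cohort`, the helper `at_risk`, and the convention that a subject is at risk at time t when entry <= t <= exit are all assumptions made for the example, and the toy data contain no tied failure times. The selection probability follows a common statement of Samuelsen's formula: cases have probability 1, and a non-case has probability 1 minus the product, over the failure times at which it was at risk, of (1 - m / number of eligible controls), where m is the number of controls per case.

```r
# Hypothetical toy cohort (not from the package); no tied failure times.
cohort <- data.frame(
  id    = 1:8,
  entry = c(0, 0, 1, 0, 2, 0, 0, 3),
  exit  = c(5, 3, 6, 8, 7, 4, 9, 10),
  fail  = c(1, 0, 1, 0, 0, 1, 0, 0)
)
m <- 2  # controls sampled per case

# Assumed risk-set convention: under follow-up at time t if entry <= t <= exit.
at_risk <- function(t) cohort$entry <= t & cohort$exit >= t

fail_times <- sort(cohort$exit[cohort$fail == 1])

set.seed(1)
sets <- lapply(fail_times, function(t) {
  is_case <- cohort$fail == 1 & cohort$exit == t
  elig <- cohort$id[at_risk(t) & !is_case]          # eligible controls
  # Sample without replacement; indexing avoids sample()'s length-1 pitfall.
  ctrl <- elig[sample.int(length(elig), min(m, length(elig)))]
  data.frame(time = t,
             id = c(cohort$id[is_case], ctrl),
             fail = rep(c(1, 0), c(sum(is_case), length(ctrl))),
             elig_co = length(elig))
})
ncc <- do.call(rbind, sets)

# Selection probability for every cohort member (Samuelsen-style).
elig_co <- vapply(fail_times, function(t) sum(at_risk(t)) - 1, numeric(1))
pr <- vapply(seq_len(nrow(cohort)), function(i) {
  if (cohort$fail[i] == 1) return(1)                # cases are always included
  in_set <- cohort$entry[i] <= fail_times & cohort$exit[i] >= fail_times
  1 - prod(1 - pmin(1, m / elig_co[in_set]))
}, numeric(1))
```

In this toy cohort, subject 2 leaves follow-up before the first failure and so can never be sampled, which is why its probability is 0.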

Random sampling of controls within risk sets is performed using R's pseudo-random number facilities. It is therefore important to set the seed (set.seed) to ensure reproducibility.
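A small base R illustration of why the seed matters. The risk set `elig` and the seed value are hypothetical; the same principle should apply to any call that samples controls with R's random number generator.

```r
# Hypothetical risk set of eligible control identifiers.
elig <- c(3, 4, 5, 7, 8)

# Resetting the seed before each draw reproduces the same controls.
set.seed(2019)
first_draw <- sample(elig, 2)
set.seed(2019)
second_draw <- sample(elig, 2)

identical(first_draw, second_draw)  # TRUE
```

Without the second call to `set.seed`, the two draws would generally differ, and so would any analysis based on the sampled controls.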

A data.frame comprising:

`ncc_set`	a case-control set identifier
`ncc_id`	a unique individual identifier
`ncc_fail`	case identifier (0=control, 1=case)
`ncc_elig_co`	a count of the number of controls eligible for selection in the set
`ncc_time`	failure time of the case in the set
`ncc_pr`	the probability of being selected in the nested case-control sample

followed by the variables specified in the match and include lists.
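A possible call, sketched under assumptions: the toy `cohort` is hypothetical, the package is assumed to be installed from the dcmuller/ncctools repository, and the follow-up variables are passed as plain vectors (leaving `data` at its default) so the example does not depend on how `data` is evaluated. The block does nothing if the package is unavailable.

```r
# Hypothetical toy cohort; not part of the package.
cohort <- data.frame(
  id    = 1:8,
  entry = c(0, 0, 1, 0, 2, 0, 0, 3),
  exit  = c(5, 3, 6, 8, 7, 4, 9, 10),
  fail  = c(1, 0, 1, 0, 0, 1, 0, 0)
)

if (requireNamespace("ncctools", quietly = TRUE)) {
  set.seed(42)  # make the control sampling reproducible
  ncc <- ncctools::ncc_sample(
    entry    = cohort$entry,
    exit     = cohort$exit,
    fail     = cohort$fail,
    controls = 2,
    silent   = TRUE
  )
  str(ncc)  # inspect the returned columns described above
}
```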

David C Muller

Samuelsen S. O. (1997). A pseudolikelihood approach to analysis of nested case-control studies. Biometrika, 84(2), 379-394. doi:10.1093/biomet/84.2.379

dcmuller/ncctools documentation built on May 20, 2019, 2:20 p.m.