incidenceMatch | R Documentation
A case is a person who has an outcome event at the index date. A control is a person who does not yet have the event at the index date of the case. See exposureMatch for situations where treatment start or the onset of a comorbidity defines the case and the index date.
Risk set matching, or incidence density sampling, for nested case-control designs targets a Cox regression model with time-dependent covariates. In large-scale registry data, the main purpose of risk set matching, instead of standard Cox regression on all data, is to save computation time. See Details and References.
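The risk set condition can be sketched in base R. The helper `eligible_controls()` below is hypothetical and not part of the package. It assumes a numeric time scale, treats a missing event time as "no event", and counts a person as at risk at the case's index time t when their end of follow-up is at least t and their own event, if any, occurs strictly after t. Under this assumed rule a future case can serve as a control for an earlier case.

```r
# Hypothetical helper, not part of the package: which persons may serve
# as controls for a case with index time t?
eligible_controls <- function(t, id, event_time, end_followup, case_id) {
  # Assumed rule: still under follow-up at t and no event on or before t
  at_risk <- end_followup >= t & (is.na(event_time) | event_time > t)
  setdiff(id[at_risk], case_id)
}

id           <- c("A", "B", "C", "D", "E")
event_time   <- c(5, NA, 12, NA, 3)   # NA = no event during follow-up
end_followup <- c(5, 20, 12, 4, 3)    # last time a person can be selected

# Risk set for case A, whose event occurs at time 5
rs <- eligible_controls(5, id, event_time, end_followup, case_id = "A")
print(rs)   # "B" and "C"; D and E left follow-up before time 5

# Draw up to 2 controls at random from the risk set
set.seed(8)
controls <- rs[sample.int(length(rs), size = min(2L, length(rs)))]
```

Indexing with `sample.int()` rather than calling `sample(rs, ...)` directly avoids R's special behaviour of `sample()` when it is given a single numeric value.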
Note that the exponentiated parameter estimates of a conditional logistic regression analysis applied to the output of incidenceMatch are hazard ratios, which should be interpreted in terms of a Cox regression model.
To provide the necessary speed for large samples, the function works with data.table and creates a series of match groups within which the fixed matching variables (such as birth year and sex) are identical.
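The idea of match groups can be illustrated with base R's `split()`, used here only as a stand-in for the package's data.table implementation, which this page does not show. Each group holds one observed combination of the fixed matching variables, so the search for controls only needs to run within a group. The toy cohort is made up for illustration.

```r
# Toy cohort with two fixed matching variables
cohort <- data.frame(
  ptid  = paste0("P", 1:8),
  sex   = c("fem", "fem", "mal", "mal", "fem", "mal", "fem", "fem"),
  byear = c(1950, 1960, 1950, 1960, 1950, 1950, 1960, 1960),
  stringsAsFactors = FALSE
)

# One match group per observed combination of sex and birth year;
# drop = TRUE discards combinations that do not occur
groups <- split(cohort, cohort[c("sex", "byear")], drop = TRUE)

# Group sizes; list names have the form "sex.byear", e.g. "fem.1960"
sapply(groups, nrow)
```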
incidenceMatch(
ptid,
event,
terms,
data,
n.controls,
case.index = NULL,
end.followup = NULL,
date.terms = NULL,
duration.terms = NULL,
output.count.controls = TRUE,
cores = 1,
seed = 0,
progressbar = TRUE
)
ptid | Name of the personal ID variable that identifies each participant.
event | Name of the variable that defines cases. MUST be numeric 0/1, where 0 codes for never-case and 1 for case.
terms | Vector of variable names specifying the variables to match on. Make sure that truly continuous variables, such as age, are appropriately categorized. This ensures a sufficient number of controls for each case. For example, it may be difficult to find controls for cases with very high or very low ages, so extreme ages should be further aggregated.
data | The single dataset with all information; coerced to data.table if it is a data.frame.
n.controls | Number of controls for each case.
case.index | Name of the variable that contains the case index dates. This can be a calendar date variable or a numeric variable, i.e., the time to the outcome event from a well-defined baseline date. Missing values are interpreted as no event by the end of follow-up.
end.followup | Name of the variable that defines the date (as date or time) from which a control can no longer be selected due to … The end.followup must be larger than or equal to the …
date.terms | Unclear if useful in this context, but see the description for …
duration.terms | A list where each element defines a time duration term with two elements: … Useful for summarizing the exposure history of cases and controls over an equally long period looking back in time from the …
output.count.controls | Logical. If …
cores | Number of cores to use in the calculation.
seed | Random seed to make results reproducible.
progressbar | Set to …
The function performs exact matching, and hence all matching variables must be factor or character variables.
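Because matching is exact, a continuous variable such as age has to be turned into a factor before it is listed in terms. A minimal sketch with base R's `cut()`; the band limits and labels are arbitrary choices for illustration, and the open-ended outer bands pool the extreme ages for which controls may be scarce.

```r
age <- c(23, 37, 41, 58, 64, 79, 88, 95)

# Exact matching needs categorical terms, so band the ages. The
# open-ended outer bands pool extreme ages where controls are scarce.
age_group <- cut(age,
                 breaks = c(-Inf, 40, 60, 80, Inf),
                 labels = c("<=40", "41-60", "61-80", ">80"))

table(age_group)   # 2 persons in each band
```

By default `cut()` uses right-closed intervals, so an age of exactly 40 falls in the "<=40" band.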
It may be tempting to always use multiple cores, but this comes at the cost of copying the data to each core.
This function prepares the data for fitting a Cox regression model, either via survival::clogit or directly via survival::coxph or an equivalent routine. The exponentiated regression parameters are hazard ratios.
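The following sketch illustrates why the conditional logistic regression estimates can be read as hazard ratios. It assumes simulated data with a true hazard ratio of 2 and that the survival package is installed, and it draws 2 controls per case with a simple base R loop rather than with incidenceMatch. The clogit estimate from the sampled sets is compared with a coxph fit on the full cohort; both should be close to 2, with the nested estimate somewhat less precise.

```r
library(survival)

set.seed(1)
n <- 2000
x <- rbinom(n, 1, 0.5)                       # binary exposure
event_time <- rexp(n, rate = 0.01 * 2^x)     # true hazard ratio = 2
cens_time  <- runif(n, 0, 100)               # independent censoring
obs    <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)

# Risk set sampling: for each case, draw 2 controls among persons
# whose observed time exceeds the case's event time
sets <- list()
for (i in which(status == 1)) {
  rs <- which(obs > obs[i])
  if (length(rs) == 0) next
  ctrl <- rs[sample.int(length(rs), size = min(2L, length(rs)))]
  sets[[length(sets) + 1]] <- data.frame(
    set = i, case = c(1L, integer(length(ctrl))), x = x[c(i, ctrl)]
  )
}
nested <- do.call(rbind, sets)

# Conditional logistic regression on the matched sets ...
fit_ncc <- clogit(case ~ x + strata(set), data = nested)
# ... versus Cox regression on the full cohort
fit_full <- coxph(Surv(obs, status) ~ x)

hr <- c(nested = unname(exp(coef(fit_ncc))),
        full   = unname(exp(coef(fit_full))))
print(round(hr, 2))
```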
The matching variables are allowed to have a time-dependent, non-proportional effect on the hazard rate of the outcome, in much the same way as would be obtained without matching by a strata statement that stratifies the baseline hazard function. The original motivation for the nested case-control design was situations in which it is difficult, expensive or time-consuming to measure the exposure variables.
The function matchReport may afterwards be used to provide simple summaries of the use of cases and controls.
A data.table with cases and controls. After matching, the variable "case.id" identifies sets that each include 1 case and x matched controls.
Variables in the original dataset are preserved. The final dataset includes all original cases but only the controls that were selected.
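The set structure described above can be checked with a few lines of base R. The data frame below is a hypothetical stand-in for matched output: only the variable name case.id comes from this documentation, and case is the event variable carried over from the input data, as in the Examples.

```r
# Hypothetical matched output: sets 1 and 3 have 2 controls,
# set 2 has only 1, to show an incomplete set
matched <- data.frame(
  case.id = c(1, 1, 1, 2, 2, 3, 3, 3),
  case    = c(1, 0, 0, 1, 0, 1, 0, 0)
)

n_cases    <- tapply(matched$case, matched$case.id, sum)
n_controls <- tapply(matched$case == 0, matched$case.id, sum)

stopifnot(all(n_cases == 1))   # exactly one case per set
n_controls                     # controls per set: 2, 1, 2
```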
Christian Torp-Pedersen & Thomas Alexander Gerds
Bryan Langholz and Larry Goldstein. Risk set sampling in epidemiologic cohort studies. Statistical Science, pages 35–53, 1996.
Ornulf Borgan, Larry Goldstein, Bryan Langholz, et al. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. The Annals of Statistics, 23(5):1749–1778, 1995.
Vidal Essebag, Robert W Platt, Michal Abrahamowicz, and Louise Pilote. Comparison of nested case-control and survival analysis methodologies for analysis of time-dependent exposure. BMC Medical Research Methodology, 5(1):5, 2005.
exposureMatch, clogit, matchReport, Matchit
library(data.table)
# Toy cohort: 40 never-cases followed by 15 cases
case <- c(rep(0,40),rep(1,15))
ptid <- paste0("P",1:55)
# Matching variables
sex <- c(rep("fem",20),rep("mal",20),rep("fem",8),rep("mal",7))
byear <- c(rep(c(2020,2030),20),rep(2020,7),rep(2030,8))
# Index dates on a numeric time scale; control.Index (used as
# end.followup below) is set equal to case.Index
case.Index <- c(seq(1,40,1),seq(5,47,3))
startDisease <- rep(10,55)  # used by duration.terms in the last example
control.Index <- case.Index
# Further variables, carried along to the matched data
diabetes <- c(rep(1,55))
heartdis <- c(rep(100,55))
dat <- data.table(case,ptid,sex,byear,diabetes,heartdis,case.Index,
                  control.Index,startDisease)
# Risk set matching
matchdat <- incidenceMatch(ptid="ptid",event="case",
terms=c("byear","sex"),data=dat,n.controls=2,
case.index="case.Index",
end.followup="control.Index",seed=8)
matchdat
matchReport(matchdat)
# Same with 2 cores
library(parallel)
library(foreach)
## Not run:
matchdat2 <- incidenceMatch("ptid","case",c("byear","sex"),data=dat,
                            n.controls=2,case.index="case.Index",
                            end.followup="control.Index",cores=2,seed=8)
matchdat2
all.equal(matchdat,matchdat2)
## End(Not run)
# Case control matching with requirement of minimum exposure time in each
# group
ew <- incidenceMatch(ptid="ptid",event="case",terms=c("byear","sex"),
data=dat,n.controls=2,case.index="case.Index",
end.followup="control.Index",cores=1,
duration.terms=list(list(start="startDisease",min=15)))
ew