riskSetMatch: Risk set matching

Description

Risk set matching is a common term for "incidence density sampling" or "exposure density sampling". In both cases the aim is to match on a series of variables such that the outcome or exposure date of each "control" is later than the outcome or exposure date of the corresponding case.
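The eligibility rule can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper eligible_controls and the toy data are hypothetical, and the sketch only assumes that a control must agree with the case on every matching variable and have an index date strictly later than the case's index date.

```r
# Hypothetical helper, not part of the package: return the rows of `dat`
# that could serve as controls for the case in row `i`.
eligible_controls <- function(dat, i, terms, caseIndex, controlIndex) {
  same_terms <- rep(TRUE, nrow(dat))
  for (v in terms) {
    # exact matching: the value must be identical to that of the case
    same_terms <- same_terms & dat[[v]] == dat[[v]][i]
  }
  # the control's index date must be later than the case's index date
  later <- dat[[controlIndex]] > dat[[caseIndex]][i]
  not_self <- seq_len(nrow(dat)) != i
  dat[same_terms & later & not_self, , drop = FALSE]
}

# Toy data (hypothetical): P1 is the case, P2-P4 are potential controls
toy <- data.frame(
  ptid = c("P1", "P2", "P3", "P4"),
  sex = c("fem", "fem", "fem", "mal"),
  case = c(1L, 0L, 0L, 0L),
  case.Index = c(10, 5, 20, 30),
  control.Index = c(10, 5, 20, 30),
  stringsAsFactors = FALSE
)
res <- eligible_controls(toy, 1, "sex", "case.Index", "control.Index")
res$ptid
```

In this toy data only P3 has the same sex as the case and a later index date; P2 leaves the risk set too early and P4 differs on sex.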

The current program is based on exact matching. The user can choose a "greedy" approach, in which each control is used only once, or allow controls to be reused and allow cases to serve as controls prior to becoming a case.

For the common use in nested case control studies it is important to specify that controls can be reused and that cases can appear as controls prior to being a case.

In addition to the exact matching, the function can also restrict selection to controls where the dates of time-dependent covariates are missing for both case and control, are both before the case index, are both after the case index, or are missing for the case and after the case index for the control.

Usage

riskSetMatch(ptid,event,terms,dat,Ncontrols,oldevent="oldevent"
   ,caseid="caseid",reuseCases=TRUE,reuseControls=TRUE,caseIndex=NULL 
   ,controlIndex=NULL,NoIndex=FALSE,cores=1,dateterms=NULL,
   exposureWindow=0,startDate=NULL,SEED=17)

Arguments

ptid

Personal ID variable defining participant

event

Variable defining cases and controls. MUST be integer 0/1: 0 for controls, 1 for cases

terms

c(.....) Character vector naming the variables to match on, each enclosed in quotes

dat

The single dataset with all information - coerced to data.table if data.frame

Ncontrols

Number of controls sought for each case

oldevent

Holds original value of event - distinguishes cases used as controls

caseid

Character. Variable holding grouping variable for cases/controls (=case-ptid)

reuseCases

Logical. If TRUE a case can be a control prior to being a case

reuseControls

Logical. If TRUE a control can be reused for several cases

caseIndex

Integer/Date. Date variable defining the date where a case becomes a case. For a case control study this is the date of event of interest, for a cohort study the date where a case enters an analysis.

controlIndex

Integer/Date. Date variable defining the date from which a control can no longer be selected. The controlIndex must be larger than the caseIndex. For a case control study this would be the date where a control has the event of interest or is censored. For a cohort study it would be the date where the control disappears from the analysis, e.g. due to death or censoring.

NoIndex

Logical. If TRUE caseIndex/controlIndex are ignored

cores

number of cores to use, default is one

dateterms

c(....) A vector of variable names (character) in "dat" specifying dates of conditions. When it is specified, the function checks not only that the caseIndex is not after the controlIndex, but also that for every variable listed either both case and control dates are missing, both are prior to the case index, both are after the case index, or the date is missing for the case and the control date is after the case index.

exposureWindow

For case/control studies this can be specified to ensure that controls have a minimum exposure window for some condition which starts at the date given by the variable "startDate". In practice, cases with a certain exposure window are selected first, and this feature is then used to ensure similar exposure for controls.

startDate

Starting date of condition which defines exposure window

SEED

Seed for random shuffling of cases

Details

The function does exact matching and keeps 2 dates (indices) apart such that the date for controls is larger than that for cases. Because the matching is exact all matching variables must be integer or character. Make sure that sufficient rounding is done on continuous and semicontinuous variables to ensure a decent number of controls for each case. For example it may be difficult to find controls for cases of very high age and age should therefore often be rounded by 2,3 or 5 years - and extreme ages further aggregated.
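As an illustration of the rounding advice, the following base R sketch groups age into 5-year bands and aggregates extreme ages before matching. The cut-offs 30 and 85 and the variable names are arbitrary choices made for this example, not recommendations from the package.

```r
# Hypothetical ages in years
age <- c(23, 37, 41, 58, 64, 79, 93, 101)

# Aggregate extreme ages so that very young and very old participants
# share the end groups (30 and 85 are arbitrary cut-offs for this example)
age_capped <- pmin(pmax(age, 30), 85)

# 5-year bands, stored as integer so they can be matched exactly
age_group <- as.integer(floor(age_capped / 5) * 5)
age_group
```

The resulting age_group could then be listed in terms in place of the raw age.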

For case control studies age may be a relevant matching parameter - for most cohort studies year of birth is more relevant since the age of a control varies with time.

Many datasets have comorbidities as time dependent variables. Matching on these requires that, at the case index, the comorbidity date of the corresponding variable has not (yet) been reached for the control if the case does not have the comorbidity, and similarly that the date has been reached for the control when the case does have that comorbidity.
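The four accepted combinations listed for dateterms can be written out as a small base R sketch. The helper dateterm_ok is hypothetical and is not the package's code. How a date exactly equal to the case index is treated is not stated in this documentation; the sketch assumes strict inequalities.

```r
# Hypothetical helper: is a control acceptable for a case with respect to
# one time-dependent condition, given the case index date?
dateterm_ok <- function(case_date, control_date, case_index) {
  both_missing <- is.na(case_date) & is.na(control_date)
  both_before  <- !is.na(case_date) & !is.na(control_date) &
    case_date < case_index & control_date < case_index
  both_after   <- !is.na(case_date) & !is.na(control_date) &
    case_date > case_index & control_date > case_index
  # case never has the condition, control acquires it after the case index
  case_missing_control_after <- is.na(case_date) & !is.na(control_date) &
    control_date > case_index
  both_missing | both_before | both_after | case_missing_control_after
}

dateterm_ok(NA, NA, 10)  # both missing
dateterm_ok(3, 7, 10)    # both before the case index
dateterm_ok(3, 12, 10)   # case before, control after
dateterm_ok(NA, 15, 10)  # missing for case, control after the case index
```

Under these assumptions the third call is the only one of the four that rejects the control, because the case already has the condition at the index date and the control does not.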

For most purposes controls should be reused and if cases are not allowed to be controls prior to being a case a bias will be introduced. By default, both are set to TRUE.

For special cases it may be required that there is a minimum duration of a condition shared by cases and controls. This can be achieved by defining exposureWindow (same units as the various times, usually days) and startDate, the day the condition of interest starts.

The function can be used for standard matching without the caseIndex/controlIndex (with "NoIndex"), but other packages such as MatchIt are likely to be better suited for such cases.

It may appear tempting always to use multiple cores, but this comes with a time-consuming overhead because the function machinery has to be distributed to each defined "worker". With very large numbers of cases and controls, multiple cores can save substantial amounts of time. When a single core is used, a progress bar shows the progress of matching. There is no progress bar with multiple cores.

The function matchReport may afterwards be used to provide simple summaries of the use of cases and controls.

Value

data.table with cases and controls. After matching, a new variable "caseid" links controls to cases. Further, a variable "oldevent" holds the original value of "event", to be used to identify cases functioning as controls prior to being cases. Make sure that "caseid" and "oldevent" are not already in the dataset.

Variables in the original dataset are preserved. The final dataset includes all original cases but only the controls that were selected.
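A minimal base R sketch of how the returned variables might be used is shown below. The data frame out_toy is a hand-made stand-in for a matched result, not real output from the function, and the sketch assumes that the event variable (here "case") is 0 on rows where a case serves as a control while "oldevent" keeps the value 1.

```r
# Hypothetical stand-in for a matched result with two matched sets
out_toy <- data.frame(
  ptid     = c("P41", "P3", "P7", "P42", "P41", "P9"),
  caseid   = c("P41", "P41", "P41", "P42", "P42", "P42"),
  case     = c(1L, 0L, 0L, 1L, 0L, 0L),
  oldevent = c(1L, 0L, 0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Rows where a later case serves as a control for another case
reused <- out_toy[out_toy$case == 0 & out_toy$oldevent == 1, ]
reused$ptid

# Number of controls found for each matched set
n_controls <- tapply(out_toy$case == 0, out_toy$caseid, sum)
n_controls
```

Here P41 appears both as a case in its own matched set and as a control in the set of P42, and each set contains two controls.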

Author(s)

Christian Torp-Pedersen

See Also

matchReport, MatchIt

Examples

library(data.table)
# 40 controls followed by 15 cases
case <- c(rep(0,40),rep(1,15))
ptid <- paste0("P",1:55)
sex <- c(rep("fem",20),rep("mal",20),rep("fem",8),rep("mal",7))
byear <- c(rep(c(2020,2030),20),rep(2020,7),rep(2030,8))
case.Index <- c(seq(1,40,1),seq(5,47,3))
startDisease <- rep(10,55)
control.Index <- case.Index
# Dates of time dependent conditions (identical for all participants here)
diabetes <- rep(1,55)
heartdis <- rep(100,55)
dat <- data.table(case,ptid,sex,byear,diabetes,heartdis,case.Index,
  control.Index,startDisease)
# Very simple match without reuse - no dates to control for
# (reuseCases and reuseControls default to TRUE, so reuse is switched off)
out <- riskSetMatch("ptid","case",c("byear","sex"),dat,2,NoIndex=TRUE,
  reuseCases=FALSE,reuseControls=FALSE)
out[]
# Risk set matching without reusing cases/controls -
# Some cases have no controls
out2 <- riskSetMatch("ptid","case",c("byear","sex"),dat,2,caseIndex="case.Index",
  controlIndex="control.Index",reuseCases=FALSE,reuseControls=FALSE)
out2[]
# Risk set matching with reuse of cases (control prior to case) and reuse of 
# controls - more cases get controls
out3 <- riskSetMatch("ptid","case",c("byear","sex"),dat,2,caseIndex=
  "case.Index",controlIndex="control.Index"
  ,reuseCases=TRUE,reuseControls=TRUE)
out3[]   
# Same with 2 cores
library(parallel)
library(foreach)
out4 <- riskSetMatch("ptid","case",c("byear","sex"),dat,2,caseIndex=
  "case.Index",controlIndex="control.Index"
  ,reuseCases=TRUE,reuseControls=TRUE,cores=2)  
out4[]     
# Time dependent matching. In addition to fixed matching parameters there are
# two other sets of dates where it is required that if a case has that
# condition prior to index, then controls also need to have the condition
# prior to the case index to be eligible - and if the case does not have the
# condition prior to index then the same is required for the control.
out5 <- riskSetMatch("ptid","case",c("byear","sex"),dat,2,caseIndex=
  "case.Index",controlIndex="control.Index"
  ,reuseCases=TRUE,reuseControls=TRUE,cores=1,
  dateterms=c("diabetes","heartdis"))  
out5[]  
# Case control matching with requirement of minimum exposure time in each
# group 
out6 <- riskSetMatch("ptid","case",c("byear","sex"),dat,2,caseIndex=
  "case.Index",controlIndex="control.Index"
  ,reuseCases=TRUE,reuseControls=TRUE,cores=1,
  exposureWindow=15,startDate="startDisease")
out6[]  

# POSTPROCESSING
# It may be convenient to add the number of controls found for each case in
# order to remove cases without controls or where very few controls have been
# found. This is easily obtained using data.table - with the example above
# (counting rows with case==0 so that the case itself is not included):
# out5[,numControls:=sum(case==0),by=caseid] # adds a column with the number
#                                            # of controls for each case-ID

tagteam/heaven documentation built on June 21, 2019, 6:37 p.m.