match_on_index: Function for performing matching of controls to cases using...

Description

Controls are matched on an arbitrary number of categorical variables (match_vars), and on continuous variables via the extra_conditions argument. In addition, the date in index_var is matched to the eventdate in the consultation files: each control is given a dummy index date, taken from one of its consultations within +/- index_diff_limit days of the case's index date.
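The matching idea above can be sketched in base R for a single case. This is a hypothetical illustration, not the package's implementation. The helper match_one_case, the patient identifier column patid, and the rule of using the consultation closest to the case index date as the dummy index date are all assumptions; only eventdate, match_vars, index_var, index_diff_limit and n_controls come from this page. The sketch ignores extra_conditions and does not stop one control being matched to several cases.

```r
# Hypothetical sketch of matching one case; NOT the package's code.
# Assumed columns: `patid` in control_pool and consultations, `eventdate` (Date) in consultations.
match_one_case <- function(case_row, control_pool, consultations, match_vars,
                           index_var, index_diff_limit = 90, n_controls = 5) {
  # Exact matching: keep controls equal to the case on every match variable
  keep <- rep(TRUE, nrow(control_pool))
  for (v in match_vars) {
    keep <- keep & as.character(control_pool[[v]]) == as.character(case_row[[v]])
  }
  candidates <- control_pool[!is.na(keep) & keep, , drop = FALSE]

  # Consultations of candidate controls within +/- index_diff_limit days of the case index date
  case_index <- case_row[[index_var]]
  gap <- abs(as.numeric(consultations$eventdate - case_index))
  near <- consultations[consultations$patid %in% candidates$patid &
                          !is.na(gap) & gap <= index_diff_limit, , drop = FALSE]
  if (nrow(near) == 0) return(NULL)

  # One dummy index date per control (assumption: the consultation closest to the case index date)
  near$gap <- abs(as.numeric(near$eventdate - case_index))
  near <- near[order(near$patid, near$gap), , drop = FALSE]
  near <- near[!duplicated(near$patid), , drop = FALSE]

  # Keep at most n_controls controls, closest first
  near <- head(near[order(near$gap, near$patid), , drop = FALSE], n_controls)
  out <- candidates[match(near$patid, candidates$patid), , drop = FALSE]
  out[[index_var]] <- near$eventdate
  rownames(out) <- NULL
  out
}

# Toy data: controls 1 and 2 share gender and region with the case
control_pool <- data.frame(patid = 1:4, gender = c("F", "F", "M", "F"),
                           region = c("N", "N", "N", "S"))
consultations <- data.frame(patid = c(1, 1, 2, 3, 4),
  eventdate = as.Date(c("2010-01-10", "2010-06-01", "2009-12-20", "2010-01-05", "2010-01-02")))
case_row <- data.frame(patid = 100, gender = "F", region = "N",
                       index_date = as.Date("2010-01-01"))

res <- match_one_case(case_row, control_pool, consultations,
                      match_vars = c("gender", "region"), index_var = "index_date")
```

Here control 1's consultation on 2010-06-01 falls outside the default 90-day window, so its dummy index date comes from the 2010-01-10 consultation instead.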

Usage

match_on_index(cases, control_pool, index_var, match_vars,
  extra_conditions = "", index_diff_limit = 90, consult_path,
  n_controls = 5, cores = 1, import_fn = read.delim, ...)

Arguments

cases

A dataframe of cases to which to match controls

control_pool

A dataframe of possible controls to match to cases

index_var

character string giving the name of the variable containing the index dates

match_vars

character vector naming the variables, common to cases and control_pool, on which to match

extra_conditions

character string detailing other matching constraints (see details)

index_diff_limit

integer; the maximum number of days before or after the case index date within which dummy index dates may be taken from the consultation files

consult_path

path to directory containing consultation files

n_controls

integer; the number of controls to attempt to match to each case

cores

integer; the number of processor cores to use

import_fn

the function used to read the consultation files

...

extra arguments to be passed to import_fn

Details

Note that the consultation files must be in flat-file format (i.e. not stored in a database, but as text files or other file types, e.g. Stata .dta files). Set the import_fn argument to read other file formats (e.g. foreign::read.dta or readstata13::read.dta13).
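The note above about import_fn and the ... argument can be illustrated with a small helper. This is a hypothetical sketch, not the package's code: read_consultations is an invented name, and it assumes every file in consult_path is a consultation file to be read with import_fn and stacked with rbind. Any extra arguments (here colClasses for read.delim) are forwarded through ... to import_fn; for Stata files one would pass import_fn = foreign::read.dta instead.

```r
# Hypothetical helper, NOT the package's code: read every file in consult_path
# with import_fn, forwarding extra arguments (...) to it, and stack the results.
read_consultations <- function(consult_path, import_fn = read.delim, ...) {
  files <- list.files(consult_path, full.names = TRUE)
  # lapply passes ... on to import_fn for each file
  do.call(rbind, lapply(files, import_fn, ...))
}

# Example: two small tab-delimited consultation files in a fresh temporary directory
consult_dir <- tempfile("consultations")
dir.create(consult_dir)
write.table(data.frame(patid = 1:2, eventdate = c("2010-01-10", "2010-02-01")),
            file.path(consult_dir, "a.txt"), sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(patid = 3L, eventdate = "2010-03-05"),
            file.path(consult_dir, "b.txt"), sep = "\t", quote = FALSE, row.names = FALSE)

# colClasses is not an argument of read_consultations; it reaches read.delim via ...
cons <- read_consultations(consult_dir, import_fn = read.delim,
                           colClasses = c("integer", "Date"))
```

Forwarding through ... keeps the reader swappable without the helper needing to know each reader's own arguments.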

The extra_conditions argument adds constraints to the matching criteria on top of match_vars; for example, you could add "year > 1990". Expressions wrapped in .() are evaluated, and their values substituted into the condition, before it is applied. This is particularly useful when the condition depends on a value belonging to each individual case, which is referred to as CASE: e.g. "start_date < .(CASE$start_date)" ensures that each control's start date is before the start date of the case it is matched to.
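One way to picture the .() expansion for a single case is with base R's bquote(), which uses the same .() notation. This is a sketch under that assumption, not the package's implementation; expand_condition and filter_controls are hypothetical names. The condition string is parsed, each .() term is replaced by its value computed for the given CASE, and the resulting expression is evaluated against the columns of the control pool.

```r
# Hypothetical sketch, NOT the package's code. Assumes .() expansion behaves
# like base R's bquote(), which uses the same .() notation.
expand_condition <- function(condition, CASE) {
  expr <- parse(text = condition)[[1]]
  # Replace each .(...) with its value, evaluated in this frame (where CASE is visible)
  do.call(bquote, list(expr, where = environment()))
}

filter_controls <- function(control_pool, condition, CASE) {
  if (!nzchar(condition)) return(control_pool)  # "" (the default) adds no constraint
  # Evaluate the expanded condition using the control pool's columns
  keep <- eval(expand_condition(condition, CASE), control_pool, parent.frame())
  control_pool[!is.na(keep) & keep, , drop = FALSE]
}

cases <- data.frame(id = 1:2, start_date = as.Date(c("2000-01-01", "2005-06-01")))
control_pool <- data.frame(id = 101:104,
  start_date = as.Date(c("1999-03-01", "2001-01-01", "2004-12-31", "2006-01-01")))

# For the second case, keep controls whose start_date precedes the case's start_date
res <- filter_controls(control_pool, "start_date < .(CASE$start_date)", cases[2, ])
```

A condition without .(), such as "id > 102", passes through bquote() unchanged and is applied identically for every case.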

Value

a dataframe of matched controls


rosap/test documentation built on May 27, 2019, 11:30 p.m.