addCodelistToCohort: Adds information from categories based on a codelist


View source: R/addCodelistToCohort.R

Description

Adds a column named varname to the cohort data.table, containing a category derived from a list of events (anonpatid, category, eventdate). If a patient has more than one record, the category is chosen according to an order of priority. Alternatively, a binary outcome is generated, indicating whether the patient has any record fulfilling the criteria.

Usage

addCodelistToCohort(cohort, varname, data, codelist,
    categories = NULL, enttypes = NULL,
    codename = switch(attr(codelist, 'Source'),
        GPRD = 'medcode', ONS = 'cod', HES = 'icd', OPCS = 'opcs',
        GPRDPROD = 'prodcode'),
    binary = FALSE, limit_years = c(-Inf, 0),
    idcolname = attr(cohort, 'idcolname'),
    datecolname = c('eventdate', 'evdate', 'dod', 'epistart',
        'admidate'),
    indexcolname = 'indexdate', overwrite = TRUE, description = NULL,
    limit_days = NULL, entities = 'all')

Arguments

cohort

a cohort object

varname

new variable name

data

ffdf or data.table containing anonpatid, medcode and eventdate, plus enttype if entity types are specified

codelist

a codelist object or the name of a codelist file for selection of events by medcode.

categories

vector of categories to use, in priority order (highest priority first). If the result is binary, the order of categories does not matter. Leave as NULL (default) to include all categories.

enttypes

numeric vector of entity types to use, default = NULL (use all)

codename

name of the column in data containing the relevant medical code

binary

whether to lump all categories together to make a binary variable

limit_years

earliest and latest year relative to index date. Default is c(-Inf, 0), which selects all events prior to or on the index date.

idcolname

name of the patient identifier column in data

datecolname

name of the event date column in data. If multiple options are given, they are tried in turn to find a column that contains dates.

indexcolname

name of the index date column in the cohort dataset

overwrite

whether to overwrite this variable if it exists

description

description for the new variable. Defaults to the function call which generated this variable.

limit_days

a vector of length 2 giving the time limits in days, which overrides limit_years if both are supplied. A year is considered to be 365.25 days.

entities

deprecated way to specify which entity types to use; use enttypes instead.
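The interaction between limit_years and limit_days can be illustrated with a small sketch. The helper window_in_days below is hypothetical and is not part of the package; it only assumes what is stated above, namely that a year counts as 365.25 days and that limit_days takes precedence when both arguments are supplied.

```r
# Hypothetical helper, not part of CALIBERdatamanage: resolve the time
# window (in days relative to the index date) from the two arguments.
window_in_days <- function(limit_years = c(-Inf, 0), limit_days = NULL) {
    if (!is.null(limit_days)) {
        # limit_days takes precedence when both are supplied
        stopifnot(length(limit_days) == 2)
        return(limit_days)
    }
    # Otherwise convert years to days (1 year = 365.25 days)
    limit_years * 365.25
}

window_in_days()                                   # default window
window_in_days(limit_years = c(-1, 0))             # one year before index
window_in_days(c(-1, 0), limit_days = c(-30, 30))  # limit_days wins
```

With the default limit_years = c(-Inf, 0), the window stays unbounded in the past and ends on the index date, matching the description of limit_years above.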

Details

The function first merges the codelist onto data, to select events with medcodes in the relevant categories and entity types. The subset of relevant events is then used in a call to addToCohort, to select a category if it occurs within the defined time window relative to each patient's index date. The final variable is added to the cohort data.table.
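The steps described above can be sketched with plain data.table operations. This is an illustration of the logic under simplifying assumptions (toy data, no entity types, no binary option), not the package's actual implementation; the object names cohort_dt, events, codes, priority and window are made up for the example.

```r
library(data.table)

# Toy inputs mirroring the structures described above (values are made up)
cohort_dt <- data.table(anonpatid = 1:3,
    indexdate = as.IDate(c("2012-01-03", "2012-01-02", "2010-01-09")))
events <- data.table(anonpatid = c(2L, 2L, 3L),
    medcode = c(1L, 2L, 2L),
    eventdate = as.IDate(c("2000-01-01", "2012-01-03", "2009-06-01")))
codes <- data.table(medcode = 1:3, category = c(1, 2, 2))
priority <- c(1, 2)     # categories, highest priority first
window <- c(-Inf, 0)    # days relative to the index date

# Step 1: merge the codelist onto the events to attach a category
relevant <- merge(events, codes, by = "medcode")
relevant <- relevant[category %in% priority]

# Step 2: keep events inside the time window around each index date
relevant <- merge(relevant, cohort_dt, by = "anonpatid")
relevant[, offset := as.integer(eventdate) - as.integer(indexdate)]
relevant <- relevant[offset >= window[1] & offset <= window[2]]

# Step 3: per patient, pick the category that ranks first in `priority`
relevant[, prio_rank := match(category, priority)]
chosen <- relevant[, .(newvar = category[which.min(prio_rank)]),
    by = anonpatid]

# Step 4: left join so that patients without a qualifying event get NA
result <- merge(cohort_dt, chosen, by = "anonpatid", all.x = TRUE)
print(result)
```

In this toy data, patient 2 receives category 1 because their category 2 event falls one day after the index date and is therefore outside the window. Patient 1 has no events and receives NA.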

Value

Cohort with an extra column. If cohort is a data.table, it is also modified by reference.
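Modification by reference is a general data.table behaviour and can be shown with a minimal sketch. The function add_flag below is hypothetical and stands in for any function that adds a column with the := operator.

```r
library(data.table)

# Hypothetical function that adds a column with := (by reference)
add_flag <- function(dt) {
    dt[, flag := TRUE]
    invisible(dt)
}

DT <- data.table(anonpatid = 1:3)
add_flag(DT)    # no assignment needed; DT itself is updated
names(DT)       # now includes "flag"
```

This is why the data.table examples on this page call addCodelistToCohort without assigning the result, whereas the ffdf cohort example reassigns the returned object with <-.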

See Also

addToCohort

Examples

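# Assumes that attaching the package also makes data.table and ff available
library(CALIBERdatamanage)

COHORT <- cohort(data.table(anonpatid = 1:3,
    indexdate = as.IDate(c("2012-1-3", "2012-1-2", "2010-1-9"))))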
print(COHORT)

# New data (existing category column should be ignored)
newdata <- data.table(anonpatid = c(2, 2, 3, 3, 4, 4, 4), 
    medcode = c(1, 2, 2, 3, 1, 2, NA), eventdate = as.IDate(c("2000-1-1", 
    "2012-1-3", "2011-1-1", "2011-1-1", "2012-1-5", "2013-1-1", 
    "2011-1-1")), category = rep(2, 7))

# Create a sample codelist
mycodelist <- data.table(medcode = 1:4,
    code = c('246..00', '22A..00', '229..00', '423..00'),
    term = c('O/E - weight', 'O/E - blood pressure reading',
        'O/E - height', 'Haemoglobin estimation'),
    category = c(1, 2, 2, 2))
setattr(mycodelist, 'Source', 'GPRD')
setattr(mycodelist, 'Categories', data.table(category = 1:2,
    shortname = c('Weight', 'Other')))
class(mycodelist) <- c('codelist', 'data.table', 'data.frame')

# Using data.table, categories 1 or 2 (category 1 takes priority)
addCodelistToCohort(COHORT, varname = "newvar", data = newdata,
    codelist = mycodelist, categories = 1:2)
print(COHORT)
removeColumns(COHORT, 'newvar')

# Using ffdf
newffdf <- as.ffdf(newdata)
FFDFCOHORT <- as.ffdf(COHORT)

# Category 1 only
addCodelistToCohort(COHORT, varname = "V_1", data = newffdf,
    codelist = mycodelist, categories = 1)

# Category 2 or 1 (category 2 takes priority)
addCodelistToCohort(COHORT, varname = "V_12", data = newffdf,
    codelist = mycodelist, categories = 2:1)

# Binary
addCodelistToCohort(COHORT, varname = "V_binary", data = newffdf,
    codelist = mycodelist, categories = 1:2, binary = TRUE)

# Category 2 or 1, no time limits
addCodelistToCohort(COHORT, varname = "V_12anytime", data = newffdf,
    codelist = mycodelist, categories = 2:1, limit_years = c(-Inf, Inf))
print(COHORT)

# Using FFDF cohort; need to reassign the result to the 
# cohort object using <-
# Category 1 only
FFDFCOHORT <- addCodelistToCohort(FFDFCOHORT, varname = "V_1", data = newffdf,
    codelist = mycodelist, categories = 1)

# Category 2 or 1
FFDFCOHORT <- addCodelistToCohort(FFDFCOHORT, varname = "V_12",
    data = newffdf, codelist = mycodelist, categories = 2:1)
print(FFDFCOHORT)

CALIBERdatamanage documentation built on Nov. 23, 2021, 3 p.m.