CALIBERdatamanage-package: Data Management Tools for CALIBER Datasets


Tools for handling data in the GPRD, HES, ONS and MINAP linked dataset (CALIBER)

Package:	CALIBERdatamanage
Type:	Package
Version:	0.1-15
Date:	2021-11-22
License:	GPL-3

If you are using this package for the first time, read the package's introduction and example first, followed by the introduction to data.table.

This package contains four sets of tools:

1. Importing data: Functions to import one or more files into data.table or ffdf objects in R, with automatic unzipping of compressed files, conversion of dates and application of lookups. (importDT, importFFDF, extractEntity, convertDates)
2. Building cohorts: A 'cohort' S3 class to store information about a cohort, and functions for generating analysis variables from data with multiple rows per patient. (cohort, summary.cohort, addToCohort, addCodelistToCohort)
3. Presentation tables: Producing summary tables in LaTeX or plain text, with functions to format numbers and percentages. (summaryTable)
4. Forest plot: Producing forest plots from a spreadsheet template, with the option to place several plots side by side and to specify the formatting of text. (multiforest)

This package uses the data.table package extensively. Data tables can be modified by reference, which avoids copying and makes them fast and memory-efficient for large datasets. The package also provides functions for working with ffdf data frames (from the ff package), which allow very large datasets to be stored in a temporary folder on the hard disk while still appearing as R objects in the workspace.
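As an illustration of what "modified by reference" means, here is a minimal sketch using only the data.table package. The table and its column names (sbp, high_bp) are hypothetical and are not part of CALIBERdatamanage.

```r
library(data.table)

# Hypothetical toy data; the column names are illustrative only
dt <- data.table(anonpatid = 1:3, sbp = c(120, 135, 150))
alias <- dt            # a second name for the same object, not a copy

# := adds the column in place; dt is not copied
dt[, high_bp := sbp >= 140]

# The new column is visible through both names
print("high_bp" %in% names(alias))

# Use copy() when an independent table is needed
snapshot <- copy(dt)
dt[, sbp := NULL]      # removes the column from dt only
print("sbp" %in% names(snapshot))
```

Because functions that use := change the table they are given, the result does not need to be assigned back to a variable. Use copy() first if the original table must be preserved.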

The package includes tools for date conversion in CALIBER files and tools for selecting values of a repeat measure or a diagnosis for patients within a particular time window.
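The idea of selecting a repeat measure within a time window can be sketched in plain data.table as follows. This is not the package's own implementation; the tables, the 365-day window and the rule "take the most recent value" are assumptions chosen for illustration.

```r
library(data.table)

# Hypothetical index dates, one row per patient
index <- data.table(anonpatid = 1:3,
    indexdate = as.IDate(c('2010-01-01', '2009-03-05', '2008-05-06')))

# Hypothetical repeat measures, several rows per patient
measures <- data.table(anonpatid = c(2L, 2L, 2L, 3L),
    eventdate = as.IDate(c('2006-01-01', '2008-06-01',
        '2009-01-10', '2005-01-01')),
    value = c(1, 2, 3, 4))

# Attach each patient's index date to their measurements
merged <- merge(measures, index, by = 'anonpatid')

# Keep measurements taken in the 365 days up to and including the index date
in_window <- merged[eventdate <= indexdate & eventdate >= indexdate - 365L]

# Within each patient, take the most recent measurement in the window
setorder(in_window, anonpatid, eventdate)
latest <- in_window[, .(latest_value = value[.N]), by = anonpatid]

# Join back so that patients without a qualifying measurement get NA
result <- merge(index, latest, by = 'anonpatid', all.x = TRUE)
print(result)
```

In this toy example only patient 2 has measurements inside the window, so the other two patients receive a missing value.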

The CALIBERlookups package, if installed, can provide lookup tables for the function extractEntity. The CALIBERcodelists package is useful for creating codelists, but is not required for this package to work.

Author:	Anoop Shah

Reference:	Denaxas et al. Data Resource Profile: Cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). Int. J. Epidemiol. (2012) 41 (6): 1625-1638. doi: 10.1093/ije/dys188 http://ije.oxfordjournals.org/content/41/6/1625

library(CALIBERdatamanage)
library(data.table)

# A sample patient cohort file
mycohort <- cohort(data.table(anonpatid = 1:3,
    indexdate = c('2010-01-01', '2009-03-05', '2008-05-06'),
    deathdate = c(NA, '', '2009-09-08'),
    ethnic_hes = c('Black', 'White', 'Indian')))
convertDates(mycohort)
print(mycohort)

# A sample data file with repeat measures for some patients
mydata <- data.table(anonpatid = c(2, 2, 3),
    eventdate = as.IDate(c('2006-01-01', '2008-01-01', '2005-01-01')),
    data1 = c(1, 2, 3))

# Copy the index dates and ethnicity to the repeated measures file.
transferVariables(mycohort, mydata, c('indexdate', 'ethnic_hes'))
print(mydata)

# Now use them in a calculation on the repeated measures:
# keep data1 for patients recorded as White, otherwise use 2.
mydata[, temp := ifelse(ethnic_hes == 'White', data1, 2)]

# Select a summary measure using addToCohort
addToCohort(mycohort, 'newvar', data = mydata,
    old_varname = 'temp', value_choice = c(2, 1))
print(mycohort)