import_CPRD_data: Imports all selected CPRD data into an sqlite database

View source: R/cprd_import.R

Description

This function imports data both from cohorts downloaded via the CPRD online tool and from CPRD GOLD builds.

Usage

import_CPRD_data(db, data_dir, filetypes = c("Additional", "Clinical",
  "Consultation", "Immunisation", "Patient", "Practice", "Referral", "Staff",
  "Test", "Therapy"), dateformat = "%d/%m/%Y", yob_origin = 1800,
  regex = "PET", recursive = TRUE, ...)

Arguments

db

a database connection

data_dir

the directory containing the CPRD cohort data

filetypes

character vector of filetypes to be imported

dateformat

the format in which dates are stored in the CPRD data. If this is wrong, the import will not fail, but all dates are likely to be NA

yob_origin

value added to the stored yob values to obtain the actual year of birth (generally 1800)

regex

character regular expression identifying the data files in the directory. In the file name, the matched part is separated from the filetype by an underscore, e.g. 'p[0-9]3' in CPRD GOLD

recursive

logical; should files be searched for recursively under data_dir?

...

arguments to be passed to add_to_database
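
The following base R sketch illustrates what dateformat, yob_origin and regex are described as doing. It does not call import_CPRD_data itself. The date, yob value and file names are made up for illustration, and the way the regex is joined to the filetype with an underscore is an assumption based on the argument description, not a copy of the package's internal code.

```r
# dateformat: a matching format parses the date, while a mismatched
# format produces NA rather than an error.
raw_date <- "25/12/2001"                        # made-up example value
ok  <- as.Date(raw_date, format = "%d/%m/%Y")   # format matches the data
bad <- as.Date(raw_date, format = "%Y-%m-%d")   # format does not match
is.na(bad)

# yob_origin: the stored yob is an offset added to the origin year.
stored_yob <- 172                               # made-up example value
birth_year <- 1800 + stored_yob

# regex: assumed to be combined with the filetype via an underscore
# when looking for data files (hypothetical file names).
files   <- c("PET_Clinical001.zip", "PET_Patient001.zip", "notes.txt")
pattern <- paste0("PET", "_", "Clinical")
matched <- files[grepl(pattern, files)]
```

If the dates in an imported table are all NA, checking dateformat against a raw value from the source files in this way is a reasonable first step.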

Details

Note that if you choose to import all the filetypes, you may end up with a very large database file. You may therefore prefer to import only the files you want to use; you can always import the rest of the files later. This function may take a long time to run because it unzips (potentially large) files and reads them into R, where it converts the date formats, before importing them into SQLite. However, this initial data preparation step will greatly accelerate downstream processing.
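
A minimal sketch of the staged import suggested above, assuming the rEHR and RSQLite packages are installed and that a plain DBI SQLite connection is acceptable as db. The paths and the choice of first batch are hypothetical, and the import is skipped unless the data directory exists.

```r
# All filetypes listed in the Usage section.
all_types <- c("Additional", "Clinical", "Consultation", "Immunisation",
               "Patient", "Practice", "Referral", "Staff", "Test", "Therapy")

# Hypothetical first batch: only the tables needed right now.
first_batch <- c("Patient", "Practice", "Clinical")
# Everything else can be imported in a later call.
later_batch <- setdiff(all_types, first_batch)

# Hypothetical locations; replace with your own.
data_dir <- "path/to/cprd_cohort"
db_path  <- file.path(tempdir(), "cprd.sqlite")

if (requireNamespace("rEHR", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE) &&
    dir.exists(data_dir)) {
  # Assumption: a DBI connection to an SQLite file works as `db`.
  db <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  rEHR::import_CPRD_data(db, data_dir = data_dir, filetypes = first_batch)
  # Later, when the remaining tables are needed:
  # rEHR::import_CPRD_data(db, data_dir = data_dir, filetypes = later_batch)
  DBI::dbDisconnect(db)
}
```

Splitting the filetypes this way keeps the initial database small while leaving the remaining tables available for a later call against the same connection.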


rOpenHealth/rEHR documentation built on May 26, 2019, 8:51 p.m.