CCHIC data strongly impacts the design of cleanEHR. Complex, heterogeneous, and high resolution longitudinal data is a trend of EHR analysis, which takes advantage of the ever more sophisticated statistical techniques and growing computation capability. CCHIC is a representative example database of such kind. It records 263 fields including 154 time-varying fields of patients during their stay in intensive care units across five NHS trusts in England. The recorded variables include patient demographics, time of admission and discharge, survival status, diagnosis, physiology, laboratory, nursing and drug. The latest database contains 22,628 admissions from 2014 to 2016, with about 119 million data points (~6k per patient).
We introduced the concept of 'episode' as the fundamental entity of EHRs, which comprises all the data being recorded during the ICU stay. Each episode also contains the demographic information of the patient, ward transferring origin and destination within a hospital and diagnosis information. It allows us to link episode data from a single patient across the entire multi-centre database. The key dates and times are recorded as follow.
library(cleanEHR) data("sample_ccd") # Extract all non-longitudinal data (demographic, time, survival status, diagnosis) dt <- ccd_demographic_table(ccd, dtype=TRUE)
ccd_demographic_table function returns a table of all non-time-varying fields alongside with
several derived fields -- the fields that are not directly recorded in the original data.
Each row of the table is a unique admission, and every column is a non-time-varying data field.
pid: unique patient ID derived from NHS number or PAS number.
AGE: date of birth - unit admission time
spell: see Spell
Data missing are caused by many reasons. We have to understand that in such a large database, the data quality varies. In the anonymised dataset, data can be missing due to security reason. Missing data are either NULL or "NA" depending on the field data type.
Every patient in England has a unique NHS number and PAS (Patient Admission System) number. These can be used to identify a unique patient. Other demographic information can also be found in the CCHIC dataset such as age, sex, GP code, postcode and so on. Most of the demographic data will be removed, pseudonimised, or modified in the anonymised dataset to protect the patient privacy.
Ward transferring within or beyond ICUs are counted as different episodes respectively. In some research, one may be interested in looking at the sickness development and the treatment history beyond each ICU stays. A spell includes number of episodes from a unique patient which occured in a user defined period One can link episodes by spell ID.
head(ccd_unique_spell(ccd, duration=1)[, c("episode_id", "spell")])
Instead of using free text, we adopted ICNARC diagnosis
code system to record the diagnosis. The full ICNARC code is a five digit number separated
with dots. From left to right each digit represents a higher category of diagnosis. Due
to the privacy concerns, in the anonysmised dataset, last two digits will be removed.
One may use the function
icnarc2diagnosis to look up the diagnosis code.
icnarc2diagnosis("1.1") icnarc2diagnosis("1.1.4") icnarc2diagnosis("126.96.36.199.1")
CCHIC measures 154 longitudinal data. The full list of longitudinal data are shown below.
We can also easily plot the data from a single selected admission.
plot_episode(ccd@episodes[], c("h_rate", "bilirubin", "fluid_balance_d"))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.