knitr::opts_chunk$set( collapse = TRUE, eval = TRUE, message = FALSE, warning = FALSE, comment = "#>" )
In this vignette we'll show how requirements related to the data contained in the cohort table can be applied. For this we'll use the Eunomia synthetic data.
library(CodelistGenerator)
library(CohortConstructor)
library(CohortCharacteristics)
library(ggplot2)
library(dplyr)

# Download the Eunomia data to a temporary folder if no folder is configured
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") {
  Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(tempdir(), "eunomia"))
}
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) {
  dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
  CDMConnector::downloadEunomiaData()
}

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())
cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main", writeSchema = "main", writePrefix = "my_study_")
Let's start by creating a cohort of acetaminophen users. Individuals will have a cohort entry for each of their acetaminophen drug exposure records, with cohort exit based on the end date of the drug record. Note that when the cohort is created, any overlapping records will be concatenated.
acetaminophen_codes <- getDrugIngredientCodes(cdm, name = "acetaminophen", nameStyle = "{concept_name}")
cdm$acetaminophen <- conceptCohort(cdm = cdm, conceptSet = acetaminophen_codes, exit = "event_end_date", name = "acetaminophen")
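To make the concatenation of overlapping records more concrete, here is a minimal base R sketch on toy dates for a single person. It is not how conceptCohort() is implemented; the helper name merge_overlaps and the data are hypothetical, and the sketch assumes at least one record and that only records that truly overlap (or share a boundary date) are merged.

```r
# Hypothetical helper: collapse overlapping date intervals for one person.
merge_overlaps <- function(start, end) {
  ord <- order(start)
  start <- as.numeric(start[ord])
  end <- as.numeric(end[ord])
  n <- length(start)
  # Latest end date seen so far among the earlier records
  running_end <- cummax(end)
  # A record opens a new episode if it starts after everything before it has ended
  starts_new <- c(TRUE, start[-1] > running_end[-n])
  episode <- cumsum(starts_new)
  data.frame(
    cohort_start_date = as.Date(as.numeric(tapply(start, episode, min)), origin = "1970-01-01"),
    cohort_end_date = as.Date(as.numeric(tapply(end, episode, max)), origin = "1970-01-01")
  )
}

# Toy exposure records (not Eunomia data): the first two overlap
merged <- merge_overlaps(
  start = as.Date(c("2020-01-01", "2020-01-05", "2020-03-01")),
  end = as.Date(c("2020-01-10", "2020-01-20", "2020-03-05"))
)
merged
```

The first two toy records collapse into a single entry, while the third stays separate.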
At this point we have just created our base cohort without having applied any restrictions. To visualise the current state of the cohort, we can use the summariseCohortAttrition() function to summarise attrition and then plot the results using plotCohortAttrition().
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
We can see that in our starting cohort individuals can have multiple entries, one for each use of acetaminophen. However, we could keep only their earliest cohort entry by using requireIsFirstEntry() from CohortConstructor.
cdm$acetaminophen <- cdm$acetaminophen |>
  requireIsFirstEntry()
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
While the number of individuals remains unchanged, records after an individual's first have been excluded.
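As a rough illustration of what a first-entry requirement does, the following base R sketch applies the same idea to a toy cohort table. The data are made up and this is only a conceptual stand-in, not the package's implementation.

```r
# Toy cohort table (hypothetical data, not Eunomia)
cohort <- data.frame(
  subject_id = c(1L, 1L, 1L, 2L, 2L, 3L),
  cohort_start_date = as.Date(c("2009-03-01", "2011-06-15", "2014-02-01",
                                "2012-01-10", "2017-05-20", "2008-08-08"))
)

# Sort by person and start date, then keep the first row seen for each person
ordered <- cohort[order(cohort$subject_id, cohort$cohort_start_date), ]
first_entry <- ordered[!duplicated(ordered$subject_id), ]

# The number of people is unchanged; only the number of records drops
n_subjects_before <- length(unique(cohort$subject_id))
n_subjects_after <- length(unique(first_entry$subject_id))
first_entry
```

In the toy data every person is retained, but each now contributes a single record.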
If we want to keep the latest record per person instead of the earliest, we would use requireIsLastEntry(). This will ensure that only the latest record for acetaminophen use remains in the cohort.
cdm$acetaminophen <- conceptCohort(cdm = cdm, conceptSet = acetaminophen_codes, exit = "event_end_date", name = "acetaminophen")
cdm$acetaminophen <- cdm$acetaminophen |>
  requireIsLastEntry()
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
If we want to keep only a specific range of records per person, we can use the requireIsEntry() function. For example, to keep only the first two entries for each person, we can set entryRange = c(1, 2).
cdm$acetaminophen <- conceptCohort(cdm = cdm, conceptSet = acetaminophen_codes, exit = "event_end_date", name = "acetaminophen")
cdm$acetaminophen <- cdm$acetaminophen |>
  requireIsEntry(entryRange = c(1, 2))
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
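The entry range idea can be sketched in base R by numbering each person's records chronologically and keeping those whose number falls within the range. The helper keep_entries and the toy data are hypothetical; the sketch only mirrors the concept and makes no claim about how requireIsEntry() works internally.

```r
# Toy cohort table (hypothetical data, not Eunomia)
cohort <- data.frame(
  subject_id = c(1L, 1L, 1L, 2L, 2L, 3L),
  cohort_start_date = as.Date(c("2009-03-01", "2011-06-15", "2014-02-01",
                                "2012-01-10", "2017-05-20", "2008-08-08"))
)

# Hypothetical helper: keep records whose chronological position per person
# lies within entry_range (both ends inclusive)
keep_entries <- function(x, entry_range) {
  x <- x[order(x$subject_id, x$cohort_start_date), ]
  # 1, 2, 3, ... within each person, in start date order
  entry <- ave(seq_len(nrow(x)), x$subject_id, FUN = seq_along)
  x[entry >= entry_range[1] & entry <= entry_range[2], ]
}

first_two <- keep_entries(cohort, c(1, 2))
first_two
```

Here person 1 loses their third record, while people with two or fewer records are unaffected.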
Individuals may contribute multiple records over extended periods. We can filter out records that fall outside a specified date range using the requireInDateRange() function.
cdm$acetaminophen <- conceptCohort(cdm = cdm, conceptSet = acetaminophen_codes, name = "acetaminophen")
cdm$acetaminophen <- cdm$acetaminophen |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2015-01-01")))
summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)
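A date range requirement amounts to a filter on a date column. The base R sketch below uses toy data and assumes the filter is applied to cohort_start_date with both bounds inclusive; these are assumptions for illustration, so check the function documentation for the exact behaviour.

```r
# Toy cohort table (hypothetical data, not Eunomia)
cohort <- data.frame(
  subject_id = c(1L, 1L, 1L, 2L, 2L, 3L),
  cohort_start_date = as.Date(c("2009-03-01", "2011-06-15", "2014-02-01",
                                "2012-01-10", "2017-05-20", "2008-08-08"))
)

date_range <- as.Date(c("2010-01-01", "2015-01-01"))

# Keep records that start inside the range (assumed inclusive on both ends)
in_range <- cohort[
  cohort$cohort_start_date >= date_range[1] &
    cohort$cohort_start_date <= date_range[2],
]
in_range
```

Unlike the entry-based requirements, a date requirement can remove people entirely: person 3 in the toy data has no record inside the range.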
Multiple restrictions can be applied to a cohort. It is important to note, however, that the order in which the requirements are applied will often matter.
cdm$acetaminophen_1 <- conceptCohort(cdm = cdm, conceptSet = acetaminophen_codes, name = "acetaminophen_1") |>
  requireIsFirstEntry() |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01")))
cdm$acetaminophen_2 <- conceptCohort(cdm = cdm, conceptSet = acetaminophen_codes, name = "acetaminophen_2") |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01"))) |>
  requireIsFirstEntry()
summary_attrition_1 <- summariseCohortAttrition(cdm$acetaminophen_1)
summary_attrition_2 <- summariseCohortAttrition(cdm$acetaminophen_2)
Here we see attrition if we apply our entry requirement before our date requirement. In this case we have a cohort of people with their first ever record of acetaminophen which occurs in our study period.
plotCohortAttrition(summary_attrition_1)
And here we see attrition if we apply our date requirement before our entry requirement. In this case we have a cohort of people with their first record of acetaminophen in the study period, although this will not necessarily be their first record ever.
plotCohortAttrition(summary_attrition_2)
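The effect of ordering can be reproduced on a small scale. The sketch below uses toy data and two hypothetical helpers, keep_first and keep_in_range, as base R stand-ins for the two requirements; it is meant to illustrate the logic, not the package internals.

```r
# Toy cohort table (hypothetical data, not Eunomia)
cohort <- data.frame(
  subject_id = c(1L, 1L, 1L, 2L, 2L, 3L),
  cohort_start_date = as.Date(c("2009-03-01", "2011-06-15", "2014-02-01",
                                "2012-01-10", "2017-05-20", "2008-08-08"))
)

# Hypothetical helper: earliest record per person
keep_first <- function(x) {
  x <- x[order(x$subject_id, x$cohort_start_date), ]
  x[!duplicated(x$subject_id), ]
}

# Hypothetical helper: records starting within the range (assumed inclusive)
keep_in_range <- function(x, date_range) {
  x[x$cohort_start_date >= date_range[1] & x$cohort_start_date <= date_range[2], ]
}

study_period <- as.Date(c("2010-01-01", "2016-01-01"))

# First ever record, and only if it falls in the study period
first_then_range <- keep_in_range(keep_first(cohort), study_period)

# First record within the study period, whether or not it is the first ever
range_then_first <- keep_first(keep_in_range(cohort, study_period))

nrow(first_then_range)
nrow(range_then_first)
```

In the toy data, person 1 first used the drug before the study period, so they are dropped when the entry requirement comes first but retained (with their 2011 record) when the date requirement comes first.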
Another useful functionality, particularly when working with multiple cohorts or performing a network study, is provided by requireMinCohortCount(). Here we will only keep cohorts with a minimum count, filtering out records from cohorts with fewer than this number.
As an example let's create a cohort for every drug ingredient we see in Eunomia. We can first get the drug ingredient codes.
medication_codes <- getDrugIngredientCodes(cdm = cdm, nameStyle = "{concept_name}")
medication_codes
We can see that when we make all these cohorts many have only a small number of individuals.
cdm$medications <- conceptCohort(cdm = cdm, conceptSet = medication_codes, name = "medications")
cohortCount(cdm$medications) |>
  filter(number_subjects > 0) |>
  ggplot() +
  geom_histogram(aes(number_subjects), colour = "black", binwidth = 25) +
  xlab("Number of subjects") +
  theme_bw()
If we apply a minimum cohort count of 500, we end up with far fewer cohorts that all have a sufficient number of study participants.
cdm$medications <- cdm$medications |>
  requireMinCohortCount(minCohortCount = 500)
cohortCount(cdm$medications) |>
  filter(number_subjects > 0) |>
  ggplot() +
  geom_histogram(aes(number_subjects), colour = "black", binwidth = 25) +
  xlim(0, NA) +
  xlab("Number of subjects") +
  theme_bw()
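The idea behind a minimum cohort count can be sketched in base R on a toy table holding several cohorts. The data are made up, and the sketch assumes the count refers to the number of distinct subjects per cohort; treat that as an assumption rather than a statement about requireMinCohortCount().

```r
# Toy table with three cohorts (hypothetical data, not Eunomia)
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  subject_id = c(1L, 2L, 3L, 3L, 1L, 4L, 5L)
)

# Distinct subjects per cohort (assumed to be the quantity being thresholded)
subjects_per_cohort <- tapply(
  cohort$subject_id,
  cohort$cohort_definition_id,
  function(x) length(unique(x))
)

min_count <- 3
keep_ids <- as.integer(names(subjects_per_cohort)[subjects_per_cohort >= min_count])

# Drop every record that belongs to a cohort below the threshold
filtered <- cohort[cohort$cohort_definition_id %in% keep_ids, ]
filtered
```

Only cohort 1 has three or more subjects in the toy data, so all records from cohorts 2 and 3 are removed.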