Applying cohort table requirements"
In CohortConstructor: Build and Manipulate Study Cohorts Using a Common Data Model

knitr::opts_chunk$set(
  collapse = TRUE,
  eval = TRUE, 
  message = FALSE, 
  warning = FALSE,
  comment = "#>"
)

In this vignette we'll show how requirements related to the data contained in the cohort table can be applied. For this we'll use the Eunomia synthetic data.

library(CodelistGenerator)
library(CohortConstructor)
library(CohortCharacteristics)
library(ggplot2)
library(dplyr)

if (Sys.getenv("EUNOMIA_DATA_FOLDER") == ""){
  Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(tempdir(), "eunomia"))}
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))){ dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
  CDMConnector::downloadEunomiaData()  
}

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = CDMConnector::eunomiaDir())
cdm <- CDMConnector::cdmFromCon(con, cdmSchema = "main", 
                    writeSchema = "main", writePrefix = "my_study_")

Let's start by creating a cohort of acetaminophen users. Individuals will have a cohort entry for each drug exposure record they have for acetaminophen with cohort exit based on their drug record end date. Note when creating the cohort, any overlapping records will be concatenated.

acetaminophen_codes <- getDrugIngredientCodes(cdm, 
                                              name = "acetaminophen", 
                                              nameStyle = "{concept_name}")
cdm$acetaminophen <- conceptCohort(cdm = cdm, 
                                   conceptSet = acetaminophen_codes, 
                                   exit = "event_end_date",
                                   name = "acetaminophen")

At this point we have just created our base cohort without having applied any restrictions. To visualise the current state of the cohort, we can use the summariseCohortAttrition() function to summarise attrition and then plot the results using plotCohortAttrition().

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)

Keep only the first record per person

We can see that in our starting cohort individuals have multiple entries for each use of acetaminophen. However, we could keep only their earliest cohort entry by using requireIsFirstEntry() from CohortConstructor.

cdm$acetaminophen <- cdm$acetaminophen |> 
  requireIsFirstEntry()

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)

While the number of individuals remains unchanged, records after an individual's first have been excluded.

Keep only the last record per person

If we want to keep the latest record per person instead of the earliest we would use requireIsLastEntry(). This will ensure that only the latest record for acetaminophen use remains in the cohort.

cdm$acetaminophen <- conceptCohort(cdm = cdm, 
                                   conceptSet = acetaminophen_codes, 
                                   exit = "event_end_date",
                                   name = "acetaminophen")
cdm$acetaminophen <- cdm$acetaminophen |> 
  requireIsLastEntry()

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)

Keep only a range of records per person

If we want to keep only a specific range of records per person, we can use the requireIsEntry() function. For example, o keep only the first two entries for each person, we can set entryRange = c(1, 2).

cdm$acetaminophen <- conceptCohort(cdm = cdm, 
                                   conceptSet = acetaminophen_codes, 
                                   exit = "event_end_date",
                                   name = "acetaminophen")
cdm$acetaminophen <- cdm$acetaminophen |> 
  requireIsEntry(entryRange = c(1,2))

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)

Keep only records within a date range

Individuals may contribute multiple records over extended periods. We can filter out records that fall outside a specified date range using the requireInDateRange function.

cdm$acetaminophen <- conceptCohort(cdm = cdm, 
                                 conceptSet = acetaminophen_codes, 
                                 name = "acetaminophen")

cdm$acetaminophen <- cdm$acetaminophen |> 
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2015-01-01")))

summary_attrition <- summariseCohortAttrition(cdm$acetaminophen)
plotCohortAttrition(summary_attrition)

Applying multiple cohort requirements

Multiple restrictions can be applied to a cohort, however it is important to note that the order that requirements are applied will often matter.

cdm$acetaminophen_1 <- conceptCohort(cdm = cdm, 
                                 conceptSet = acetaminophen_codes, 
                                 name = "acetaminophen_1") |> 
  requireIsFirstEntry() |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01")))

cdm$acetaminophen_2 <- conceptCohort(cdm = cdm, 
                                 conceptSet = acetaminophen_codes, 
                                 name = "acetaminophen_2") |>
  requireInDateRange(dateRange = as.Date(c("2010-01-01", "2016-01-01"))) |> 
  requireIsFirstEntry()

summary_attrition_1 <- summariseCohortAttrition(cdm$acetaminophen_1)
summary_attrition_2 <- summariseCohortAttrition(cdm$acetaminophen_2)

Here we see attrition if we apply our entry requirement before our date requirement. In this case we have a cohort of people with their first ever record of acetaminophen which occurs in our study period.

plotCohortAttrition(summary_attrition_1)

And here we see attrition if we apply our date requirement before our entry requirement. In this case we have a cohort of people with their first record of acetaminophen in the study period, although this will not necessarily be their first record ever.

plotCohortAttrition(summary_attrition_2)

Keep only records from cohorts with a minimum number of individuals

Another useful functionality, particularly when working with multiple cohorts or performing a network study, is provided by requireMinCohortCount. Here we will only keep cohorts with a minimum count, filtering out records from cohorts with fewer than this number.

As an example let's create a cohort for every drug ingredient we see in Eunomia. We can first get the drug ingredient codes.

medication_codes <- getDrugIngredientCodes(cdm = cdm, nameStyle = "{concept_name}")
medication_codes

We can see that when we make all these cohorts many have only a small number of individuals.

cdm$medications <- conceptCohort(cdm = cdm, 
                                 conceptSet = medication_codes,
                                 name = "medications")


cohortCount(cdm$medications) |> 
  filter(number_subjects > 0) |> 
  ggplot() +
  geom_histogram(aes(number_subjects),
                 colour = "black",
                 binwidth = 25) +  
  xlab("Number of subjects") +
  theme_bw()

If we apply a minimum cohort count of 500, we end up with far fewer cohorts that all have a sufficient number of study participants.

cdm$medications <- cdm$medications |> 
  requireMinCohortCount(minCohortCount = 500)

cohortCount(cdm$medications) |> 
  filter(number_subjects > 0) |> 
  ggplot() +
  geom_histogram(aes(number_subjects),
                 colour = "black",
                 binwidth = 25) + 
  xlim(0, NA) + 
  xlab("Number of subjects") +
  theme_bw()