Basic Workflow"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Objectives

If you want to access script templates for these steps, see

Reminder of Database Structure

Each table has a unique identifying key and other keys to perform joins.

| Table | Key | Other keys | |--------|---------------|------------------| | demo | UMCReportId | | | drug | Drug_Id | UMCReportId | | adr | Adr_Id | UMCReportId |

Goal of the tutorial: perform a disproportionality analysis between colitis and nivolumab among checkpoint inhibitor cases.

Step 0: Load Packages

library(vigicaen)
library(rlang)
library(dplyr)

Build tables from source files {#building_tables}

This process should be done once per database version.

You don't have to do it to follow this tutorial, since we will use the package built-in example tables.

However, when working on real analyses, you will need to process this step first.

See the vignette here vignette("getting_started").

Collecting IDs {#collecting_ids}

The whole package relies on defining a dictionary of drugs and adrs of interest.

Those collection of terms should be stored in named lists.

# drug selection
d_sel <-
  list(
    nivolumab   = "nivolumab",
    ipilimumab  = "ipilimumab",
    nivo_or_ipi = c("nivolumab", "ipilimumab")
  )

# adverse drug reaction selection

a_sel <- 
  list(
    colitis     = "Colitis",
    pneumonitis = "Pneumonitis"
  )

As we see, the d_sel list contains three named vectors: nivolumab, ipilimumab, and nivo_or_ipi. Each of these vectors can contain one or more names of drugs.

The names of the vectors don't have to be the same as the drug names. They will be used to created columns later in the process.

We will pass these named list selections to ID collector functions: The get_* family.

These functions will allow to collect IDs (e.g. codes) matching our drugs and adrs in a specific dictionary.

Data management

Drugs {#drug_workflow}

Principle

  1. Load the demo, drug, and mp tables.
  2. Select one or more medications of interest.
  3. Identify the drug codes (e.g., the DrecNo(s)) associated with these drug using the mp table.
  4. Search for cases exposed to these medications using the codes in the drug table.
  5. Update the demo table: code 1 if the case reports the medication of interest, 0 otherwise.
  6. Check your data management

Step 1 is performed with dt_parquet() if you have your own tables, or using the built-in example tables for this tutorial.

Step 2 and 3 can be referred as "dictionary" steps.

Step 3 is performed with get_drecno() or get_atc_code().

Steps 4 and 5 are performed with add_drug().

step 6 is performed with check_dm().

Step 1: Load the tables

demo <- demo_
drug <- drug_

mp <- mp_

Note: if you are working with your own tables, you will need to load them here, with dt_parquet().

Step 2: Choose drugs of Interest

This is probably the most interesting part of the process, from a scientific point of view. But for now, it's pretty trivial.

Once you've decided which drugs you would like to study (e.g. nivolumab in this tutorial), you need to create a named list of character vectors.

d_sel <- # drug selection
  list2(
    nivolumab = "nivolumab"
  )

d_sel

Remember to use lower case names in d_sel (no capital letters: Good: "nivolumab", Wrong: "Nivolumab" or "NIVOLUMAB")

The ideal way of picking drugs is by using their WHO name.

WHO name is an international non-proprietary name (INN) for the drug. A few drugs have more than one INN (e.g. paracetamol and acetaminophen), but they still have a unique WHO name (most of the time, one of the INN).

The ID collector get_drecno() will let you know if you missed the WHO name.

Alternatively, you can work with Anatomical and Therapeutical Classification (ATC) codes to investigate a set of drug. This is explained in the ATC section.

Step 3: Identify drug codes

The function get_drecno() allows you to query the mp table with a selection of drug names.

It takes several arguments, two of which must be filled in: the selection of drugs and the table containing the correspondence between the name and the code (here, mp).

You should always look carefully at the printed message.

  1. Use get_drecno() with argument verbose = TRUE (default).
get_drecno(
  d_sel = d_sel,
  mp = mp_,
  verbose = TRUE
  )

We see that get_drecno() finds two entries containing the drug "nivolumab" in the mp table:

This second entry was identified because the allow_combination argument is set to TRUE by default. This allows for a broader identification of all specialties containing nivolumab. In this situation, this behavior is desirable because we want to be sure to identify all cases reporting the drug.

  1. Since these are actually the codes we were looking for, we can (optionally) set the verbose argument to FALSE and keep the result in an R object called d_drecno.
d_drecno <-
  get_drecno(
    d_sel = d_sel,
    mp = mp_,
    verbose = FALSE
    )

Steps 4 and 5: add_drug() function

To identify cases reporting the drug of interest and add the corresponding column to demo, we use the add_drug() function.

The add_drug() function takes 3 mandatory arguments:

demo <- 
  add_drug(
    .data = demo,
    d_code = d_drecno,
    drug_data = drug)
demo

Or, in tidyverse syntax

demo <- 
  demo |> 
  add_drug(
    d_code = d_drecno,
    drug_data = drug
  )

Step 6: Check your data management

This may seem trivial, but it is an essential step in the construction of a dataset.

There are many ways to check that the code has worked. Here, the check_dm() function will count the number of rows in the dataset where the desired column is equal to 1.

check_dm(demo, "nivolumab")

It shows how many rows in demo have the value 1 in the nivolumab column (e.g. how many cases where identified as reporting on nivolumab reactions).

Here, we see that 225 cases report nivolumab.

Step 2 and 3 variant: ATC classes {#atc_classes}

The correspondence between ATC (Anatomical and Therapeutical Classification) classes and drug codes is found in the thg table. In this table, drug codes are stored as MedicinalProd_Id. It is therefore necessary to make a second correspondence with mp to find DrecNo.

This can be done with the get_atc_code() function.

As with drugs, we first need to identify the ATC class of interest (here, "L03").

atc_sel <-
  list2(l03 = "L03")

atc_drecno <- 
  get_atc_code(
    atc_sel = atc_sel,
    mp = mp,
    thg_data = thg_
    )

The get_atc_code() function requires the mp and thg tables, as well as the selection of ATC classes.

str(atc_drecno)

By default, this function retrieves DrecNos associated with an ATC class. It is possible to retrieve MedicinalProd_Ids instead by setting the vigilyze argument to FALSE.

The interest of using MedicinalProd_Ids instead of DrecNos is to restrict the drug panel only to packages corresponding to a specific ATC class (e.g., you might not want to find all packages of corticosteroids if you work with the ATC class "S01BA", which corresponds to ophtalmic steroids).

Once DrecNos are identified, we can add them to the demo table, with the add_drug() function.

demo |> 
  add_drug(
    d_code = atc_drecno,
    drug_data = drug
  )

Step 4 and 5 variant: Suspect, concomitant, interacting

We can choose to work with drugs according to their "reputation basis".

This information is stored in the Basis column of the drug table.

By using the add_drug() function, we can specify which type of status we are interested in, in the repbasis argument. By default, the value "sci" indicates that we consider the drug whether it is suspect, concomitant, or interacting. We can change the selection.

demo |> 
  add_drug(
    d_code = d_drecno,
    drug_data = drug,
    repbasis = "sci"
  ) |> 
  check_dm("nivolumab")

# suspected only

demo |> 
  add_drug(
    d_code = d_drecno,
    drug_data = drug,
    repbasis = "s"
  ) |> 
  check_dm("nivolumab")

Create multiple drug columns

To work with multiple drugs, you need to update the initial d_sel list.

d_sel <- 
  list2(
    nivolumab = "nivolumab",
    pembrolizumab = "pembrolizumab"
  )

d_drecno <-
  d_sel |> 
  get_drecno(mp = mp)

demo <- 
  demo |> 
  add_drug(
    d_drecno,
    drug_data = drug
  )

demo |> 
  check_dm(c("nivolumab", "pembrolizumab"))

Drug groups

If you want to work at the level of a group of drugs, but the ATC classes do not match your needs perfectly, you can group them in the d_sel list.

d_sel <- 
  list2(
    analgesics = c("paracetamol", "tramadol"),
    ici = c("nivolumab", "pembrolizumab")
  )

d_drecno <-
  d_sel |> 
  get_drecno(mp = mp,
             allow_combination = FALSE)

demo <- 
  demo |> 
  add_drug(
    d_drecno,
    drug_data = drug
  )

demo |> 
  check_dm(names(d_sel))

Adverse drug reactions

Principles

  1. Load the demo, adr, and meddra tables.

  2. Choose the adverse event(s) of interest.

  3. Identify the event codes (these are low-level terms according to the MedDRA classification). They can be found in the meddra table or in the smq tables.

  4. Search for cases that have presented this event, using the codes

  5. Update the demo table: code 1 if the case reports the event of interest, 0 otherwise.

  6. Check your data management

Similarly to the drug workflow, steps 2 and 3 can be referred to as "dictionary" steps.

Step 3 uses get_llt_soc() or get_llt_smq().

Step 1: Load the tables

adr <- adr_
meddra <- meddra_

demo was loaded during the drug workflow.

Step 2: Choose events of interest

a_sel_pt <-
  list2(
    a_colitis = c(
      "Colitis",
      "Autoimmune colitis",
      "Colitis microscopic",
      "Diarrhoea",
      "Diarrhoea haemorrhagic",
      "Duodenitis",
      "Enteritis",
      "Enterocolitis",
      "Enterocolitis haemorrhagic",
      "Ulcerative gastritis"
    )
  )

We start with a list of adverse events of interest, grouped altogether under the name "a_colitis".

MedDRA terms always start with a capital letter, be sure to provide the exact case, e.g. Good : "Colitis", Wrong : "colitis" or "COLITIS".

Be sure all selected terms belong to the same hierarchical level (preferred term, high level term...) in MedDRA. Here, we use Preferred Terms.

Step 3: Identify event codes

The get_llt_soc() function allows you to query the meddra.

a_llt <- 
  get_llt_soc(
    term_sel = a_sel_pt,
    term_level = "pt",
    meddra = meddra_
    )

a_llt

An alternative is to use the get_llt_smq() function, which allows you to query the smq tables.

Notice that you collect low level term codes, even if you work with higher level terms, like preferred terms, or high level terms. This is intentional: this list collects all low level term codes composing the higher level term. See Collecting ID section.

Steps 4 and 5: add_adr() function

The add_adr() function allows you to identify cases reporting the adverse event of interest and add the corresponding column to demo.

demo <- 
  add_adr(
    .data = demo,
    a_code = a_llt,
    adr_data = adr)

Step 6: Check your data management

check_dm() also works for adr.

demo |> 
  check_dm("a_colitis")

Other variables

We may need to create other variables to perform our analysis, for example age and sex in a multivariable analysis.

Age

The demo table contains the AgeGroup column, which groups ages into categories. You may want to recode it to match you research question

demo <-
  demo |>
  mutate(
    age = cut(as.integer(AgeGroup),
              breaks = c(0,4,5,6,7,8),
              include.lowest = TRUE, right = TRUE,
              labels = c("<18", "18-45","45-64", "65-74", "75+"))
  )

Sex

The demo table contains the Gender column, from which you can also create a new sex column (with values 1 for men, 2 for women, and NA otherwise)

demo <-
  demo |> 
  mutate(
    sex = ifelse(Gender == "1", 1,
                 ifelse(Gender == "2", 2, NA_real_)
                 )
    )

Using case_when()

The case_when() function from the dplyr package allows you to manage multiple options in a single function, with a slightly different syntax.

demo <- 
  demo |> 
  mutate(
    sex = case_when(Gender == "1" ~ 1,
                    Gender == "2" ~ 2,
                    TRUE ~ NA_real_)
    )

More documentation on case_when() can be found in the dplyr package documentation.

You should just remember here that options are evaluated sequentially, from top to bottom.

Seriousness, death

The out table contains the Seriousness column, which indicates whether the case was serious or not, and whether the patient experienced a fatal issue during his/her follow-up.

# ---- Serious ---- ####

out <- out_

demo <- 
  demo |> 
  mutate(
    serious = 
      ifelse(
        UMCReportId %in% out$UMCReportId,
        UMCReportId %in% 
          (out |> 
          filter(Serious == "Y") |> 
          pull(UMCReportId)
          ),
        NA)
    )

# ---- Death + outcome availability ---- ####

demo <- 
  demo |> 
  mutate(death = 
           ifelse(UMCReportId %in% out$UMCReportId,
                  UMCReportId %in% 
                    (out |> 
                    filter(Seriousness == "1") |> 
                    pull(UMCReportId)
                    ),
                  NA)
         )

Disproportionality

Our demo dataset now has a drug column for nivolumab, and an adr column for colitis.

We can perform a disproportionality analysis between these two variables.

Univariate analysis

Disproportionality metrics

Reporting Odds-Ratio (ROR) and Information Component essentially measure the same thing: the disproportionality.

compute_dispro() computes both of these.

demo |> 
  compute_dispro(
    y = "a_colitis",
    x = "nivolumab"
    )

Advanced modelling, multivariate analysis

From this point, it is also possible to run any statistical model including drug and adr parameters, but also potential other variables such as age and sex. For example, one could wish to perform a multivariate logistic regression on the reporting of colitis and nivolumab, adjusted on age groups and sex.

The glm() function from the stats package can be used for this purpose.

mod <- glm(a_colitis ~ nivolumab, 
           data = demo, family = "binomial")

summary(mod)

In a logistic regression models, estimates lead to (reporting) OR by the exponential.

summary(mod)$coefficients

exp(summary(mod)$coefficients[2, 1])

Adding covariates is straightforward

mod2 <- glm(a_colitis ~ nivolumab + sex + age,
            data = demo,
            family = "binomial")

summary(mod2)

Extract Odds-Ratio with compute_or_mod()

There are several packages that can extract the OR from a model. The compute_or_mod() function is just one of many ways to do it.

mod_or <- 
  compute_or_mod(
    summary(mod2)$coefficients,
    estimate = Estimate,
    std_er = Std..Error
    )

mod_or

Conclusion

You're now all set to create drugs and adrs columns into a demo dataset. This is the first step to many modelling possibilities!

Where do you want to go next?



Try the vigicaen package in your browser

Any scripts or data that you put into this service are public.

vigicaen documentation built on April 3, 2025, 8:55 p.m.