knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Collect drug and adverse drug reaction IDs
Perform standard data management in VigiBase ECL
Conduct disproportionality analysis (both univariate and multivariate)
If you want to access script templates for these steps, see
vignette("template_dictionary")
and
vignette("template_main")
Each table has a unique identifying key and other keys to perform joins.
| Table | Key | Other keys |
|--------|---------------|------------------|
| demo
| UMCReportId
| |
| drug
| Drug_Id
| UMCReportId
|
| adr
| Adr_Id
| UMCReportId
|
Goal of the tutorial: perform a disproportionality analysis between colitis and nivolumab among checkpoint inhibitor cases.
library(vigicaen) library(rlang) library(dplyr)
This process should be done once per database version.
You don't have to do it to follow this tutorial, since we will use the package built-in example tables.
However, when working on real analyses, you will need to process this step first.
See the vignette here vignette("getting_started")
.
The whole package relies on defining a dictionary of drugs and adrs of interest.
Those collection of terms should be stored in named lists.
# drug selection d_sel <- list( nivolumab = "nivolumab", ipilimumab = "ipilimumab", nivo_or_ipi = c("nivolumab", "ipilimumab") ) # adverse drug reaction selection a_sel <- list( colitis = "Colitis", pneumonitis = "Pneumonitis" )
As we see, the d_sel
list contains three named vectors: nivolumab, ipilimumab, and nivo_or_ipi. Each of these vectors can contain one or
more names of drugs.
The names of the vectors don't have to be the same as the drug names. They will be used to created columns later in the process.
We will pass these named list selections to ID collector functions:
The get_*
family.
get_drecno()
for drugs, get_atc_code()
for ATC classes
get_llt_soc()
or get_llt_smq()
for adverse drug reactions
These functions will allow to collect IDs (e.g. codes) matching our drugs and adrs in a specific dictionary.
For drugs, we will use the WHODrug dictionary, and collect Drug Record Numbers (DrecNos) most of the time, or Medicinal Product Ids (MedicinalProd_Ids) in some specific scenarii.
For adrs, we will use the Medical Dictionary for Regulatory Activities (MedDRA), and collect term codes (low-level term codes). Here, it is important to note that we can work with other terms (like Preferred Terms, High Level terms, etc.). The ID collector will just collect low-level term codes of all higher level terms, resulting in a pretty long list of codes, because VigiBase data is structured on low-level term codes.
demo
, drug
, and mp
tables.DrecNo(s)
)
associated with these drug using the mp
table.drug
table.demo
table: code 1 if the case reports the medication of interest, 0 otherwise.Step 1 is performed with dt_parquet()
if you have your own tables, or using the built-in example tables for this tutorial.
Step 2 and 3 can be referred as "dictionary" steps.
Step 3 is performed with get_drecno()
or get_atc_code()
.
Steps 4 and 5 are performed with add_drug()
.
step 6 is performed with check_dm()
.
demo <- demo_ drug <- drug_ mp <- mp_
Note: if you are working with your own tables, you will need to load them here, with
dt_parquet()
.
This is probably the most interesting part of the process, from a scientific point of view. But for now, it's pretty trivial.
Once you've decided which drugs you would like to study (e.g. nivolumab in this tutorial), you need to create a named list of character vectors.
d_sel <- # drug selection list2( nivolumab = "nivolumab" ) d_sel
Remember to use lower case names in d_sel (no capital letters: Good: "nivolumab", Wrong: "Nivolumab" or "NIVOLUMAB")
The ideal way of picking drugs is by using their WHO name.
WHO name is an international non-proprietary name (INN) for the drug. A few drugs have more than one INN (e.g. paracetamol and acetaminophen), but they still have a unique WHO name (most of the time, one of the INN).
The ID collector get_drecno()
will let you know if you missed the WHO name.
Alternatively, you can work with Anatomical and Therapeutical Classification (ATC) codes to investigate a set of drug. This is explained in the ATC section.
The function get_drecno()
allows you to query the mp
table with a selection of drug names.
It takes several arguments, two of which must be filled in:
the selection of drugs and the table containing the
correspondence between the name and the code (here, mp
).
You should always look carefully at the printed message.
get_drecno()
with argument verbose = TRUE
(default).get_drecno( d_sel = d_sel, mp = mp_, verbose = TRUE )
We see that get_drecno()
finds two entries containing the drug "nivolumab" in the mp
table:
The entry for nivolumab alone
The entry for the ipilimumab;nivolumab combination
This second entry was identified because the allow_combination
argument is set to TRUE
by default.
This allows for a broader identification of all specialties
containing nivolumab. In this situation, this behavior is
desirable because we want to be sure to identify all cases
reporting the drug.
verbose
argument to FALSE and keep the result
in an R object called d_drecno
.d_drecno <- get_drecno( d_sel = d_sel, mp = mp_, verbose = FALSE )
To identify cases reporting the drug of interest and add the
corresponding column to demo
, we use the add_drug()
function.
The add_drug()
function takes 3 mandatory arguments:
The dataset on which to add the drug variable(s) (here, demo
)
A named list containing the codes of the drug(s)
The drug
table linking drug intake to each case.
demo <- add_drug( .data = demo, d_code = d_drecno, drug_data = drug) demo
Or, in tidyverse syntax
demo <- demo |> add_drug( d_code = d_drecno, drug_data = drug )
This may seem trivial, but it is an essential step in the construction of a dataset.
There are many ways to check that the code has worked.
Here, the check_dm()
function will count the number of rows
in the dataset where the desired column is equal to 1.
check_dm(demo, "nivolumab")
It shows how many rows in demo
have the value 1 in the
nivolumab
column (e.g. how many cases where identified
as reporting on nivolumab reactions).
Here, we see that 225 cases report nivolumab.
The correspondence between ATC (Anatomical and Therapeutical
Classification) classes and drug codes is found in the thg
table. In this table, drug codes are stored as
MedicinalProd_Id
. It is therefore necessary to make a second
correspondence with mp
to find DrecNo
.
This can be done with the get_atc_code()
function.
As with drugs, we first need to identify the ATC class of interest (here, "L03").
atc_sel <- list2(l03 = "L03") atc_drecno <- get_atc_code( atc_sel = atc_sel, mp = mp, thg_data = thg_ )
The get_atc_code()
function requires the mp
and thg
tables, as well as the selection of ATC classes.
str(atc_drecno)
By default, this function retrieves DrecNos associated with
an ATC class. It is possible to retrieve MedicinalProd_Ids
instead by setting the vigilyze
argument to FALSE
.
The interest of using MedicinalProd_Ids instead of DrecNos is to restrict the drug panel only to packages corresponding to a specific ATC class (e.g., you might not want to find all packages of corticosteroids if you work with the ATC class "S01BA", which corresponds to ophtalmic steroids).
Once DrecNos are identified, we can add them to the demo
table,
with the add_drug()
function.
demo |> add_drug( d_code = atc_drecno, drug_data = drug )
We can choose to work with drugs according to their "reputation basis".
This information is stored in the Basis
column of the
drug
table.
1 suspect
2 concomitant
3 interacting
By using the add_drug()
function, we can specify which
type of status we are interested in, in the repbasis
argument.
By default, the value "sci"
indicates that we consider the
drug whether it is suspect, concomitant, or interacting.
We can change the selection.
demo |> add_drug( d_code = d_drecno, drug_data = drug, repbasis = "sci" ) |> check_dm("nivolumab") # suspected only demo |> add_drug( d_code = d_drecno, drug_data = drug, repbasis = "s" ) |> check_dm("nivolumab")
To work with multiple drugs,
you need to update the initial d_sel
list.
d_sel <- list2( nivolumab = "nivolumab", pembrolizumab = "pembrolizumab" ) d_drecno <- d_sel |> get_drecno(mp = mp) demo <- demo |> add_drug( d_drecno, drug_data = drug ) demo |> check_dm(c("nivolumab", "pembrolizumab"))
If you want to work at the level of a group of drugs,
but the ATC classes do not match your needs perfectly,
you can group them in the d_sel
list.
d_sel <- list2( analgesics = c("paracetamol", "tramadol"), ici = c("nivolumab", "pembrolizumab") ) d_drecno <- d_sel |> get_drecno(mp = mp, allow_combination = FALSE) demo <- demo |> add_drug( d_drecno, drug_data = drug ) demo |> check_dm(names(d_sel))
Load the demo
, adr
, and meddra
tables.
Choose the adverse event(s) of interest.
Identify the event codes (these are low-level terms according
to the MedDRA classification). They can be found in the meddra
table or in the smq
tables.
Search for cases that have presented this event, using the codes
Update the demo
table: code 1 if the case reports the
event of interest, 0 otherwise.
Check your data management
Similarly to the drug workflow, steps 2 and 3 can be referred to as "dictionary" steps.
Step 3 uses get_llt_soc()
or get_llt_smq()
.
adr <- adr_ meddra <- meddra_
demo was loaded during the drug workflow.
a_sel_pt <- list2( a_colitis = c( "Colitis", "Autoimmune colitis", "Colitis microscopic", "Diarrhoea", "Diarrhoea haemorrhagic", "Duodenitis", "Enteritis", "Enterocolitis", "Enterocolitis haemorrhagic", "Ulcerative gastritis" ) )
We start with a list of adverse events of interest, grouped altogether under the name "a_colitis".
MedDRA terms always start with a capital letter, be sure to provide the exact case, e.g. Good : "Colitis", Wrong : "colitis" or "COLITIS".
Be sure all selected terms belong to the same hierarchical level (preferred term, high level term...) in MedDRA. Here, we use Preferred Terms.
The get_llt_soc()
function allows you to query the meddra
.
a_llt <- get_llt_soc( term_sel = a_sel_pt, term_level = "pt", meddra = meddra_ ) a_llt
An alternative is to use the get_llt_smq()
function, which
allows you to query the smq
tables.
Notice that you collect low level term codes, even if you work with higher level terms, like preferred terms, or high level terms. This is intentional: this list collects all low level term codes composing the higher level term. See Collecting ID section.
The add_adr()
function allows you to identify cases reporting
the adverse event of interest and add the corresponding column
to demo
.
demo <- add_adr( .data = demo, a_code = a_llt, adr_data = adr)
check_dm()
also works for adr.
demo |> check_dm("a_colitis")
We may need to create other variables to perform our analysis, for example age and sex in a multivariable analysis.
The demo
table contains the AgeGroup
column, which groups
ages into categories. You may want to recode it to match
you research question
demo <- demo |> mutate( age = cut(as.integer(AgeGroup), breaks = c(0,4,5,6,7,8), include.lowest = TRUE, right = TRUE, labels = c("<18", "18-45","45-64", "65-74", "75+")) )
The demo
table contains the Gender
column, from which
you can also create a new sex column (with values 1 for
men, 2 for women, and NA otherwise)
demo <- demo |> mutate( sex = ifelse(Gender == "1", 1, ifelse(Gender == "2", 2, NA_real_) ) )
The case_when()
function from the dplyr
package allows
you to manage multiple options in a single function, with
a slightly different syntax.
demo <- demo |> mutate( sex = case_when(Gender == "1" ~ 1, Gender == "2" ~ 2, TRUE ~ NA_real_) )
More documentation on case_when()
can be found in the
dplyr
package documentation.
You should just remember here that options are evaluated sequentially, from top to bottom.
The out
table contains the Seriousness
column, which
indicates whether the case was serious or not, and whether
the patient experienced a fatal issue during his/her follow-up.
# ---- Serious ---- #### out <- out_ demo <- demo |> mutate( serious = ifelse( UMCReportId %in% out$UMCReportId, UMCReportId %in% (out |> filter(Serious == "Y") |> pull(UMCReportId) ), NA) ) # ---- Death + outcome availability ---- #### demo <- demo |> mutate(death = ifelse(UMCReportId %in% out$UMCReportId, UMCReportId %in% (out |> filter(Seriousness == "1") |> pull(UMCReportId) ), NA) )
The serious
and death
columns are coded with TRUE/FALSE values in
this example. There is no particular reason to prefer it over 1/0
codes. It is just a matter of preference.
The Seriousness
can have several levels, level 1 being death.
(see subsidiary files)
Our demo
dataset now has a drug column for nivolumab,
and an adr column for colitis.
We can perform a disproportionality analysis between these two variables.
Reporting Odds-Ratio (ROR) and Information Component essentially measure the same thing: the disproportionality.
compute_dispro()
computes both of these.
or
and or_ci
are the reporting Odds-Ratio and its confidence interval
(default: 95%CI).
ic
and ic_tail
are the Information Component,
and its lower end of credibility interval (default: IC025).
demo |> compute_dispro( y = "a_colitis", x = "nivolumab" )
From this point, it is also possible to run any statistical model including drug and adr parameters, but also potential other variables such as age and sex. For example, one could wish to perform a multivariate logistic regression on the reporting of colitis and nivolumab, adjusted on age groups and sex.
The glm()
function from the stats
package can be used for this
purpose.
mod <- glm(a_colitis ~ nivolumab, data = demo, family = "binomial") summary(mod)
In a logistic regression models, estimates lead to (reporting) OR by the exponential.
summary(mod)$coefficients exp(summary(mod)$coefficients[2, 1])
Adding covariates is straightforward
mod2 <- glm(a_colitis ~ nivolumab + sex + age, data = demo, family = "binomial") summary(mod2)
There are several packages that can extract the OR from a model.
The compute_or_mod()
function is just one of many ways to do it.
mod_or <- compute_or_mod( summary(mod2)$coefficients, estimate = Estimate, std_er = Std..Error ) mod_or
You're now all set to create drugs and adrs columns into a
demo
dataset. This is the first step to many modelling possibilities!
Where do you want to go next?
Dive into descriptive features, such as time to onset,
dechallenge, rechallenge, screening of drugs and adrs.
vignette("descriptive")
Learn on interactions in pharmacovigilance database
vignette("interactions")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.