knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This article describes creating an ADSL ADaM. Examples are currently presented and tested using DM, EX , AE, LB and DS SDTM domains. However, other domains could be used.

Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified.

Programming Flow

Read in Data {#readdata}

To start, all data frames needed for the creation of ADSL should be read into the environment. This will be a company specific process. Some of the data frames needed may be DM, EX, DS, AE, and LB.

For example purpose, the CDISC Pilot SDTM datasets---which are included in {admiral.test}---are used.

library(admiral)
library(dplyr)
library(admiral.test)
library(lubridate)
library(stringr)

data("dm")
data("ds")
data("ex")
data("ae")
data("lb")

The DM domain is used as the basis for ADSL:

adsl <- dm %>% 
  select(-DOMAIN)
dataset_vignette(
  adsl, 
  display_vars = vars(USUBJID, RFSTDTC, COUNTRY, AGE, SEX, RACE, ETHNIC, ARM, ACTARM)
)

Derive Treatment Variables (TRT0xP, TRT0xA) {#treatmentvar}

The mapping of the treatment variables is left to the ADaM programmer. An example mapping may be:

adsl <- dm %>%
  mutate(TRT01P = ARM, TRT01A = ACTARM)

Derive/Impute Numeric Treatment Date/Time and Duration (TRTSDTM, TRTEDTM, TRTDURD) {#trtdatetime}

The functions derive_var_trtsdtm(), derive_var_trtedtm() can be used to derive the treatment start and end date/times using the ex domain.

Example calls:

adsl <- adsl %>%
  derive_var_trtsdtm(dataset_ex = ex) %>%
  derive_var_trtedtm(dataset_ex = ex)

This call returns the original data frame with the column TRTSDTM and TRTEDTM added. The datetime variables returned can be converted to dates using the derive_vars_dtm_to_dt() function.

adsl <- adsl %>%
  derive_vars_dtm_to_dt(source_vars = vars(TRTSDTM, TRTEDTM))

Now, that TRTSDT and TRTEDT are derived, the function derive_var_trtdurd() can be used to calculate the Treatment duration (TRTDURD).

adsl <- adsl %>%
  derive_var_trtdurd()
dataset_vignette(
  adsl,
  display_vars = vars(USUBJID, RFSTDTC, TRTSDTM, TRTSDT, TRTEDTM, TRTEDT, TRTDURD)
)

Derive Disposition Variables {#disposition}

Disposition Dates (e.g. EOSDT) {#disposition_date}

The function derive_var_disposition_dt() can be used to derive a disposition date. The relevant disposition date (DS.DSSTDTC) is selected by adjusting the filter parameter.

To derive the End of Study date (EOSDT), a call could be:

adsl <- adsl %>%
  derive_var_disposition_dt(
    dataset_ds = ds,
    new_var = EOSDT,
    dtc = DSSTDTC,
    filter = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE",
    date_imputation = NULL
  )

With DS:

dataset_vignette(
  ds, 
  display_vars = vars(USUBJID, DSCAT, DSDECOD,DSTERM, DSSTDTC),
  filter = DSDECOD != "SCREEN FAILURE"
)

We would get :

dataset_vignette(adsl, display_vars = vars(USUBJID, EOSDT))

This call would return the input dataset with the column EOSDT added. This function allows the user to impute partial dates as well. If imputation is needed and the date is to be imputed to the first of the month, then set date_imputation = "FIRST".

Disposition Status (e.g. EOSTT) {#disposition_status}

The function derive_var_disposition_status() can be used to derive a disposition status at a specific timepoint. The relevant disposition variable (DS.DSDECOD) is selected by adjusting the filter parameter and used to derive EOSSTT.

To derive the End of Study status (EOSSTT), a call could be:

adsl <- adsl %>%
  derive_var_disposition_status(
    dataset_ds = ds,
    new_var = EOSSTT,
    status_var = DSDECOD,
    filter = DSCAT == "DISPOSITION EVENT"
  )
dataset_vignette(adsl, display_vars = vars(USUBJID, EOSDT, EOSSTT))

Link to DS.

This call would return the input dataset with the column EOSSTT added.

By default, the function will derive EOSSTT as

If the default derivation must be changed, the user can create his/her own function and pass it to the format_new_var argument of the function (format_new_var = new_mapping) to map DSDECOD to a suitable EOSSTT value.

Example function format_eosstt():

format_eosstt <- function(x) {
  case_when(
    x %in% c("COMPLETED") ~ "COMPLETED",
    x %in% c("SCREEN FAILURE") ~ NA_character_,
    !is.na(x) ~ "DISCONTINUED",
    TRUE ~ "ONGOING"
  )
}

The customized mapping function format_eosstt() can now be passed to the main function:

adsl <- adsl %>%
  derive_var_disposition_status(
    dataset_ds = ds,
    new_var = EOSSTT,
    status_var = DSDECOD,
    format_new_var = format_eosstt,
    filter = DSCAT == "DISPOSITION EVENT"
  )

This call would return the input dataset with the column EOSSTT added.

Disposition Reason(s) (e.g. DCSREAS, DCSREASP) {#disposition_reason}

The main reason for discontinuation is usually stored in DSDECOD while DSTERM provides additional details regarding subject’s discontinuation (e.g., description of "OTHER").

The function derive_vars_disposition_reason() can be used to derive a disposition reason (along with the details, if required) at a specific timepoint. The relevant disposition variable(s) (DS.DSDECOD, DS.DSTERM) are selected by adjusting the filter parameter and used to derive the main reason (and details).

To derive the End of Study reason(s) (DCSREAS and DCSREASP), the call would be:

adsl <- adsl %>%
  derive_vars_disposition_reason(
    dataset_ds = ds,
    new_var = DCSREAS,
    reason_var = DSDECOD,
    new_var_spe = DCSREASP,
    reason_var_spe = DSTERM,
    filter = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )
dataset_vignette(adsl, display_vars = vars(USUBJID, EOSDT, EOSSTT, DCSREAS, DCSREASP))

Link to DS.

This call would return the input dataset with the column DCSREAS and DCSREASP added.

By default, the function will map

If the default derivation must be changed, the user can create his/her own function and pass it to the format_new_var argument of the function (format_new_var = new_mapping) to map DSDECOD and DSTERM to a suitable DCSREAS/DCSREASP value.

Example function format_dcsreas():

format_dcsreas <- function(dsdecod, dsterm = NULL) {
  out <- if (is.null(dsterm)) dsdecod else dsterm
  case_when(
    !dsdecod %in% c("COMPLETED", "SCREEN FAILURE") & !is.na(dsdecod) ~ out,
    TRUE ~ NA_character_
  )
}

The customized mapping function format_dcsreas() can now be passed to the main function:

adsl <- adsl %>%
  derive_var_disposition_reason(
    dataset_ds = ds,
    new_var = DCSREAS,
    reason_var = DSDECOD,
    new_var_spe = DCSREASP,
    reason_var_spe = DSTERM,
    format_new_vars = format_dcsreas,
    filter_ds = DSCAT == "DISPOSITION EVENT"
  )

Derive Death Variables {#death}

Death Date (DTHDT) {#death_date}

The function derive_vars_dt() can be used to derive DTHDT. This function allows the user to impute the date as well.

Example calls:

adsl <- adsl %>%
  derive_vars_dt(
    new_vars_prefix = "DTH",
    dtc = DTHDTC
  )
dataset_vignette(adsl, display_vars = vars(USUBJID, TRTEDT, DTHDTC, DTHDT, DTHFL))

This call would return the input dataset with the columns DTHDT added and, by default, the associated date imputation flag (DTHDTF) populated with the controlled terminology outlined in the ADaM IG for date imputations. If the imputation flag is not required, the user must set the argument flag_imputation to FALSE.

If imputation is needed and the date is to be imputed to the first day of the month/year the call would be:

adsl <- adsl %>%
  derive_vars_dt(
    new_vars_prefix = "DTH",
    dtc = DTHDTC,
    date_imputation = "FIRST"
  )

See also Date and Time Imputation.

Cause of Death (DTHCAUS) {#death_cause}

The cause of death DTHCAUS can be derived using the function derive_var_dthcaus().

Since the cause of death could be collected/mapped in different domains (e.g. DS, AE, DD), it is important the user specifies the right source(s) to derive the cause of death from.

For example, if the date of death is collected in the AE form when the AE is Fatal, the cause of death would be set to the preferred term (AEDECOD) of that Fatal AE, while if the date of death is collected in the DS form, the cause of death would be set to the disposition term (DSTERM). To achieve this, the dthcaus_source() objects must be specified and defined such as it fits the study requirement.

dthcause_source() specifications:

An example call to define the sources would be:

src_ae <- dthcaus_source(
  dataset_name = "ae",
  filter = AEOUT == "FATAL",
  date = AESTDTC,
  mode = "first",
  dthcaus = AEDECOD
)
dataset_vignette(
  ae,
  display_vars = vars(USUBJID, AESTDTC, AEENDTC, AEDECOD, AEOUT),
  filter =  AEOUT == "FATAL"
)
src_ds <- dthcaus_source(
  dataset_name = "ds",
  filter = DSDECOD == "DEATH" & grepl("DEATH DUE TO", DSTERM),
  date = DSSTDTC,
  mode = "first",
  dthcaus = "Death in DS"
)
dataset_vignette(
  ds,
  display_vars = vars(USUBJID, DSDECOD, DSTERM, DSSTDTC),
  filter = DSDECOD == "DEATH"
)

Once the sources are defined, the function derive_var_dthcaus() can be used to derive DTHCAUS:

adsl <- adsl %>%
  derive_var_dthcaus(src_ae, src_ds, source_datasets = list(ae = ae, ds = ds))
dataset_vignette(
  adsl,
  display_vars = vars(USUBJID, EOSDT, DTHDTC, DTHDT, DTHCAUS),
  filter = DTHFL == "Y"
)

The function also offers the option to add some traceability variables (e.g. DTHDOM would store the domain where the date of death is collected, and DTHSEQ would store the xxSEQ value of that domain). To add them, the traceability_vars argument must be added to the dthcaus_source() arguments:

src_ae <- dthcaus_source(
  dataset_name = "ae",
  filter = AEOUT == "FATAL",
  date = AESTDTC,
  mode = "first",
  dthcaus = AEDECOD,
  traceability_vars = vars(DTHDOM = "AE", DTHSEQ = AESEQ)
)

src_ds <- dthcaus_source(
  dataset_name = "ds",
  filter = DSDECOD == "DEATH" & grepl("DEATH DUE TO", DSTERM),
  date = DSSTDTC,
  mode = "first",
  dthcaus = DSTERM,
  traceability_vars = vars(DTHDOM = "DS", DTHSEQ = DSSEQ)
)
adsl <- adsl %>%
  select(-DTHCAUS) %>% # remove it before deriving it again
  derive_var_dthcaus(src_ae, src_ds, source_datasets = list(ae = ae, ds = ds))
dataset_vignette(
  adsl,
  display_vars = vars(USUBJID, TRTEDT, DTHDTC, DTHDT, DTHCAUS, DTHDOM, DTHSEQ),
  filter = DTHFL == "Y"
)

Duration Relative to Death {#death_other}

The function derive_vars_duration() can be used to derive duration relative to death like the Relative Day of Death (DTHADY) or the numbers of days from last dose to death (LDDTHELD).

Example calls:

adsl <- adsl %>%
  derive_vars_duration(
    new_var = DTHADY,
    start_date = TRTSDT,
    end_date = DTHDT
  )
adsl <- adsl %>%
  derive_vars_duration(
    new_var = LDDTHELD,
    start_date = TRTEDT,
    end_date = DTHDT,
    add_one = FALSE
  )
dataset_vignette(
  adsl,
  display_vars = vars(USUBJID, TRTEDT, DTHDTC, DTHDT, DTHCAUS, DTHADY, LDDTHELD),
  filter = DTHFL == "Y"
)

Derive the Last Date Known Alive (LSTALVDT) {#lstalvdt}

Similarly as for the cause of death (DTHCAUS), the last known alive date (LSTALVDT) can be derived from multiples sources and the user must ensure the sources (lstalvdt_source) are correctly defined.

lstalvdt_source() specifications:

An example could be :

ae_src1 <- lstalvdt_source(
  dataset_name = "ae",
  date = AESTDTC,
  date_imputation = "FIRST"
)
ae_src2 <- lstalvdt_source(
  dataset_name = "ae",
  date = AEENDTC,
  date_imputation = "LAST"
)
lb_src <- lstalvdt_source(
  dataset_name = "lb",
  date = LBDTC,
  filter = str_length(LBDTC) >= 10
)
adsl_src <- lstalvdt_source(
  dataset_name = "adsl",
  date = TRTEDT
)

Once the sources are defined, the function derive_var_lstalvdt() can be used to derive LSTALVDT:

adsl <- adsl %>%
  derive_var_lstalvdt(
    ae_src1, ae_src2, lb_src, adsl_src,
    source_datasets = list(ae = ae, adsl = adsl, lb = lb)
  )
dataset_vignette(
  adsl,
  display_vars = vars(USUBJID, TRTEDT, DTHDTC, LSTALVDT), 
  filter = !is.na(TRTSDT)
)

Similarly to dthcaus_source(), the traceability variables can be added by specifying the traceability_vars argument in lstalvdt_source().

ae_src1 <- lstalvdt_source(
  dataset_name = "ae",
  date = AESTDTC,
  date_imputation = "FIRST",
  traceability_vars = vars(LALVDOM = "AE", LALVSEQ = AESEQ, LALVVAR = "AESTDTC")
)
ae_src2 <- lstalvdt_source(
  dataset_name = "ae",
  date = AEENDTC,
  date_imputation = "LAST",
  traceability_vars = vars(LALVDOM = "AE", LALVSEQ = AESEQ, LALVVAR = "AEENDTC")
)
lb_src <- lstalvdt_source(
  dataset_name = "lb",
  date = LBDTC,
  filter = str_length(LBDTC) >= 10,
  traceability_vars = vars(LALVDOM = "LB", LALVSEQ = LBSEQ, LALVVAR = "LBDTC")
)
adsl_src <- lstalvdt_source(
  dataset_name = "adsl",
  date = TRTEDTM,
  traceability_vars = vars(LALVDOM = "ADSL", LALVSEQ = NA_integer_, LALVVAR = "TRTEDTM")
)

adsl <- adsl %>%
  select(-LSTALVDT) %>% # created in the previous call
  derive_var_lstalvdt(
    ae_src1, ae_src2, lb_src, adsl_src,
    source_datasets = list(ae = ae, adsl = adsl, lb = lb)
  )
dataset_vignette(
  adsl, 
  display_vars = vars(USUBJID, TRTEDT, DTHDTC, LSTALVDT, LALVDOM, LALVSEQ, LALVVAR), 
  filter =  !is.na(TRTSDT)
)

Derive Groupings and Populations {#groupings}

Grouping (e.g. AGEGR1 or REGION1) {#groupings_ex}

Numeric and categorical variables (AGE, RACE, COUNTRY, etc.) may need to be grouped to perform the required analysis. {admiral} does not currently have functionality to assist with all required groupings. Some functions exist for age grouping according to FDA or EMA conventions. For others, the user can create his/her own function to meet his/her study requirement.

To derive AGEGR1 as categorized AGE in < 18, 18-65, >= 65 (FDA convention):

adsl <- adsl %>%
  derive_agegr_fda(
    age_var = AGE,
    new_var = AGEGR1
  )

However for example if

the user defined function(s) would be like:

format_agegr2 <- function(x) {
  case_when(
    !is.na(x) & x < 65 ~ "< 65",
    x >= 65 ~ ">= 65",
    TRUE ~ NA_character_
  )
}

format_region1 <- function(x) {
  case_when(
    x %in% c("CAN", "USA") ~ "North America",
    !is.na(x) ~ "Rest of the World",
    TRUE ~ "Missing"
  )
}

These functions are then used in a mutate() statement to derive the required grouping variables:

adsl <- adsl %>%
  mutate(
    AGEGR2 = format_agegr2(AGE),
    REGION1 = format_region1(COUNTRY)
  )
dataset_vignette(
  adsl, 
  display_vars = vars(USUBJID, AGE, SEX, COUNTRY, AGEGR1, AGEGR2, REGION1)
)

Population Flags (e.g. SAFFL) {#popflag}

Since the populations flags are mainly company/study specific and that it can easily be derived using an if_else() statement, {admiral} did not implement a functionality to assist with populations flags.

An example of a simple implementation could be:

adsl <- adsl %>%
  mutate(
    SAFFL = if_else(!is.na(TRTSDT), "Y", NA_character_)
  )
dataset_vignette(
  adsl , 
  display_vars = vars(USUBJID, TRTSDT, ARM, ACTARM, SAFFL)
)

Derive Other Variables {#other}

The users can add specific code to cover their need for the analysis.

Add Labels and Attributes {#attributes}

Adding labels and attributes for SAS transport files is supported by the following packages:

NOTE: All these packages are in the experimental phase, but the vision is to have them associated with an End to End pipeline under the umbrella of the pharmaverse.

Example Script

ADaM | Sample Code ---- | -------------- ADSL | ad_adsl.R{target="_blank"}



epijim/admiral documentation built on Feb. 13, 2022, 12:15 a.m.