knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Main Idea

The main idea of {admiral} is that an ADaM dataset is built by a sequence of derivations. Each derivation adds one or more variables or parameters to the processed dataset. This modular approach makes it easy to adjust code by adding, removing, or modifying derivations. Each derivation is a function call. Consider for example the following script which creates a (very simple) ADSL dataset.

library(dplyr)
library(lubridate)
library(admiral)
library(admiral.test)

# read in SDTM datasets
data("dm")
data("ds")
data("ex")
# derive treatment variables
adsl <- dm %>%
  mutate(TRT01P = ARMCD, TRT01A = ACTARMCD) %>%
  derive_var_trtsdtm(dataset_ex = ex) %>%
  derive_var_trtedtm(dataset_ex = ex) %>%
  derive_vars_dtm_to_dt(vars(TRTSDTM, TRTEDTM)) %>%
  derive_var_trtdurd()
dataset_vignette(adsl, 
                 display_vars = vars(STUDYID, USUBJID, TRT01P, TRT01A,
                                     TRTSDTM, TRTEDTM, TRTSDT, TRTEDT,
                                     TRTDURD))



Derivations

The most important functions in {admiral} are the derivations. These functions start with derive_. The first parameter of these functions expects the input dataset. This allows us to string together derivations using the %>% operator.

Functions which derive a dedicated variable start with derive_var_ followed by the variable name, e.g., derive_var_trtdurd() derives the TRTDURD variable.

Functions which derive a dedicated parameter start with derive_param_ followed by the parameter name, e.g., derive_param_bmi() derives the BMI parameter.

Input and Output

It is expected that the input dataset is not grouped. Otherwise an error is issued.

The input dataset should not include variables starting with temp_. These variable names are reserved for temporary variables used within the derivation and are removed from the output dataset. If the input dataset contains such variables, an error is issued.

It is expected all variable names are uppercase in the input dataset and new variables will be returned in uppercase. It is expected all SUPPxx.QNAM values are uppercase to conform with this convention when using the derive_vars_suppqual() to adhere to the uppercase requirement.

The output dataset is ungrouped. The observations are not ordered in a dedicated way. In particular, the order of the observations of the input dataset may not be preserved.

Computations

Computations expect vectors as input and return a vector. These function can be used in expressions like convert_dtc_to_dt() in the derivation of FINLABDT in the example below:

# derive final lab visit date
ds_final_lab_visit <- ds %>% 
  filter(DSDECOD == "FINAL LAB VISIT") %>% 
  transmute(USUBJID, FINLABDT = convert_dtc_to_dt(DSSTDTC))

# derive treatment variables  
adsl <- dm %>%
  mutate(TRT01P = ARMCD, TRT01A = ACTARMCD) %>%
  derive_var_trtsdtm(dataset_ex = ex) %>%
  derive_var_trtedtm(dataset_ex = ex) %>%
  derive_vars_dtm_to_dt(vars(TRTSDTM, TRTEDTM)) %>%
  derive_var_trtdurd() %>%
  # merge on final lab visit date
  left_join(ds_final_lab_visit, by = "USUBJID")
dataset_vignette(adsl, 
                 display_vars = vars(STUDYID, USUBJID, TRT01P, TRT01A,
                                     TRTSDTM, TRTEDTM, TRTSDT, TRTEDT,
                                     TRTDURD, FINLABDT))



Parameters

For parameters which expect variable names or expressions of variable names, symbols or expressions must be specified rather than strings.

Handling of Missing Values

When using the {haven} package to read SAS datasets into R, SAS-style character missing values, i.e. "", are not converted into proper R NA values. Rather they are kept as is. This is problematic for any downstream data processing as R handles "" just as any other string. Thus, before any data manipulation is being performed SAS blanks should be converted to R NAs using {admiral}'s convert_blanks_to_na() function, e.g.

dm <- haven::read_sas("dm.sas7bdat") %>% 
  convert_blanks_to_na()

Note that any logical operator being applied to an NA value always returns NA rather than TRUE or FALSE.

visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")
visits != "Baseline"

The only exception is is.na() which returns TRUE if the input is NA.

is.na(visits)

Thus, to filter all visits which are not "Baseline" the following condition would need to be used.

visits != "Baseline" | is.na(visits)

Also note that most aggregation functions, like mean() or max(), also return NA if any element of the input vector is missing.

mean(c(1, NA, 2))

To avoid this behavior one has to explicitly set na.rm = TRUE.

mean(c(1, NA, 2), na.rm = TRUE)

This is very important to keep in mind when using {admiral}'s aggregation functions such as derive_summary_records().

Validation

All functions are reviewed and tested to ensure that they work as described in the documentation. They are not validated yet.

Although {admiral} follows CDISC standards, it does not claim that the dataset resulting from calling {admiral} functions is ADaM compliant. This has to be ensured by the user.

Starting a Script

For the ADaM data structures, an overview of the flow and example function calls for the most common steps are provided by the following vignettes:

{admiral} also provides template R scripts as a starting point. They can be created by calling use_ad_template(), e.g.,

use_ad_template(
  adam_name = "adsl",
  save_path = "./ad_adsl.R"
)

A list of all available templates can be obtained by list_all_templates():

list_all_templates()

Support

Support is provided via the admiral Slack channel.

See also



epijim/admiral documentation built on Feb. 13, 2022, 12:15 a.m.