Introduction to sdtmchecks

The purpose of the sdtmchecks package is to help detect and investigate potential analysis-relevant issues in SDTM data. This is done using a set of data check functions. These check functions are intended to be generalizable, actionable, and meaningful for analysis.

Setting Up

To start using sdtmchecks, first install it from GitHub:

# install.packages("devtools")
devtools::install_github("pharmaverse/sdtmchecks", ref="main")

Then load the package:

library(sdtmchecks) 

Documentation

Here's how to search the help pages for the package:

# type ??sdtmchecks into the console
??sdtmchecks 

Metadata

The package comes with the sdtmchecksmeta dataset which contains metadata on each check function. It contains details like function name, category, priority, and descriptions. Each function is given a Category (Cross Therapeutic Area, Oncology, Covid-19, Patient Reported Outcomes, Ophthalmology) and a Priority (High, Medium, Low).

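# Print the full metadata
sdtmchecksmeta

# Keep a few key columns, renaming xls_title to title, and show the first 10 rows
meta <- subset(sdtmchecksmeta, select = c("check", "xls_title", "category", "priority", "domains"))
colnames(meta) <- c("check", "title", "category", "priority", "domains")
head(meta, n = 10)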
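
Because the metadata is an ordinary data frame, it can be filtered like any other. The following is a minimal sketch on a toy stand-in for sdtmchecksmeta: the check names, category codes, and priorities below are made up for illustration, and only the column names (check, category, priority) come from the description above.

```r
# Toy stand-in for sdtmchecksmeta; all rows are illustrative, not real metadata.
meta_toy <- data.frame(
  check    = c("check_a", "check_b", "check_c", "check_d"),
  category = c("ALL", "ALL", "ALL", "ONC"),
  priority = c("High", "Medium", "High", "High"),
  stringsAsFactors = FALSE
)

# Keep only the high-priority checks in the "ALL" category.
high_all <- subset(meta_toy, priority == "High" & category == "ALL")
high_all$check

# Count checks by category and priority.
table(meta_toy$category, meta_toy$priority)
```

The same subset() call should work on the real metadata, provided the category and priority values are spelled as they appear in that dataset.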

Running a Check

Let's do an example using check_ae_ds_partial_death_dates(AE,DS)

This check flags records with partial death dates in AE and DS, i.e. date strings shorter than 10 characters and therefore not a complete YYYY-MM-DD date. If any are found, the check returns FALSE with attributes containing a list of the flagged records and a brief message explaining the result. If no issues are detected, the check returns TRUE.

# Create sample data frames
AE <- data.frame(
  USUBJID = 1:3,
  AEDECOD = c("AE1", "AE2", "AE3"),
  AEDTHDTC = c("2017-01-01", "2017", NA),
  stringsAsFactors = FALSE
)
DS <- data.frame(
  USUBJID = 4:7,
  DSSCAT = "STUDY DISCON",
  DSDECOD = "DEATH",
  DSSTDTC = c("2018-01-01", "2017-03-03", "2018-01-02", "2016-10"),
  stringsAsFactors = FALSE
)
# Print the sample data frames.
AE
DS
# Run the data check.
check_ae_ds_partial_death_dates(AE,DS)
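
To make the return convention concrete, here is a base R sketch of the same idea. It is not the package's implementation: flag_partial_dates is a hypothetical helper, and the attribute names msg and data are assumptions chosen for this example.

```r
# Hypothetical helper, not part of sdtmchecks: flags date strings shorter
# than 10 characters and mimics the TRUE / FALSE-with-attributes convention.
flag_partial_dates <- function(df, date_var) {
  dates <- df[[date_var]]
  # Missing dates are not counted as partial.
  partial <- !is.na(dates) & nchar(dates) < 10
  if (!any(partial)) {
    return(TRUE)
  }
  structure(
    FALSE,
    msg = paste0(sum(partial), " record(s) with a partial date in ", date_var, "."),
    data = df[partial, , drop = FALSE]  # assumed attribute name
  )
}

AE_demo <- data.frame(
  USUBJID = 1:3,
  AEDTHDTC = c("2017-01-01", "2017", NA),
  stringsAsFactors = FALSE
)

res <- flag_partial_dates(AE_demo, "AEDTHDTC")
isTRUE(res)        # did the check pass?
attr(res, "msg")   # brief explanation
attr(res, "data")  # flagged records
```

Using isTRUE() rather than a bare if (res) keeps the pass/fail test explicit even though the result carries attributes.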

Running Many Checks

Running all the checks on your data is easy: just use the run_all_checks function. This function assumes that all of your SDTM datasets exist as objects in your global environment, such as ae, dm, and ex.

# Read data to your global environment
ae = haven::read_sas("path/to/ae.sas7bdat")
ds = haven::read_sas("path/to/ds.sas7bdat")

# Run the checks and save as an object called "myreport"
myreport=run_all_checks(metads = sdtmchecksmeta,
               priority = c("High", "Medium", "Low"), #subset checks based on priority
               type = c("ALL", "ONC", "COVID", "PRO", "OPHTH"), #subset checks based on category
               verbose = TRUE)

class(myreport) #results in a list object
names(myreport) #each check result is saved in a slot of the list
myreport[["check_ae_aedecod"]] #investigate the results of a check
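
Since the datasets need to be in the global environment, reading a whole folder in one go can save typing. The sketch below shows one possible pattern using temporary CSV files so that it runs anywhere; the folder name and file contents are made up. With real SAS files, haven::read_sas and a "\\.sas7bdat$" pattern would take the place of read.csv and "\\.csv$".

```r
# Write two tiny CSV files to a temporary folder so the sketch is runnable.
data_dir <- file.path(tempdir(), "sdtm_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(USUBJID = 1:2, AEDECOD = c("AE1", "AE2")),
          file.path(data_dir, "ae.csv"), row.names = FALSE)
write.csv(data.frame(USUBJID = 1:2, DSDECOD = c("DEATH", "COMPLETED")),
          file.path(data_dir, "ds.csv"), row.names = FALSE)

# Read every file and name each data frame after its file (ae.csv -> "ae").
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(datasets) <- tolower(tools::file_path_sans_ext(basename(files)))

# Copy the named data frames into the global environment.
invisible(list2env(datasets, envir = globalenv()))

exists("ae")
exists("ds")
```

Lower-case object names are used here to match the ae, ds style in the example above.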

The run_all_checks function also lets you easily subset on category or priority:

myreport=run_all_checks(metads = sdtmchecksmeta,
               priority = c("High"),
               type = c("ONC"),
               verbose = TRUE)

You can also choose specific checks to run. Here's a way to get started with some checks that should work fairly well for most datasets:

# Read data to your global environment
ae = haven::read_sas("path/to/ae.sas7bdat")
cm = haven::read_sas("path/to/cm.sas7bdat")
dm = haven::read_sas("path/to/dm.sas7bdat")

# Load dplyr for %>% and filter()
library(dplyr)

# Subset to checks that should work OK for most datasets
metads = sdtmchecksmeta %>%
  filter(check %in% c("check_ae_aedecod",
                      "check_ae_aetoxgr",
                      "check_ae_dup",
                      "check_cm_cmdecod",
                      "check_cm_missing_month",
                      "check_dm_age_missing",
                      "check_dm_usubjid_dup",
                      "check_dm_armcd"
                      ))

myreport=run_all_checks(metads = metads,
               verbose = TRUE)

Writing Out Results

You can then write the results out to an xlsx file for easy sharing.

report_to_xlsx(res=myreport,outfile="check_report.xlsx")

Making a Customizable Script

There's also a convenient helper function that writes out a user-friendly R script containing all the check function calls.

create_R_script(file = "run_the_checks.R")

sdtmchecks documentation built on Sept. 11, 2024, 9:34 p.m.