
clinTrialData is a community-grown library of clinical trial
example datasets for R. The package ships with a core set of datasets
and is designed to expand over time — anyone can contribute a new data
source, and users can download any available study on demand without
waiting for a new package release.
Data is stored in Parquet format and accessed through the connector
package, giving a consistent API regardless of which study you are
working with.
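The core idea is simple: datasets live as assets on GitHub Releases, not inside the package itself. You can list what is available, inspect a study before downloading it, download it once, and then connect to it: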
# What's available to download from GitHub Releases?
list_available_studies()
#> source version size_mb cached
#> 1 cdisc_pilot v0.1.0 3.7 TRUE
#> 2 cdisc_pilot_extended v0.1.0 4.3 FALSE
# Inspect any study before downloading — fetches a tiny metadata file
dataset_info("cdisc_pilot_extended")
#> ──────────────────────────────────────────────────────────────────────────
#> cdisc_pilot_extended (v0.1.0)
#> ──────────────────────────────────────────────────────────────────────────
#> Enhanced CDISC Pilot 01 study with urinalysis data
#>
#> Domains & datasets:
#> adam (12): adsl, adae, adlb, adlbc, adlbh, adlbhy, adlburi, ...
#> sdtm (22): ae, cm, dm, ds, ex, lb, mh, qs, relrec, sc, ...
#>
#> Subjects: 254
#> Version: v0.1.0
#> License: CDISC Pilot — educational use
#> Source: https://github.com/cdisc-org/sdtm-adam-pilot-project
#> ──────────────────────────────────────────────────────────────────────────
# Download once; cached locally from then on
download_study("cdisc_pilot_extended")
# Connect and analyse — same API for every study
db <- connect_clinical_data("cdisc_pilot_extended")
adsl <- db$adam$read_cnt("adsl")
# Install from CRAN
install.packages("clinTrialData")
# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("Lovemore-Gakava/clinTrialData")
library(clinTrialData)
# What's already on your machine?
list_data_sources()
# What's available to download?
list_available_studies()
# Download a study (only needed once — cached locally after that)
download_study("cdisc_pilot")
# Connect and explore
db <- connect_clinical_data("cdisc_pilot")
db$adam$list_content_cnt() # list ADaM datasets
db$sdtm$list_content_cnt() # list SDTM datasets
adsl <- db$adam$read_cnt("adsl")
dm <- db$sdtm$read_cnt("dm")
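cdisc_pilot: Standard CDISC Pilot 01 study (11 ADaM, 22 SDTM datasets). Available immediately after installation; no download needed.
cdisc_pilot_extended: Enhanced CDISC Pilot 01 study (12 ADaM, 22 SDTM datasets, matching the dataset_info() output shown earlier) with additional derived variables and a simulated urinalysis dataset. Download it once, then connect: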
download_study("cdisc_pilot_extended")
connect_clinical_data("cdisc_pilot_extended")
Use list_data_sources() to see all locally available studies and
list_available_studies() to see everything on GitHub Releases.
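The two listing functions can be combined into a small convenience wrapper. The sketch below is illustrative only: ensure_study() is a hypothetical helper, not part of clinTrialData, and it assumes that list_available_studies() returns a data frame with the source and cached columns shown in the printed output earlier.

```r
library(clinTrialData)

# Hypothetical helper (not part of the package): download a study only if the
# `cached` column reports that it is not yet on this machine, then connect.
# Assumes list_available_studies() returns a data frame with `source` and
# `cached` columns, as suggested by the printed output in this README.
ensure_study <- function(study) {
  available <- list_available_studies()
  row <- available[available$source == study, , drop = FALSE]
  if (nrow(row) == 0) {
    stop("Study not found on GitHub Releases: ", study)
  }
  if (!isTRUE(row$cached[1])) {
    download_study(study)
  }
  connect_clinical_data(study)
}

db <- ensure_study("cdisc_pilot_extended")
```

Since download_study() is described as caching locally after the first call, the explicit check mainly serves to show how the cached column can be used in scripts.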
Adding a new study to the library does not require a pull request or a CRAN submission. The data lives on GitHub Releases, not inside the package.
Organise the study's Parquet files by domain (adam/, sdtm/):
your_study/
├── adam/
│ ├── adsl.parquet
│ └── adae.parquet
└── sdtm/
├── dm.parquet
└── ae.parquet
source("data-raw/upload_to_release.R")
# Upload the data zip
upload_study_to_release("your_study", tag = "v1.1.0")
# Generate and upload the metadata (enables dataset_info() for your study)
generate_and_upload_metadata(
source = "your_study",
description = "Brief description of your study",
version = "v1.1.0",
license = "Your license here",
source_url = "https://link-to-original-data",
tag = "v1.1.0"
)
dataset_info("your_study") # inspect before downloading
download_study("your_study") # download and cache
connect_clinical_data("your_study")
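For contributors starting from in-memory data frames, the domain/dataset folder layout shown above could be produced along these lines. This is a sketch under assumptions: it uses the arrow package to write Parquet (the README does not say which writer the maintainers use), the data frames are toy stand-ins, and it writes to a temporary directory because the location expected by the upload script is not stated here.

```r
# Sketch only: write data frames into the <study>/<domain>/<dataset>.parquet layout.
# Assumes the arrow package is installed.
library(arrow)

study_dir <- file.path(tempdir(), "your_study")

# Hypothetical toy data; a real contribution would hold full ADaM/SDTM datasets
datasets <- list(
  adam = list(
    adsl = data.frame(USUBJID = c("01-001", "01-002"), AGE = c(63L, 71L))
  ),
  sdtm = list(
    dm = data.frame(USUBJID = c("01-001", "01-002"), SEX = c("F", "M"))
  )
)

for (domain in names(datasets)) {
  domain_dir <- file.path(study_dir, domain)
  dir.create(domain_dir, recursive = TRUE, showWarnings = FALSE)
  for (name in names(datasets[[domain]])) {
    # One Parquet file per dataset, named after the dataset
    write_parquet(
      datasets[[domain]][[name]],
      file.path(domain_dir, paste0(name, ".parquet"))
    )
  }
}

# Check the resulting tree
list.files(study_dir, recursive = TRUE)
```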
All datasets — whether bundled or downloaded — are automatically protected from accidental modification. Reading is always allowed; write and delete operations are blocked with a clear error message.
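A quick way to see this protection in action is to attempt a write inside tryCatch(). The sketch assumes that the blocked operation is the connector package's write_cnt() method and that it is called as write_cnt(x, name); the exact wording of the error message is not reproduced here because it is not given in this README.

```r
library(clinTrialData)

db <- connect_clinical_data("cdisc_pilot")

# Reading works as usual
adsl <- db$adam$read_cnt("adsl")

# Assumption: write_cnt() is the connector write method that gets blocked.
# The error is caught so the script keeps running and the message can be inspected.
result <- tryCatch(
  db$adam$write_cnt(adsl, "adsl"),
  error = function(e) conditionMessage(e)
)

print(result)
```

If the protection behaves as described, result holds the error message as a character string and the stored dataset is left untouched.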
The extended datasets are derived from the CDISC Pilot Study data.
Original Source: CDISC SDTM/ADaM Pilot Project
Modifications: This extended version includes additional derived variables (TRTDURY) and a simulated urinalysis dataset (ADLBURI) created for educational and development purposes.
Acknowledgments: We acknowledge and thank CDISC for making the original pilot data available. The extended datasets maintain the structure and quality of the original data while adding features to support additional analysis scenarios.
# Browse all vignettes
vignette(package = "clinTrialData")
# Cache location
cache_dir()