clinTrialData is a community-grown library of clinical trial example
datasets for R. The package ships with a core set of studies and is designed
to expand over time — anyone can contribute a new data source, and users can
download any available study on demand without waiting for a new package
release.
Data is stored in Parquet format and accessed through the connector package,
giving a consistent API regardless of which study you are working with.
Key features:
- download_study() fetches any available study and caches it locally
- connect_clinical_data() connects to any available data source
- list_data_sources() finds all studies on your machine; list_available_studies() shows everything available to download

Install the released version from CRAN, or the development version from GitHub:

# Install from CRAN
install.packages("clinTrialData")

# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("Lovemore-Gakava/clinTrialData")
library(clinTrialData)

# Studies on your machine (bundled + previously downloaded)
list_data_sources()
The package bundles the CDISC Pilot 01 study, so you can connect immediately:
# Connect to CDISC Pilot data
db <- connect_clinical_data("cdisc_pilot")

# List available datasets in the ADaM domain
db$adam$list_content_cnt()

# Read the subject-level dataset
adsl <- db$adam$read_cnt("adsl")
head(adsl[, c("USUBJID", "TRT01A", "AGE", "SEX", "RACE")])
Studies beyond the bundled data can be downloaded from GitHub Releases:
# What's available to download?
list_available_studies()

# Download a study once; it is cached locally from then on
download_study("cdisc_pilot_extended")

# Where is the cache?
cache_dir()
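The "download once, then reuse" behaviour can be illustrated with a generic caching pattern. This is a minimal base-R sketch, not the clinTrialData implementation: cached_fetch() and fake_fetch() are hypothetical helpers, the cache lives in a temporary folder rather than cache_dir(), and files are stored as RDS rather than Parquet.

```r
# Hypothetical sketch of a download-once cache (NOT clinTrialData internals).
# fetch_fun stands in for whatever actually retrieves the data.
cached_fetch <- function(name, fetch_fun,
                         cache = file.path(tempdir(), "study_cache")) {
  dir.create(cache, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache, paste0(name, ".rds"))
  if (!file.exists(path)) {
    # Cache miss: retrieve the data and store it locally for next time
    saveRDS(fetch_fun(name), path)
  }
  # Cache hit (or freshly written file): read from local disk
  readRDS(path)
}

# Stand-in fetcher that records how often it is called
calls <- 0
fake_fetch <- function(name) {
  calls <<- calls + 1
  data.frame(study = name, n = 3)
}

first  <- cached_fetch("demo_study", fake_fetch)  # fetches and caches
second <- cached_fetch("demo_study", fake_fetch)  # served from the cache
calls
```

The second call never reaches fake_fetch(), which is the property that lets a study be downloaded once and then reused offline.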
# Dimensions
dim(adsl)

# Quick structure overview
str(adsl, list.len = 10)
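Alongside dim() and str(), a per-column count of missing values is a common first check on a new dataset. The sketch below uses base R only; adsl_toy is a hypothetical stand-in for adsl with invented values, and the same expression works on any data frame returned by read_cnt().

```r
# Hypothetical stand-in for adsl (values invented for illustration)
adsl_toy <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  AGE     = c(70, NA, 65),
  RACE    = c("WHITE", NA, NA)
)

# Number of missing values in each column, largest first
na_counts <- sort(colSums(is.na(adsl_toy)), decreasing = TRUE)
na_counts
```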
# Read adverse events data
adae <- db$adam$read_cnt("adae")
head(adae[, c("USUBJID", "AEDECOD", "AESEV", "AESER")])
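Because an adverse events dataset holds one row per event, a subject can appear many times, so counting rows and counting subjects give different answers. The sketch below shows both in base R. adae_toy is hypothetical: it reuses the column names from the head() call above, but its values are invented.

```r
# Hypothetical ADAE-like data: one row per adverse event (values invented)
adae_toy <- data.frame(
  USUBJID = c("01-001", "01-001", "01-002", "01-003", "01-003", "01-003"),
  AEDECOD = c("HEADACHE", "NAUSEA", "HEADACHE", "DIZZINESS", "HEADACHE", "RASH"),
  AESEV   = c("MILD", "MODERATE", "MILD", "SEVERE", "MILD", "MILD"),
  AESER   = c("N", "N", "N", "Y", "N", "N")
)

# Events per severity: every row counts
events_by_sev <- table(adae_toy$AESEV)

# Subjects per severity: each subject counted at most once per severity level
subjects_by_sev <- tapply(adae_toy$USUBJID, adae_toy$AESEV,
                          function(ids) length(unique(ids)))

# Subjects with at least one serious adverse event
n_serious_subj <- length(unique(adae_toy$USUBJID[adae_toy$AESER == "Y"]))

events_by_sev
subjects_by_sev
n_serious_subj
```

Here MILD has 4 events but only 3 subjects, because subject 01-003 reported two mild events.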
# Read demographics
dm <- db$sdtm$read_cnt("dm")
head(dm[, c("USUBJID", "ARM", "AGE", "SEX", "RACE")])
library(dplyr)

# Basic demographic summary by treatment
adsl |>
  group_by(TRT01A) |>
  summarise(
    n = n(),
    mean_age = mean(AGE, na.rm = TRUE),
    female_pct = mean(SEX == "F", na.rm = TRUE) * 100,
    .groups = "drop"
  )
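If you would rather avoid the dplyr dependency, the same three statistics can be computed in base R by splitting the data by treatment. This is a sketch on a hypothetical adsl_toy data frame with invented values. As in the dplyr version, n counts all rows, including subjects whose AGE is missing.

```r
# Hypothetical ADSL-like data (values invented for illustration)
adsl_toy <- data.frame(
  TRT01A = c("Placebo", "Placebo", "Active", "Active"),
  AGE    = c(70, 80, 65, NA),
  SEX    = c("F", "M", "F", "F")
)

# Split by treatment, then compute one summary row per group
by_trt <- split(adsl_toy, adsl_toy$TRT01A)
summary_tbl <- data.frame(
  TRT01A     = names(by_trt),
  n          = vapply(by_trt, nrow, integer(1)),
  mean_age   = vapply(by_trt, function(d) mean(d$AGE, na.rm = TRUE), numeric(1)),
  female_pct = vapply(by_trt, function(d) mean(d$SEX == "F", na.rm = TRUE) * 100,
                      numeric(1)),
  row.names  = NULL
)
summary_tbl
```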
Anyone can add a new study to the library. Datasets live on GitHub Releases, not inside the package, so adding data requires no pull request or CRAN submission.
Organise your Parquet files by domain:
your_new_study/
├── adam/
│   ├── adsl.parquet
│   └── adae.parquet
└── sdtm/
    ├── dm.parquet
    └── ae.parquet
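Before requesting a release slot, a quick local check can catch layout mistakes. The sketch below uses base R only. check_study_layout() is a hypothetical helper, not a clinTrialData function, and the allowed folder names (adam, sdtm) are taken from the layout shown above. The example builds the layout with empty placeholder files. These files are not valid Parquet; real files could be written with, for example, arrow::write_parquet().

```r
# Hypothetical helper (not part of clinTrialData): returns a character vector
# of problems; an empty result means the layout looks as expected
check_study_layout <- function(study_dir, domains = c("adam", "sdtm")) {
  problems <- character(0)
  subdirs <- list.dirs(study_dir, full.names = FALSE, recursive = FALSE)
  if (length(subdirs) == 0) {
    problems <- c(problems, "no domain folders found")
  }
  # Flag any folder that is not one of the expected domain names
  unexpected <- setdiff(subdirs, domains)
  if (length(unexpected) > 0) {
    problems <- c(problems, paste("unexpected folder:", unexpected))
  }
  # Each domain folder that is present should hold at least one .parquet file
  for (d in intersect(subdirs, domains)) {
    files <- list.files(file.path(study_dir, d), pattern = "\\.parquet$")
    if (length(files) == 0) {
      problems <- c(problems, paste("no .parquet files in", d))
    }
  }
  problems
}

# Recreate the example layout in a temporary directory (empty placeholder files)
study <- file.path(tempdir(), "your_new_study")
dir.create(file.path(study, "adam"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(study, "sdtm"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(study, "adam", c("adsl.parquet", "adae.parquet"))))
invisible(file.create(file.path(study, "sdtm", c("dm.parquet", "ae.parquet"))))

check_study_layout(study)
```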
Open an issue to request a release slot, then upload with the helper script in the repository's data-raw/ folder:
source("data-raw/upload_to_release.R")

# Upload the data zip
upload_study_to_release("your_new_study", tag = "v1.1.0")

# Generate and upload metadata (enables dataset_info() for your study)
generate_and_upload_metadata(
  source = "your_new_study",
  description = "Brief description of your study",
  version = "v1.1.0",
  license = "Your license here",
  source_url = "https://link-to-original-data",
  tag = "v1.1.0"
)
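Once the upload is complete, users can inspect, download, and connect to the new study:

dataset_info("your_new_study")          # inspect before downloading
download_study("your_new_study")        # download and cache
connect_clinical_data("your_new_study") # connect to the cached data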
No CRAN submission is required: the study is available to all users as soon as it is uploaded.