Getting Started with clinTrialData
In clinTrialData: Clinical Trial Example Datasets


Introduction

clinTrialData is a community-grown library of clinical trial example datasets for R. The package ships with a core set of studies and is designed to expand over time — anyone can contribute a new data source, and users can download any available study on demand without waiting for a new package release.

Data is stored in Parquet format and accessed through the connector package, giving a consistent API regardless of which study you are working with.

Key features:

Growing library: New datasets are added by the community as GitHub Release assets — no CRAN resubmission needed
On-demand download: Use download_study() to fetch any available study and cache it locally
Generic interface: Use connect_clinical_data() to connect to any available data source
Automatic discovery: list_data_sources() finds all studies on your machine; list_available_studies() shows everything available to download
Data protection: Downloaded and bundled datasets are locked against accidental modification
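The data protection feature says cached files are locked. The vignette does not say how. Below is a minimal sketch of one common approach, assuming the lock is done with read-only file permissions. clinTrialData may use a different mechanism. The helper `lock_file()` is hypothetical; `Sys.chmod()` and `file.info()` are base R.

```r
# Hypothetical helper: make a cached data file read-only.
# Assumption: the "lock" is a file-permission change; the package may do this differently.
lock_file <- function(path) {
  ok <- Sys.chmod(path, mode = "0444", use_umask = FALSE)
  if (!all(ok)) stop("Could not change permissions on ", path)
  invisible(path)
}

path <- tempfile(fileext = ".parquet")
invisible(file.create(path))
lock_file(path)

# On POSIX systems the file is now read-only for owner, group and others
format(file.info(path)$mode)
#> [1] "444"
```

Permissions guard against accidental overwrites from R code. They are not a security boundary, because the file's owner can always change the mode back.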

Installation

# Install from CRAN
install.packages("clinTrialData")

# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("Lovemore-Gakava/clinTrialData")

Available Data Sources

library(clinTrialData)

# Studies on your machine (bundled + previously downloaded)
list_data_sources()
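`list_data_sources()` finds studies on disk. To illustrate the idea, here is a hypothetical base-R sketch of directory-based discovery. It assumes a study is any folder that contains an `adam/` or `sdtm/` subfolder, matching the layout described under "Contributing New Data Sources". This is not the package's implementation, and `find_studies()` is an illustrative name.

```r
# Hypothetical helper: list study folders directly under `root`.
# Assumption: a study is a folder containing an adam/ or sdtm/ subfolder.
find_studies <- function(root) {
  candidates <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  is_study <- vapply(candidates, function(d) {
    dir.exists(file.path(d, "adam")) || dir.exists(file.path(d, "sdtm"))
  }, logical(1))
  sort(basename(candidates[is_study]))
}

# Build a throwaway directory tree to try it on
root <- tempfile("studies_demo_")
dir.create(file.path(root, "study_a", "adam"), recursive = TRUE)
dir.create(file.path(root, "study_b", "sdtm"), recursive = TRUE)
dir.create(file.path(root, "not_a_study"), recursive = TRUE)

find_studies(root)
#> [1] "study_a" "study_b"
```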

Quick Start

Connect to a Data Source

The package bundles the CDISC Pilot 01 study, so you can connect immediately:

# Connect to CDISC Pilot data
db <- connect_clinical_data("cdisc_pilot")

# List available datasets in the ADaM domain
db$adam$list_content_cnt()

# Read the subject-level dataset
adsl <- db$adam$read_cnt("adsl")
head(adsl[, c("USUBJID", "TRT01A", "AGE", "SEX", "RACE")])
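The connection object is used as a nested structure. It has one entry per domain (`db$adam`, `db$sdtm`), and each entry offers `list_content_cnt()` and `read_cnt()`. To make that shape concrete without relying on connector internals, here is a hypothetical base-R stand-in. It stores data frames as `.rds` files instead of Parquet and mimics only the calling pattern. `make_domain()` and `mock_connect()` are illustrative names and are not part of clinTrialData or connector.

```r
# Hypothetical stand-in for the db$<domain>$read_cnt() calling pattern.
# Uses .rds files so it runs with base R only; the real package reads Parquet via connector.
make_domain <- function(dir) {
  force(dir)  # capture this domain's folder in the closures below
  list(
    list_content_cnt = function() list.files(dir, pattern = "\\.rds$"),
    read_cnt = function(name) readRDS(file.path(dir, paste0(name, ".rds")))
  )
}

mock_connect <- function(root) {
  domains <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  stats::setNames(lapply(file.path(root, domains), make_domain), domains)
}

# Throwaway study with a single ADaM dataset (hypothetical values)
root <- tempfile("mock_study_")
dir.create(file.path(root, "adam"), recursive = TRUE)
saveRDS(data.frame(USUBJID = c("01-001", "01-002"), AGE = c(71, 65)),
        file.path(root, "adam", "adsl.rds"))

db_mock <- mock_connect(root)
db_mock$adam$list_content_cnt()
#> [1] "adsl.rds"
db_mock$adam$read_cnt("adsl")$AGE
#> [1] 71 65
```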

Discover and Download Additional Studies

Studies beyond the bundled data can be downloaded from GitHub Releases:

# What's available to download?
list_available_studies()

# Download a study once — cached locally from then on
download_study("cdisc_pilot_extended")

# Where is the cache?
cache_dir()
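"Download once, cached locally from then on" is a simple cache-or-fetch rule. Here is a minimal sketch of that rule, assuming the cache is one folder per study. `cached_study_path()` is hypothetical, and `fake_fetch()` stands in for the real download by writing a marker file and counting how often it runs.

```r
# Hypothetical cache-or-fetch helper: only calls `fetch` when the study folder is absent.
cached_study_path <- function(study, cache, fetch) {
  dest <- file.path(cache, study)
  if (!dir.exists(dest)) {
    dir.create(dest, recursive = TRUE)
    fetch(study, dest)  # runs on a cache miss only
  }
  dest
}

calls <- 0
fake_fetch <- function(study, dest) {
  calls <<- calls + 1
  file.create(file.path(dest, "downloaded.txt"))
}

cache <- tempfile("ctd_cache_")
p1 <- cached_study_path("my_study", cache, fake_fetch)
p2 <- cached_study_path("my_study", cache, fake_fetch)
calls
#> [1] 1
```

A production version would also need to handle an interrupted download. Otherwise, a half-written folder would look like a valid cache entry.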

Explore the Data

# Dimensions
dim(adsl)

# Quick structure overview
str(adsl, list.len = 10)
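Two more base-R checks are worth running before any analysis: missing values per column and the number of subjects per arm. The small data frame below is a hypothetical stand-in so the snippet runs on its own. With the real data, use `adsl` in place of `adsl_demo`.

```r
# Toy stand-in for adsl (hypothetical values)
adsl_demo <- data.frame(
  TRT01A = c("Placebo", "Placebo", "Xanomeline High Dose"),
  AGE    = c(63, NA, 75),
  SEX    = c("F", "M", "F")
)

# Missing values per column
na_counts <- colSums(is.na(adsl_demo))
na_counts

# Subjects per treatment arm
arm_counts <- table(adsl_demo$TRT01A)
arm_counts
```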

Working with Different Domains

ADaM Datasets

# Read adverse events data
adae <- db$adam$read_cnt("adae")
head(adae[, c("USUBJID", "AEDECOD", "AESEV", "AESER")])
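A typical next step is counting subjects, not events, with at least one serious AE per arm. ADAE usually already carries a treatment variable, but the sketch below shows the general pattern of pulling subject-level variables in from ADSL by `USUBJID` with `merge()`. The data frames are hypothetical stand-ins for `adsl` and `adae`.

```r
# Toy stand-ins (hypothetical values)
adsl_demo <- data.frame(
  USUBJID = c("01-001", "01-002", "01-003"),
  TRT01A  = c("Placebo", "Xanomeline High Dose", "Xanomeline High Dose")
)
adae_demo <- data.frame(
  USUBJID = c("01-001", "01-002", "01-002", "01-003"),
  AEDECOD = c("HEADACHE", "DIZZINESS", "PRURITUS", "DIZZINESS"),
  AESER   = c("N", "Y", "N", "N")
)

# Attach the treatment arm to every AE record
ae_arm <- merge(adae_demo, adsl_demo[, c("USUBJID", "TRT01A")], by = "USUBJID")

# Keep one row per subject with a serious AE, then count per arm (arms with none show 0)
ser <- unique(ae_arm[ae_arm$AESER == "Y", c("USUBJID", "TRT01A")])
ser_by_arm <- table(factor(ser$TRT01A, levels = sort(unique(adsl_demo$TRT01A))))
ser_by_arm
```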

SDTM Datasets

# Read demographics
dm <- db$sdtm$read_cnt("dm")
head(dm[, c("USUBJID", "ARM", "AGE", "SEX", "RACE")])
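ADaM datasets are derived from SDTM, so a quick traceability check is to compare subject identifiers across the two. The IDs below are hypothetical. With the real data, compare `dm$USUBJID` and `adsl$USUBJID` instead.

```r
# Toy stand-ins (hypothetical IDs)
dm_demo   <- data.frame(USUBJID = c("01-001", "01-002", "01-003", "01-004"))
adsl_demo <- data.frame(USUBJID = c("01-001", "01-002", "01-003"))

# Subjects in SDTM DM with no ADSL record, and the reverse
setdiff(dm_demo$USUBJID, adsl_demo$USUBJID)
#> [1] "01-004"
setdiff(adsl_demo$USUBJID, dm_demo$USUBJID)
#> character(0)
```

A non-empty first result is not necessarily an error. Whether every DM subject should appear in ADSL depends on how the study defines its analysis population.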

Example Analysis

library(dplyr)

# Basic demographic summary by treatment
adsl |>
  group_by(TRT01A) |>
  summarise(
    n = n(),
    mean_age = mean(AGE, na.rm = TRUE),
    female_pct = mean(SEX == "F", na.rm = TRUE) * 100,
    .groups = "drop"
  )
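If dplyr is not available, the same summary can be written in base R. This sketch uses `split()` and `lapply()` on a hypothetical stand-in for `adsl`.

```r
# Toy stand-in for adsl (hypothetical values)
adsl_demo <- data.frame(
  TRT01A = c("Placebo", "Placebo", "Xanomeline High Dose", "Xanomeline High Dose"),
  AGE    = c(60, 70, 75, NA),
  SEX    = c("F", "M", "F", "F")
)

# One summary row per treatment arm, mirroring group_by() + summarise()
summ <- do.call(rbind, lapply(split(adsl_demo, adsl_demo$TRT01A), function(d) {
  data.frame(
    TRT01A     = d$TRT01A[1],
    n          = nrow(d),
    mean_age   = mean(d$AGE, na.rm = TRUE),
    female_pct = mean(d$SEX == "F", na.rm = TRUE) * 100
  )
}))
rownames(summ) <- NULL
summ
```

`split()` is used here instead of `aggregate()` because the formula interface of `aggregate()` drops rows containing any `NA` by default, which would silently reduce `n`.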

Contributing New Data Sources

Anyone can add a new study to the library. Datasets live on GitHub Releases, not inside the package — so no pull request or CRAN submission is needed to add data.

Step 1: Prepare your data

Organise your Parquet files by domain:

your_new_study/
├── adam/
│   ├── adsl.parquet
│   └── adae.parquet
└── sdtm/
    ├── dm.parquet
    └── ae.parquet
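Before uploading, it can help to check the folder locally. The function below is a hypothetical pre-upload check. It assumes only `adam/` and `sdtm/` domain folders, each holding `.parquet` files, as in the tree above. The upload helper may accept other layouts, so treat these rules as an assumption.

```r
# Hypothetical pre-upload layout check; returns a character vector of problems (empty = OK).
check_study_layout <- function(study_dir, domains = c("adam", "sdtm")) {
  problems <- character(0)
  present <- domains[dir.exists(file.path(study_dir, domains))]
  if (length(present) == 0) problems <- c(problems, "no domain folders found")
  for (dom in present) {
    files <- list.files(file.path(study_dir, dom))
    bad <- files[!grepl("\\.parquet$", files)]
    if (length(files) == 0) problems <- c(problems, paste0(dom, "/ is empty"))
    if (length(bad) > 0) problems <- c(problems, paste0(dom, "/", bad, " is not .parquet"))
  }
  problems
}

# Throwaway study folder with one stray file
study <- file.path(tempfile("layout_demo_"), "your_new_study")
dir.create(file.path(study, "adam"), recursive = TRUE)
dir.create(file.path(study, "sdtm"), recursive = TRUE)
invisible(file.create(file.path(study, "adam", c("adsl.parquet", "adae.parquet"))))
invisible(file.create(file.path(study, "sdtm", c("dm.parquet", "notes.txt"))))

check_study_layout(study)
#> [1] "sdtm/notes.txt is not .parquet"
```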

Step 2: Upload data and metadata to a GitHub Release

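Open an issue on the package's GitHub repository to request a release slot. Then, from a local clone of that repository, run the helper script in data-raw/ (it is part of the source repository, not the installed package):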

source("data-raw/upload_to_release.R")

# Upload the data zip
upload_study_to_release("your_new_study", tag = "v1.1.0")

# Generate and upload metadata (enables dataset_info() for your study)
generate_and_upload_metadata(
  source      = "your_new_study",
  description = "Brief description of your study",
  version     = "v1.1.0",
  license     = "Your license here",
  source_url  = "https://link-to-original-data",
  tag         = "v1.1.0"
)
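Because the metadata becomes public as soon as it is uploaded, a quick local check that no field is empty or still holds template text can save a re-upload. The sketch below is hypothetical. It validates an R list with the same field names as the arguments above, and it does not reproduce what `generate_and_upload_metadata()` actually writes.

```r
# Hypothetical pre-flight check on the metadata fields used above.
check_metadata <- function(meta) {
  required <- c("source", "description", "version", "license", "source_url")
  issues <- character(0)
  for (field in required) {
    value <- meta[[field]]
    if (is.null(value) || !nzchar(trimws(value))) {
      issues <- c(issues, paste0(field, ": missing or empty"))
    } else if (grepl("^(Your |Brief description)", value)) {
      issues <- c(issues, paste0(field, ": still a placeholder"))
    }
  }
  url <- meta[["source_url"]]
  if (!is.null(url) && nzchar(url) && !grepl("^https?://", url)) {
    issues <- c(issues, "source_url: should start with http:// or https://")
  }
  issues
}

meta <- list(
  source      = "your_new_study",
  description = "Brief description of your study",
  version     = "v1.1.0",
  license     = "",
  source_url  = "https://link-to-original-data"
)
check_metadata(meta)
#> [1] "description: still a placeholder" "license: missing or empty"
```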

Step 3: Users can inspect and access it immediately

dataset_info("your_new_study")       # inspect before downloading
download_study("your_new_study")     # download and cache
connect_clinical_data("your_new_study")

The study is available to all users as soon as the upload completes; no new package release or CRAN submission is required.


clinTrialData documentation built on March 3, 2026, 5:07 p.m.
