Getting Started with clinTrialData

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

clinTrialData is a community-grown library of clinical trial example datasets for R. The package ships with a core set of studies and is designed to expand over time — anyone can contribute a new data source, and users can download any available study on demand without waiting for a new package release.

Data is stored in Parquet format and accessed through the connector package, giving a consistent API regardless of which study you are working with.

Key features:

Installation

# Install from CRAN
install.packages("clinTrialData")

# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("Lovemore-Gakava/clinTrialData")

Available Data Sources

library(clinTrialData)

# Studies on your machine (bundled + previously downloaded)
list_data_sources()

Quick Start

Connect to a Data Source

The package bundles the CDISC Pilot 01 study, so you can connect immediately:

# Connect to CDISC Pilot data
db <- connect_clinical_data("cdisc_pilot")

# List available datasets in the ADaM domain
db$adam$list_content_cnt()

# Read the subject-level dataset
adsl <- db$adam$read_cnt("adsl")
head(adsl[, c("USUBJID", "TRT01A", "AGE", "SEX", "RACE")])

Discover and Download Additional Studies

Studies beyond the bundled data can be downloaded from GitHub Releases:

# What's available to download?
list_available_studies()

# Download a study once — cached locally from then on
download_study("cdisc_pilot_extended")

# Where is the cache?
cache_dir()

Explore the Data

# Dimensions
dim(adsl)

# Quick structure overview
str(adsl, list.len = 10)

Working with Different Domains

ADaM Datasets

# Read adverse events data
adae <- db$adam$read_cnt("adae")
head(adae[, c("USUBJID", "AEDECOD", "AESEV", "AESER")])

SDTM Datasets

# Read demographics
dm <- db$sdtm$read_cnt("dm")
head(dm[, c("USUBJID", "ARM", "AGE", "SEX", "RACE")])

Example Analysis

library(dplyr)

# Basic demographic summary by treatment
adsl |>
  group_by(TRT01A) |>
  summarise(
    n = n(),
    mean_age = mean(AGE, na.rm = TRUE),
    female_pct = mean(SEX == "F", na.rm = TRUE) * 100,
    .groups = "drop"
  )

Contributing New Data Sources

Anyone can add a new study to the library. Datasets live on GitHub Releases, not inside the package — so no pull request or CRAN submission is needed to add data.

Step 1: Prepare your data

Organise your Parquet files by domain:

your_new_study/
├── adam/
│   ├── adsl.parquet
│   └── adae.parquet
└── sdtm/
    ├── dm.parquet
    └── ae.parquet

Step 2: Upload data and metadata to a GitHub Release

Open an issue to request a release slot, then use the helper script:

source("data-raw/upload_to_release.R")

# Upload the data zip
upload_study_to_release("your_new_study", tag = "v1.1.0")

# Generate and upload metadata (enables dataset_info() for your study)
generate_and_upload_metadata(
  source      = "your_new_study",
  description = "Brief description of your study",
  version     = "v1.1.0",
  license     = "Your license here",
  source_url  = "https://link-to-original-data",
  tag         = "v1.1.0"
)

Step 3: Users can inspect and access it immediately

dataset_info("your_new_study")       # inspect before downloading
download_study("your_new_study")     # download and cache
connect_clinical_data("your_new_study")

No CRAN submission required. The study is available to all users as soon as it is uploaded.



Try the clinTrialData package in your browser

Any scripts or data that you put into this service are public.

clinTrialData documentation built on March 3, 2026, 5:07 p.m.