inst/testdata/QUICKSTART.md

Quick Start Guide - SDTM Test Datasets

Overview

Two versions of realistic, synthetic SDTM clinical trial datasets for Study CLIN-2025-042. The versions contain intentional differences for testing data comparison and validation tools.

Dataset Versions

Files Available

Demographics (DM)

Adverse Events (AE)

Laboratory (LB)

Vital Signs (VS)

Exposure (EX)

Key Characteristics

Study Design

Demographics

Data Types

Common Use Cases

1. Test Data Comparison Tools

Compare v1 and v2 to identify:

- New subjects (3 in DM)
- New records (50 AE, 20 VS)
- Updated values (100 LB, 50 VS, etc.)
- Corrected values (10 RACE, 5 AGE, etc.)
- New columns (ETHNIC in DM v2)
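The structural part of that comparison (new subjects, new columns) reduces to set differences on a key column and on the header row. Below is a minimal sketch using only the standard library. It assumes each file has been read into a list of row dictionaries. The helper names and the tiny inline DM extracts are hypothetical stand-ins for the real files.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def compare_structure(v1_rows, v2_rows, key="SUBJID"):
    """Summarise added/removed keys and added/removed columns between versions."""
    v1_keys = {row[key] for row in v1_rows}
    v2_keys = {row[key] for row in v2_rows}
    # Column names come from the header, i.e. the keys of any row dict.
    v1_cols = set(v1_rows[0]) if v1_rows else set()
    v2_cols = set(v2_rows[0]) if v2_rows else set()
    return {
        "new_keys": sorted(v2_keys - v1_keys),
        "removed_keys": sorted(v1_keys - v2_keys),
        "new_columns": sorted(v2_cols - v1_cols),
        "removed_columns": sorted(v1_cols - v2_cols),
    }


# Hypothetical miniature DM extracts; the real files have many more rows.
dm_v1 = read_rows("SUBJID,SEX,AGE\n001,F,34\n002,M,51\n")
dm_v2 = read_rows(
    "SUBJID,SEX,AGE,ETHNIC\n"
    "001,F,34,NOT HISPANIC OR LATINO\n"
    "002,M,52,HISPANIC OR LATINO\n"
    "NEW00,F,29,NOT REPORTED\n"
)

summary = compare_structure(dm_v1, dm_v2)
print(summary)
```

Run against the real dm_v1.csv and dm_v2.csv, the differences table says this should report NEW00, NEW01 and NEW02 as new keys and ETHNIC as the new column. Value-level changes such as the AGE fix for subject 002 above need a record-by-record comparison instead.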

2. Data Validation

Check for:

- SDTM compliance
- Data type consistency
- Valid value ranges
- Referential integrity (USUBJID linkage)
- Date sequence validity
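Two of these checks, referential integrity and date sequence validity, can be sketched in a few lines. The sketch makes three assumptions: AE and DM rows are dictionaries keyed by SDTM variable name, USUBJID is present in both domains, and only complete YYYY-MM-DD dates are compared. The function names and sample values are illustrative and are not part of the package.

```python
from datetime import date


def orphan_subjects(child_rows, parent_rows, key="USUBJID"):
    """Return keys found in a child domain (e.g. AE) but not in DM."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


def bad_date_sequences(rows, start="AESTDTC", end="AEENDTC", seq="AESEQ"):
    """Return (USUBJID, seq) pairs whose end date precedes the start date.

    Only complete YYYY-MM-DD dates are compared; blank or partial ISO 8601
    values (e.g. '2025-04') are skipped rather than imputed.
    """
    problems = []
    for row in rows:
        s, e = row.get(start, ""), row.get(end, "")
        if len(s) < 10 or len(e) < 10:
            continue
        # Compare the date part only, so datetimes like '2025-03-01T10:00' work.
        if date.fromisoformat(e[:10]) < date.fromisoformat(s[:10]):
            problems.append((row["USUBJID"], row[seq]))
    return problems


# Hypothetical sample records.
dm = [{"USUBJID": "001"}, {"USUBJID": "002"}]
ae = [
    {"USUBJID": "001", "AESEQ": "1", "AESTDTC": "2025-03-01", "AEENDTC": "2025-03-05"},
    {"USUBJID": "001", "AESEQ": "2", "AESTDTC": "2025-03-10", "AEENDTC": "2025-03-02"},
    {"USUBJID": "009", "AESEQ": "1", "AESTDTC": "2025-04", "AEENDTC": ""},
]

orphans = orphan_subjects(ae, dm)
bad = bad_date_sequences(ae)
print(orphans, bad)
```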

3. ETL Testing

Process datasets for:

- Data transformation pipelines
- Subject key management
- Visit scheduling
- Laboratory normal ranges
- Treatment assignment tracking
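One way to exercise the normal-range step is to re-derive LBNRIND from LBSTRESN and compare the result with the stored flag. This is also a natural probe for the LBNRIND corrections between versions. The quick reference does not list any range variables, so the ranges below are hypothetical placeholders. Replace them with the study's actual reference ranges.

```python
# Hypothetical reference ranges (low, high) in standard units. Real ranges
# depend on the lab and test, and often on sex and age.
NORMAL_RANGES = {
    "GLUC": (3.9, 5.6),
    "HGB": (120.0, 160.0),
}


def derive_nrind(testcd, value):
    """Classify a numeric standard result as LOW, NORMAL or HIGH.

    Returns None when the result is missing or the test has no range.
    """
    if value is None or testcd not in NORMAL_RANGES:
        return None
    low, high = NORMAL_RANGES[testcd]
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


def nrind_mismatches(rows):
    """List (USUBJID, LBSEQ, stored, derived) where LBNRIND disagrees."""
    out = []
    for row in rows:
        raw = row.get("LBSTRESN", "")
        value = float(raw) if raw else None
        derived = derive_nrind(row["LBTESTCD"], value)
        if derived is not None and derived != row.get("LBNRIND"):
            out.append((row["USUBJID"], row["LBSEQ"], row.get("LBNRIND"), derived))
    return out


# Hypothetical LB records; ALT has no range above, so it is skipped.
rows = [
    {"USUBJID": "001", "LBSEQ": "1", "LBTESTCD": "GLUC", "LBSTRESN": "7.2", "LBNRIND": "NORMAL"},
    {"USUBJID": "001", "LBSEQ": "2", "LBTESTCD": "HGB", "LBSTRESN": "135", "LBNRIND": "NORMAL"},
    {"USUBJID": "001", "LBSEQ": "3", "LBTESTCD": "ALT", "LBSTRESN": "30", "LBNRIND": "NORMAL"},
]

mismatches = nrind_mismatches(rows)
print(mismatches)
```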

4. Reconciliation Tools

Practice reconciliation tasks such as:

- Comparing interim (v1) and final (v2) datasets
- Version tracking
- Change detection
- Audit trails
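For change detection and audit trails, a common approach is to match records on a natural key and emit one entry per changed cell. The minimal sketch below identifies AE records by USUBJID plus AESEQ in both versions. That key is an assumption: check that AESEQ is stable between v1 and v2 before relying on it. The function name is hypothetical.

```python
def change_log(v1_rows, v2_rows, key_cols):
    """Match records on key_cols and list every changed value.

    Returns (key, column, old, new) tuples for columns present in both
    versions. Added or dropped records are not reported here.
    """
    def index(rows):
        return {tuple(row[k] for k in key_cols): row for row in rows}

    old_by_key, new_by_key = index(v1_rows), index(v2_rows)
    changes = []
    # Only keys present in both versions can have changed values.
    for key in sorted(old_by_key.keys() & new_by_key.keys()):
        old, new = old_by_key[key], new_by_key[key]
        for col in sorted(old.keys() & new.keys()):
            if col not in key_cols and old[col] != new[col]:
                changes.append((key, col, old[col], new[col]))
    return changes


# Hypothetical AE extracts: one severity update plus one new record.
ae_v1 = [
    {"USUBJID": "001", "AESEQ": "1", "AESEV": "MILD", "AEREL": "NOT RELATED"},
    {"USUBJID": "002", "AESEQ": "1", "AESEV": "MODERATE", "AEREL": "RELATED"},
]
ae_v2 = [
    {"USUBJID": "001", "AESEQ": "1", "AESEV": "MODERATE", "AEREL": "NOT RELATED"},
    {"USUBJID": "002", "AESEQ": "1", "AESEV": "MODERATE", "AEREL": "RELATED"},
    {"USUBJID": "003", "AESEQ": "1", "AESEV": "MILD", "AEREL": "RELATED"},
]

changes = change_log(ae_v1, ae_v2, ["USUBJID", "AESEQ"])
for entry in changes:
    print(entry)
```

Each tuple is one audit-trail line. Storing these entries alongside the version labels gives a simple change history.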

Column Quick Reference

All Files

Domain-Specific Key Variables

DM: SUBJID, RFSTDTC, RFENDTC, SITEID, SEX, AGE, RACE, ETHNIC (v2 only), ARMCD, ARM, COUNTRY

AE: AESEQ, AETERM, AEDECOD, AEBODSYS, AESEV, AESER, AEACN, AEREL, AEOUT, AESTDTC, AEENDTC

LB: LBSEQ, LBTESTCD, LBTEST, LBORRES, LBORRESU, LBSTRESN, LBSTRESU, LBNRIND, VISITNUM, VISIT, LBDTC

VS: VSSEQ, VSTESTCD, VSTEST, VSORRES, VSORRESU, VSSTRESN, VSSTRESU, VISITNUM, VISIT, VSDTC

EX: EXSEQ, EXTRT, EXDOSE, EXDOSU, EXDOSFRM, EXROUTE, EXSTDTC, EXENDTC, VISITNUM, VISIT, EPOCH
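The key-variable lists above can double as a schema check. This sketch hard-codes them and, following the differences table, expects ETHNIC only from v2 onward. The dictionary and function names are hypothetical.

```python
# Key variables per domain, transcribed from the quick reference.
KEY_VARIABLES = {
    "DM": "SUBJID RFSTDTC RFENDTC SITEID SEX AGE RACE ARMCD ARM COUNTRY".split(),
    "AE": ("AESEQ AETERM AEDECOD AEBODSYS AESEV AESER AEACN AEREL AEOUT "
           "AESTDTC AEENDTC").split(),
    "LB": ("LBSEQ LBTESTCD LBTEST LBORRES LBORRESU LBSTRESN LBSTRESU LBNRIND "
           "VISITNUM VISIT LBDTC").split(),
    "VS": ("VSSEQ VSTESTCD VSTEST VSORRES VSORRESU VSSTRESN VSSTRESU "
           "VISITNUM VISIT VSDTC").split(),
    "EX": ("EXSEQ EXTRT EXDOSE EXDOSU EXDOSFRM EXROUTE EXSTDTC EXENDTC "
           "VISITNUM VISIT EPOCH").split(),
}

# Columns introduced in v2 (ETHNIC is the new DM column).
V2_ONLY = {"DM": ["ETHNIC"]}


def missing_variables(domain, version, columns):
    """Return expected key variables absent from `columns`, in reference order."""
    expected = list(KEY_VARIABLES[domain])
    if version >= 2:
        expected += V2_ONLY.get(domain, [])
    present = set(columns)
    return [col for col in expected if col not in present]


# A hypothetical DM header without ETHNIC passes for v1 but not for v2.
dm_columns = list(KEY_VARIABLES["DM"])
print(missing_variables("DM", 1, dm_columns))
print(missing_variables("DM", 2, dm_columns))
```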

Intentional Differences (v1 → v2)

| Domain | Change Type | Count | Details |
|--------|-------------|-------|---------|
| DM | New Column | 1 | ETHNIC added |
| DM | New Subjects | 3 | SUBJID: NEW00, NEW01, NEW02 |
| DM | Corrected RACE | 10 | Data corrections |
| DM | Corrected AGE | 5 | Data corrections |
| AE | New Records | 50 | New AE observations |
| AE | Updated AESEV | 13 | Severity corrections |
| AE | Corrected AEREL | 8 | Relationship corrections |
| LB | Updated LBSTRESN | 100 | Result value updates |
| LB | Corrected LBNRIND | 17 | Normal indicator fixes |
| VS | New Records | 20 | New measurements |
| VS | Updated VSSTRESN | 50 | Result value updates |
| EX | Corrected EXDOSE | 11 | Dose adjustments |
| EX | Corrected EXROUTE | 3 | Route corrections |
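Because the differences are intentional and counted, the table can serve as a test oracle for a comparison tool. The sketch below transcribes the table into a dictionary. The shape of the `reported` argument is an assumption about how a tool's output might be summarised: a dictionary keyed by (domain, change type) using the table's own labels.

```python
# Expected v1 -> v2 differences, transcribed from the table.
EXPECTED_DIFFS = {
    ("DM", "New Column"): 1,
    ("DM", "New Subjects"): 3,
    ("DM", "Corrected RACE"): 10,
    ("DM", "Corrected AGE"): 5,
    ("AE", "New Records"): 50,
    ("AE", "Updated AESEV"): 13,
    ("AE", "Corrected AEREL"): 8,
    ("LB", "Updated LBSTRESN"): 100,
    ("LB", "Corrected LBNRIND"): 17,
    ("VS", "New Records"): 20,
    ("VS", "Updated VSSTRESN"): 50,
    ("EX", "Corrected EXDOSE"): 11,
    ("EX", "Corrected EXROUTE"): 3,
}


def unexpected_counts(reported):
    """Return {(domain, change): (expected, reported)} for every mismatch.

    Categories missing from either side count as 0, so both missed and
    spurious differences are flagged.
    """
    mismatches = {}
    for key in EXPECTED_DIFFS.keys() | reported.keys():
        expected = EXPECTED_DIFFS.get(key, 0)
        got = reported.get(key, 0)
        if got != expected:
            mismatches[key] = (expected, got)
    return mismatches


# Hypothetical tool report: one AESEV update missed, one spurious category.
report = dict(EXPECTED_DIFFS)
report[("AE", "Updated AESEV")] = 12
report[("VS", "Corrected VSORRESU")] = 2
mism = unexpected_counts(report)
print(mism)
```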

Sample Data

Load in Python

```python
import pandas as pd

# Load demographics v1
dm = pd.read_csv('dm_v1.csv')
print(dm.head())
print(f"Subjects: {len(dm)}")
print(f"Sites: {dm['SITEID'].nunique()}")
print(f"Arms: {dm['ARMCD'].unique()}")
```
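To work with every domain and version at once, the files can be loaded into a dictionary keyed by (domain, version). This sketch assumes every file follows the `dm_v1.csv` / `ae_v1.csv` naming used in the loading examples in this section (for instance `lb_v2.csv`), which is an assumption about the rest of the directory. It uses the standard `csv` module rather than pandas, so every value stays a plain string.

```python
import csv
import re
import tempfile
from pathlib import Path

# Assumed naming pattern: two-letter domain, "_v", version number, ".csv".
FILE_PATTERN = re.compile(r"^([a-z]{2})_v(\d+)\.csv$")


def load_all(directory):
    """Return {(DOMAIN, version): list of row dicts} for matching CSV files."""
    datasets = {}
    for path in sorted(Path(directory).iterdir()):
        match = FILE_PATTERN.match(path.name)
        if not match:
            continue  # skip README.txt and any other non-dataset files
        domain, version = match.group(1).upper(), int(match.group(2))
        with path.open(newline="", encoding="utf-8") as handle:
            datasets[(domain, version)] = list(csv.DictReader(handle))
    return datasets


# Quick self-check with throwaway files instead of the real datasets.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dm_v1.csv").write_text("SUBJID,SEX\n001,F\n", encoding="utf-8")
    Path(tmp, "dm_v2.csv").write_text(
        "SUBJID,SEX,ETHNIC\n001,F,NOT REPORTED\n", encoding="utf-8"
    )
    Path(tmp, "README.txt").write_text("not a dataset", encoding="utf-8")
    demo = load_all(tmp)

print(sorted(demo))
```

On the real data, call `load_all()` with the directory that holds the CSV files.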

Load in R

```r
library(readr)

# Load adverse events v1
ae <- read_csv('ae_v1.csv')
head(ae)
nrow(ae)  # Total AE records
unique(ae$AEBODSYS)  # Body systems
```

Load in SQL

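```sql
-- Assumes DuckDB; all_varchar keeps IDs as text so v1 and v2 SUBJIDs compare cleanly
CREATE TABLE dm_v1 AS SELECT * FROM read_csv('dm_v1.csv', all_varchar = true);
CREATE TABLE dm_v2 AS SELECT * FROM read_csv('dm_v2.csv', all_varchar = true);

-- Compare v1 and v2: subjects present only in v2
SELECT COUNT(*) FROM dm_v2
WHERE SUBJID NOT IN (SELECT SUBJID FROM dm_v1);
-- Returns: 3 (new subjects)
```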

Data Validation Checks

Expected Ranges

Cardinality Checks

Referential Integrity

Reproducibility

Support Files

Questions?

Refer to README.txt for detailed variable definitions and data specifications.



clinCompare documentation built on Feb. 19, 2026, 1:07 a.m.