This guide covers two versions (v1 and v2) of realistic synthetic SDTM clinical trial datasets for Study CLIN-2025-042. The versions differ intentionally so that they can be used to test comparison and validation tools.

- dm_v1.csv - 500 subjects, 16 variables
- dm_v2.csv - 503 subjects, 17 variables (ETHNIC added)
- ae_v1.csv - 1,495 adverse events
- ae_v2.csv - 1,545 adverse events (+50 new)
- lb_v1.csv - 16,000 lab test results
- lb_v2.csv - 16,000 lab test results (100 value corrections, 17 indicator corrections)
- vs_v1.csv - 14,000 vital sign measurements
- vs_v2.csv - 14,020 vital sign measurements (+20 new, 50 value updates)
- ex_v1.csv - 1,500 treatment exposures
- ex_v2.csv - 1,500 treatment exposures (11 dose corrections, 3 route corrections)

Compare v1 and v2 to identify:

- New subjects (3 in DM)
- New records (50 AE, 20 VS)
- Updated values (100 LB, 50 VS, etc.)
- Corrected values (10 RACE, 5 AGE, etc.)
- New columns (ETHNIC in DM v2)

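The comparison targets above can be sketched with the Python standard library alone. This is a minimal illustration rather than official tooling for these datasets: the helper name `summarize_version_diff`, the inline sample rows, and the choice of `SUBJID` as the DM record key are assumptions made for the example.

```python
import csv
import io


def summarize_version_diff(v1_file, v2_file, key="SUBJID"):
    """Report columns and record keys that differ between two CSV versions.

    v1_file and v2_file are open text files (or file-like objects)
    containing CSV data with a header row.
    """
    v1 = csv.DictReader(v1_file)
    v2 = csv.DictReader(v2_file)
    v1_keys = {row[key] for row in v1}
    v2_keys = {row[key] for row in v2}
    # fieldnames is available once the header row has been read
    v1_columns = v1.fieldnames or []
    v2_columns = v2.fieldnames or []
    return {
        "new_columns": [c for c in v2_columns if c not in v1_columns],
        "new_keys": sorted(v2_keys - v1_keys),
        "dropped_keys": sorted(v1_keys - v2_keys),
    }


# Hypothetical sample rows, not taken from the real files
SAMPLE_V1 = "SUBJID,SEX,AGE\n0001,M,54\n0002,F,61\n"
SAMPLE_V2 = (
    "SUBJID,SEX,AGE,ETHNIC\n"
    "0001,M,54,NOT HISPANIC OR LATINO\n"
    "0002,F,61,HISPANIC OR LATINO\n"
    "NEW00,F,47,NOT HISPANIC OR LATINO\n"
)

result = summarize_version_diff(io.StringIO(SAMPLE_V1), io.StringIO(SAMPLE_V2))
print(result)
```

With the real files, pass handles opened with `open('dm_v1.csv', newline='')` and `open('dm_v2.csv', newline='')` instead of the `io.StringIO` samples.
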
Check for:

- SDTM compliance
- Data type consistency
- Valid value ranges
- Referential integrity (USUBJID linkage)
- Date sequence validity

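Two of these checks, referential integrity and date sequence validity, lend themselves to a short sketch. The function names, the sample `USUBJID` values, and the assumption that dates are complete ISO 8601 values (`YYYY-MM-DD`, optionally followed by a time) are illustrative only; partial dates would need extra handling.

```python
from datetime import date


def find_orphan_subjects(dm_rows, child_rows, key="USUBJID"):
    """Return subject IDs used in a child domain but missing from DM."""
    known = {row[key] for row in dm_rows}
    return sorted({row[key] for row in child_rows} - known)


def find_date_sequence_errors(rows, start_col, end_col):
    """Return rows whose end date is earlier than their start date.

    Assumes complete ISO 8601 dates; rows with a blank start or end
    date are skipped rather than flagged.
    """
    bad = []
    for row in rows:
        start, end = row.get(start_col), row.get(end_col)
        if not start or not end:
            continue
        # Compare on the date part only (first 10 characters)
        if date.fromisoformat(end[:10]) < date.fromisoformat(start[:10]):
            bad.append(row)
    return bad


# Hypothetical records for illustration
dm = [{"USUBJID": "CLIN-2025-042-0001"}, {"USUBJID": "CLIN-2025-042-0002"}]
ae = [
    {"USUBJID": "CLIN-2025-042-0001", "AESTDTC": "2025-02-01", "AEENDTC": "2025-02-05"},
    {"USUBJID": "CLIN-2025-042-0002", "AESTDTC": "2025-03-10", "AEENDTC": "2025-03-08"},
    {"USUBJID": "CLIN-2025-042-9999", "AESTDTC": "2025-04-01", "AEENDTC": ""},
]

orphans = find_orphan_subjects(dm, ae)
date_errors = find_date_sequence_errors(ae, "AESTDTC", "AEENDTC")
print(orphans)
print(date_errors)
```

Rows loaded with `csv.DictReader` or `DataFrame.to_dict('records')` have the same list-of-dicts shape, so either loader can feed these functions.
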
Process datasets for:

- Data transformation pipelines
- Subject key management
- Visit scheduling
- Laboratory normal ranges
- Treatment assignment tracking

Practice reconciliation:

- Between interim (v1) and final (v2) datasets
- Version tracking
- Change detection
- Audit trails

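Change detection and audit trails can be combined in one pass: for every record present in both versions, log each variable whose value changed. The sketch below assumes that `USUBJID` plus the domain sequence number (here `LBSEQ`) identifies a record; the function name and sample values are hypothetical.

```python
def build_audit_trail(v1_rows, v2_rows, key_cols):
    """List cell-level changes for records present in both versions."""

    def record_key(row):
        return tuple(row[c] for c in key_cols)

    v1_by_key = {record_key(r): r for r in v1_rows}
    trail = []
    for new in v2_rows:
        old = v1_by_key.get(record_key(new))
        if old is None:
            continue  # record is new in v2, so it is not an update
        for col in old:
            if col in new and old[col] != new[col]:
                trail.append(
                    {
                        "key": record_key(new),
                        "variable": col,
                        "old": old[col],
                        "new": new[col],
                    }
                )
    return trail


# Hypothetical LB records, with values kept as strings as read from CSV
lb_v1 = [
    {"USUBJID": "S1", "LBSEQ": "1", "LBSTRESN": "5.1", "LBNRIND": "NORMAL"},
    {"USUBJID": "S1", "LBSEQ": "2", "LBSTRESN": "7.9", "LBNRIND": "NORMAL"},
]
lb_v2 = [
    {"USUBJID": "S1", "LBSEQ": "1", "LBSTRESN": "5.1", "LBNRIND": "NORMAL"},
    {"USUBJID": "S1", "LBSEQ": "2", "LBSTRESN": "8.4", "LBNRIND": "HIGH"},
    {"USUBJID": "S2", "LBSEQ": "1", "LBSTRESN": "4.0", "LBNRIND": "NORMAL"},
]

trail = build_audit_trail(lb_v1, lb_v2, key_cols=("USUBJID", "LBSEQ"))
for entry in trail:
    print(entry)
```

The comparison is string-based, so `5.10` and `5.1` would be logged as a change. Convert numeric columns before comparing if only numeric differences matter.
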
Domain variables:

- DM: SUBJID, RFSTDTC, RFENDTC, SITEID, SEX, AGE, RACE, ETHNIC (v2 only), ARMCD, ARM, COUNTRY
- AE: AESEQ, AETERM, AEDECOD, AEBODSYS, AESEV, AESER, AEACN, AEREL, AEOUT, AESTDTC, AEENDTC
- LB: LBSEQ, LBTESTCD, LBTEST, LBORRES, LBORRESU, LBSTRESN, LBSTRESU, LBNRIND, VISITNUM, VISIT, LBDTC
- VS: VSSEQ, VSTESTCD, VSTEST, VSORRES, VSORRESU, VSSTRESN, VSSTRESU, VISITNUM, VISIT, VSDTC
- EX: EXSEQ, EXTRT, EXDOSE, EXDOSU, EXDOSFRM, EXROUTE, EXSTDTC, EXENDTC, VISITNUM, VISIT, EPOCH

| Domain | Change Type | Count | Details |
|--------|-------------|-------|---------|
| DM | New Column | 1 | ETHNIC added |
| DM | New Subjects | 3 | SUBJID: NEW00, NEW01, NEW02 |
| DM | Corrected RACE | 10 | Data corrections |
| DM | Corrected AGE | 5 | Data corrections |
| AE | New Records | 50 | New AE observations |
| AE | Updated AESEV | 13 | Severity corrections |
| AE | Corrected AEREL | 8 | Relationship corrections |
| LB | Updated LBSTRESN | 100 | Result value updates |
| LB | Corrected LBNRIND | 17 | Normal indicator fixes |
| VS | New Records | 20 | New measurements |
| VS | Updated VSSTRESN | 50 | Result value updates |
| EX | Corrected EXDOSE | 11 | Dose adjustments |
| EX | Corrected EXROUTE | 3 | Route corrections |

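A quick sanity check before a detailed comparison is to confirm that the record counts of the loaded files match the counts stated in this guide. The sketch below hard-codes those documented counts; the function name and the deliberately wrong `vs` count in the demo are illustrative assumptions.

```python
# Record counts per domain as stated in this guide: (v1, v2)
DOCUMENTED_COUNTS = {
    "dm": (500, 503),
    "ae": (1495, 1545),
    "lb": (16000, 16000),
    "vs": (14000, 14020),
    "ex": (1500, 1500),
}


def count_mismatches(observed, documented=DOCUMENTED_COUNTS):
    """Return domains whose observed (v1, v2) counts differ from the documented ones.

    The result maps each mismatching domain to (observed, documented).
    """
    return {
        domain: (observed.get(domain), expected)
        for domain, expected in documented.items()
        if observed.get(domain) != expected
    }


# Demo: pretend the VS v2 file was loaded without its 20 new records
observed = dict(DOCUMENTED_COUNTS)
observed["vs"] = (14000, 14000)

mismatches = count_mismatches(observed)
print(mismatches)
```

In practice the observed counts would come from the files themselves, for example `len(pd.read_csv('vs_v2.csv'))` for each domain and version.
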
```python
import pandas as pd

# Load demographics v1
dm = pd.read_csv('dm_v1.csv')
print(dm.head())
print(f"Subjects: {len(dm)}")
print(f"Sites: {dm['SITEID'].nunique()}")
print(f"Arms: {dm['ARMCD'].unique()}")
```

```r
library(readr)

# Load adverse events v1
ae <- read_csv('ae_v1.csv')
head(ae)
nrow(ae)             # Total AE records
unique(ae$AEBODSYS)  # Body systems
```

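```sql
-- Create tables from the v1 and v2 data.
-- CSV loading syntax varies by engine; this version assumes DuckDB.
-- all_varchar reads every column as text so SUBJID values compare
-- consistently across versions.
CREATE TABLE dm_v1 AS
SELECT * FROM read_csv_auto('dm_v1.csv', all_varchar = true);

CREATE TABLE dm_v2 AS
SELECT * FROM read_csv_auto('dm_v2.csv', all_varchar = true);

-- Compare v1 and v2
SELECT COUNT(*) FROM dm_v2
WHERE SUBJID NOT IN (SELECT SUBJID FROM dm_v1);
-- Returns: 3 (new subjects)
```
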
- README.txt - Detailed variable descriptions
- QUICKSTART.md - This guide
- GENERATION_SUMMARY.md - Complete technical documentation

Refer to README.txt for detailed variable definitions and data specifications.