detect_cdisc_domain: Detect CDISC Domain Type
In clinCompare: Dataset Comparison with 'CDISC' Validation for Clinical Trial Data


Description

Detects whether a data frame resembles an SDTM domain or an ADaM dataset by comparing its column names against known CDISC standards. The confidence score is based on the share of each candidate's expected variables that are present in the data frame.

Auto-detection is a convenience for exploratory use. For validation reports, regulatory submissions, scripted pipelines, and any other work where a wrong guess matters, always pass domain and standard explicitly instead of relying on detection. Identifier columns such as STUDYID and USUBJID appear in many domains, so a single dataset can match several candidates. A warning is issued when the top two candidates score within 10 percentage points of each other.
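
The scoring idea described above can be sketched in base R. This is only an illustration under stated assumptions: the variable lists in `expected_vars` are hypothetical and heavily abbreviated, the helper names `score_candidates` and `is_ambiguous` are invented here, and the package's internal tables and exact tie rule may differ.

```r
# Hypothetical, abbreviated lists of expected variables.
# These are NOT the tables used by clinCompare.
expected_vars <- list(
  DM = c("STUDYID", "USUBJID", "SUBJID", "RACE", "ETHNIC", "ARMCD", "ARM", "SEX"),
  AE = c("STUDYID", "USUBJID", "AESEQ", "AETERM", "AEDECOD", "AESTDTC")
)

# Score each candidate by the fraction of its expected variables
# that appear among the data frame's column names (case-insensitive).
score_candidates <- function(df, expected = expected_vars) {
  cols <- toupper(names(df))
  scores <- vapply(expected, function(vars) mean(vars %in% cols), numeric(1))
  sort(scores, decreasing = TRUE)
}

# TRUE when the two best candidates are within `margin` of each other.
# The inclusive comparison is an assumption made for this sketch.
is_ambiguous <- function(scores, margin = 0.10) {
  length(scores) >= 2 && (scores[[1]] - scores[[2]]) <= margin
}

# Four DM-specific or shared columns: DM clearly leads.
df <- data.frame(STUDYID = "S1", USUBJID = "S1-001", RACE = "WHITE", ARM = "A")
scores <- score_candidates(df)

# Only shared identifier columns: the candidates are too close to call.
common_only <- data.frame(STUDYID = "S1", USUBJID = "S1-001")
close_call <- is_ambiguous(score_candidates(common_only))
```

A data frame holding only the shared identifiers scores similarly against every candidate, which is the situation the warning is meant to flag.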

Usage

detect_cdisc_domain(df, name_hint = NULL)

Arguments

`df`	A data frame to analyze.
`name_hint`	Optional character string with the dataset name (e.g., "DM", "ADLB", or a filename like "adlb.xpt"). When the hint is provided and matches a known CDISC domain, that candidate receives a strong confidence boost, which makes detection much more accurate whenever the filename is available.
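
One plausible way to turn a hint such as "adlb.xpt" into a candidate code is to drop the directory and extension and upper-case the rest. The helper below, `normalise_hint`, is a hypothetical sketch of that idea, not the package's actual matching logic.

```r
# Hypothetical normaliser; clinCompare's own matching rules may differ.
normalise_hint <- function(name_hint) {
  # Strip any directory part, then the file extension, then upper-case.
  stem <- tools::file_path_sans_ext(basename(name_hint))
  toupper(stem)
}

from_file <- normalise_hint("adlb.xpt")
from_code <- normalise_hint("DM")
from_path <- normalise_hint("data/sdtm/dm.xpt")
```

The resulting code could then be compared against the list of known domain and dataset names before any boost is applied.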

Value

A list containing:

`standard`	Character: "SDTM", "ADaM", or "Unknown"
`domain`	Character: domain code (e.g., "DM", "AE") or dataset name (e.g., "ADSL"), or NA
`confidence`	Numeric between 0 and 1 indicating match quality
`message`	Character: human-readable explanation
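
Because the result is a plain list, a script can refuse to continue when detection is weak. The guard below is a minimal sketch: `require_confident` is a hypothetical helper, and the 0.8 cutoff is an arbitrary value chosen for illustration, not a package default. The call to `detect_cdisc_domain()` runs only if clinCompare is installed.

```r
# Hypothetical helper: stop unless a detection result is usable.
# The 0.8 cutoff is an arbitrary illustration, not a package default.
require_confident <- function(result, min_confidence = 0.8) {
  if (identical(result$standard, "Unknown") || is.na(result$domain) ||
      result$confidence < min_confidence) {
    stop("Detection not reliable: ", result$message, call. = FALSE)
  }
  invisible(result)
}

if (requireNamespace("clinCompare", quietly = TRUE)) {
  lb <- data.frame(
    STUDYID = "STUDY001",
    USUBJID = "SUBJ001",
    stringsAsFactors = FALSE
  )
  # Supply the filename as a hint, then inspect the explanation.
  hinted <- clinCompare::detect_cdisc_domain(lb, name_hint = "adlb.xpt")
  print(hinted$message)
  # Capture the error text instead of halting if the guard rejects the result.
  checked <- tryCatch(require_confident(hinted),
                      error = function(e) conditionMessage(e))
}
```

For scripted pipelines the explicit route recommended in the Description remains preferable; a guard like this only keeps exploratory code from silently acting on a weak guess.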

Examples


# Create a sample SDTM DM domain
dm <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = "SUBJ001",
  SUBJID = "001",
  DMSEQ = 1,
  RACE = "WHITE",
  ETHNIC = "NOT HISPANIC OR LATINO",
  ARMCD = "ARM01",
  ARM = "Treatment A",
  stringsAsFactors = FALSE
)

result <- detect_cdisc_domain(dm)
print(result)

clinCompare documentation built on Feb. 19, 2026, 1:07 a.m.