detect_cdisc_domain: Detect CDISC Domain Type

View source: R/cdisc_validate.R

detect_cdisc_domain    R Documentation

Detect CDISC Domain Type

Description

Detects whether a data frame looks like an SDTM domain or an ADaM dataset by comparing its column names against known CDISC standards. The confidence score is based on the proportion of each candidate's expected variables that are present in df.
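As a rough illustration of the scoring idea, the sketch below computes, for each candidate, the share of its expected variables found among the data frame's column names. The expected-variable lists (expected_vars) and the helper score_candidates() are hypothetical and far shorter than the real CDISC definitions. The package's actual variable lists and any weighting it applies may differ.

```r
# Hypothetical, heavily abbreviated expected-variable lists (not the package's own)
expected_vars <- list(
  DM   = c("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SEX", "RACE",
           "ETHNIC", "ARMCD", "ARM"),
  AE   = c("STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM", "AEDECOD",
           "AESTDTC"),
  ADSL = c("STUDYID", "USUBJID", "SUBJID", "TRT01P", "TRT01A", "SAFFL",
           "ITTFL")
)

# Score = proportion of a candidate's expected variables present in df (0 to 1),
# returned as a named vector sorted from best to worst match
score_candidates <- function(df, candidates = expected_vars) {
  scores <- vapply(candidates,
                   function(vars) mean(vars %in% names(df)),
                   numeric(1))
  sort(scores, decreasing = TRUE)
}

dm <- data.frame(STUDYID = "STUDY001", USUBJID = "SUBJ001", SUBJID = "001",
                 SEX = "F", RACE = "WHITE", ARMCD = "ARM01", ARM = "Treatment A")
score_candidates(dm)  # DM 7/9, ADSL 3/7, AE 2/7
```

Note how the shared identifiers (STUDYID, USUBJID, SUBJID) give every candidate a non-zero score, which is why a margin check between the top candidates is useful.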

Auto-detection is a convenience for exploratory use. For anything important (validation reports, regulatory submissions, scripted pipelines), always specify the domain and standard explicitly instead of relying on detection. Because columns such as STUDYID and USUBJID appear in nearly every dataset, one data frame can match several domains; a warning is issued when the top two candidates score within 10 percentage points of each other.
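The ambiguity rule can be sketched as a comparison of the two highest scores against a 0.10 margin (10 percentage points on a 0 to 1 scale). The helper check_ambiguity() and its margin argument are hypothetical, and whether the package treats an exact 10-point gap as ambiguous is an assumption here (this sketch uses a strict "less than").

```r
# Hypothetical helper: warn when the two best candidates in a named numeric
# vector of scores (0 to 1) are within `margin` of each other
check_ambiguity <- function(scores, margin = 0.10) {
  scores <- sort(scores, decreasing = TRUE)
  if (length(scores) >= 2 && (scores[[1]] - scores[[2]]) < margin) {
    warning(sprintf(
      "Ambiguous match: %s (%.0f%%) vs %s (%.0f%%); specify domain and standard explicitly.",
      names(scores)[1], 100 * scores[[1]], names(scores)[2], 100 * scores[[2]]
    ))
    return(TRUE)
  }
  FALSE
}

# A near-tie triggers the warning; a clear winner does not
check_ambiguity(c(AE = 0.40, CM = 0.35, ADSL = 0.10))
check_ambiguity(c(DM = 0.90, ADSL = 0.45))
```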

Usage

detect_cdisc_domain(df, name_hint = NULL)

Arguments

df

A data frame to analyze.

name_hint

Optional character string giving the dataset name (e.g., "DM", "ADLB", or a filename such as "adlb.xpt"). If supplied and it matches a known CDISC domain, that candidate receives a strong confidence boost, which makes detection much more accurate when the filename is available.
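One plausible way to turn a filename into a comparable hint is to drop the directory and extension and upper-case the rest, then raise the matching candidate's score. Both helpers below (normalize_name_hint() and boost_hint()) and the boost size of 0.5 are hypothetical; the package's own matching rules and boost amount are not documented here. tools::file_path_sans_ext() and basename() are standard R functions.

```r
# Hypothetical normalisation: "data/adlb.xpt" -> "ADLB", "AE" -> "AE"
normalize_name_hint <- function(name_hint) {
  if (is.null(name_hint)) return(NA_character_)
  toupper(tools::file_path_sans_ext(basename(name_hint)))
}

# Hypothetical boost: add `boost` to the hinted candidate's score, capped at 1
boost_hint <- function(scores, hint, boost = 0.5) {
  if (!is.na(hint) && hint %in% names(scores)) {
    scores[[hint]] <- min(1, scores[[hint]] + boost)
  }
  scores
}

hint <- normalize_name_hint("adlb.xpt")           # "ADLB"
boost_hint(c(ADLB = 0.45, ADVS = 0.50), hint)     # ADLB 0.95, ADVS 0.50
```

Capping at 1 keeps the boosted value on the same 0 to 1 scale as the documented confidence.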

Value

A list containing:

standard

Character: "SDTM", "ADaM", or "Unknown"

domain

Character: domain code (e.g., "DM", "AE") or dataset name (e.g., "ADSL"), or NA

confidence

Numeric between 0 and 1 indicating match quality

message

Character: human-readable explanation
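In a script, the returned list can be checked before the detected domain is trusted. The guard accept_detection() below is a hypothetical helper built only on the documented fields (standard, domain, confidence, message); the 0.8 threshold is an arbitrary example value. It is exercised here on a hand-built list of the documented shape, not on real output of detect_cdisc_domain().

```r
# Hypothetical guard: accept a detection result only if it is confident enough,
# otherwise stop and ask for an explicit domain and standard
accept_detection <- function(result, min_confidence = 0.8) {
  if (identical(result$standard, "Unknown") || is.na(result$domain) ||
      result$confidence < min_confidence) {
    stop("Detection not reliable: ", result$message,
         "; specify domain and standard explicitly.", call. = FALSE)
  }
  paste(result$standard, result$domain, sep = "/")
}

# A hand-built list in the documented shape (illustrative values only)
mock <- list(standard = "SDTM", domain = "DM", confidence = 0.85,
             message = "illustrative message")
accept_detection(mock)  # "SDTM/DM"
```

With the package loaded, the same guard would be applied as accept_detection(detect_cdisc_domain(df)).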

Examples


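# Create a sample SDTM DM (Demographics) domain.
# DM holds one record per subject and has no --SEQ variable, so no DMSEQ.
dm <- data.frame(
  STUDYID = "STUDY001",
  DOMAIN = "DM",
  USUBJID = "SUBJ001",
  SUBJID = "001",
  SEX = "F",
  RACE = "WHITE",
  ETHNIC = "NOT HISPANIC OR LATINO",
  ARMCD = "ARM01",
  ARM = "Treatment A",
  stringsAsFactors = FALSE
)

# Detect the domain from column names alone (no name_hint)
result <- detect_cdisc_domain(dm)
print(result)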


clinCompare documentation built on Feb. 19, 2026, 1:07 a.m.