View source: R/cdisc_validate.R
| detect_cdisc_domain | R Documentation |
Detects whether a data frame looks like an SDTM domain or ADaM dataset by comparing column names against known CDISC standards. Calculates a confidence score based on the percentage of expected variables present.
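The scoring idea described above can be sketched as a simple proportion. This is a minimal illustration, not the package's implementation: the `expected_dm` variable list and the `score_candidate()` helper are hypothetical, and matching names case-insensitively is an assumption.

```r
# Hypothetical list of expected DM variables, for illustration only;
# the reference metadata the package actually uses is not shown here.
expected_dm <- c("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "RACE",
                 "ETHNIC", "ARMCD", "ARM", "SEX", "COUNTRY")

# Hypothetical helper: fraction of expected variables found in the data frame.
# Upper-casing the column names before matching is an assumption.
score_candidate <- function(df, expected) {
  mean(expected %in% toupper(names(df)))
}

dm_partial <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = "SUBJ001",
  SUBJID  = "001",
  RACE    = "WHITE",
  stringsAsFactors = FALSE
)

# Four of the ten expected variables are present
score <- score_candidate(dm_partial, expected_dm)
```

A score computed this way falls between 0 and 1, which matches the range documented for the `confidence` element of the return value.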
Auto-detection is a convenience for exploratory use. For anything important –
validation reports, regulatory submissions, scripted pipelines – always pass
domain and standard explicitly. Datasets with common columns
(STUDYID, USUBJID, etc.) can match multiple domains, and a warning is issued
when the top two candidates score within 10 percentage points of each other.
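The ambiguity rule above can be illustrated with a short check on a set of candidate scores. This is a sketch under stated assumptions: `is_ambiguous()` is a hypothetical helper, the scores are made-up inputs, and whether a gap of exactly 10 points counts as ambiguous is not stated in this documentation (the sketch treats it as ambiguous).

```r
# Hypothetical helper: TRUE when the two best candidates score within
# `margin` of each other. Treating an exact tie with the margin as
# ambiguous (<=) is an assumption.
is_ambiguous <- function(scores, margin = 0.10) {
  if (length(scores) < 2) {
    return(FALSE)
  }
  top <- sort(scores, decreasing = TRUE)[1:2]
  (top[[1]] - top[[2]]) <= margin
}

# Made-up candidate scores for illustration
close_scores <- c(DM = 0.80, AE = 0.75, LB = 0.40)
clear_scores <- c(DM = 0.90, AE = 0.50)

close_flag <- is_ambiguous(close_scores)  # gap of 0.05
clear_flag <- is_ambiguous(clear_scores)  # gap of 0.40
```

When candidates are this close, supplying the domain and standard explicitly, as recommended above, avoids relying on a near-tie.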
detect_cdisc_domain(df, name_hint = NULL)
| df | A data frame to analyze. |
| name_hint | Optional character string with the dataset name (e.g., "DM", "ADLB", or a filename like "adlb.xpt"). When provided and matching a known CDISC domain, that candidate receives a strong confidence boost, which makes detection much more accurate when the filename is available. |
A list containing:
| standard | Character: "SDTM", "ADaM", or "Unknown" |
| domain | Character: domain code (e.g., "DM", "AE") or dataset name (e.g., "ADSL"), or NA |
| confidence | Numeric between 0 and 1 indicating match quality |
| message | Character: human-readable explanation |
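A caller might gate downstream steps on the returned list. The sketch below builds a mock result by hand so that it runs without the package; the values are placeholders rather than real output, and both `is_usable_detection()` and the 0.7 threshold are hypothetical choices.

```r
# Mock result mirroring the documented return structure; placeholder values,
# not actual output from detect_cdisc_domain().
result <- list(
  standard   = "SDTM",
  domain     = "DM",
  confidence = 0.85,
  message    = "placeholder message"
)

# Hypothetical helper: TRUE only when a standard was identified, a domain is
# present, and the confidence meets a caller-chosen threshold.
is_usable_detection <- function(result, min_confidence = 0.7) {
  !identical(result$standard, "Unknown") &&
    !is.na(result$domain) &&
    result$confidence >= min_confidence
}

usable <- is_usable_detection(result)
```

The threshold is arbitrary here; a stricter pipeline would skip auto-detection and pass the domain and standard explicitly.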
# Create a sample SDTM DM domain
dm <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = "SUBJ001",
  SUBJID = "001",
  DMSEQ = 1,
  RACE = "WHITE",
  ETHNIC = "NOT HISPANIC OR LATINO",
  ARMCD = "ARM01",
  ARM = "Treatment A",
  stringsAsFactors = FALSE
)
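# Detect the standard and domain from column names alone
result <- detect_cdisc_domain(dm)
print(result)

# Supply the dataset name as a hint when it is known
result_hinted <- detect_cdisc_domain(dm, name_hint = "DM")
print(result_hinted)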