View source: R/cdisc_validate.R
| cdisc_compare | R Documentation |
Compares two datasets and runs CDISC validation on both. The result combines the dataset comparison with a CDISC conformance analysis, so differences between the datasets and regulatory compliance issues are reported together.
cdisc_compare(
  df1,
  df2,
  domain = NULL,
  standard = NULL,
  id_vars = NULL,
  vars = NULL,
  ts_data = NULL,
  detect_outliers = FALSE,
  tolerance = 0,
  where = NULL
)
df1 |
First data frame to compare, or a file path (character string
ending in |
df2 |
Second data frame to compare, or a file path. |
domain |
Optional character string specifying the CDISC domain code or dataset name (e.g., "DM", "AE", "ADSL"). Strongly recommended – auto-detection can be ambiguous for datasets with common columns. If NULL, auto-detected from df1. |
standard |
Optional character string: "SDTM" or "ADaM". If NULL, auto-detected from df1. |
id_vars |
Optional character vector of ID variable names (e.g.,
|
vars |
Optional character vector of variable names to compare. Only these columns are included in value comparison. Structural and CDISC validation still covers all columns. |
ts_data |
Optional data frame of the TS (Trial Summary) domain. When provided, CDISC standard versions (e.g., SDTM IG 3.4, ADaM IG 1.3) are extracted and included in the results and reports. If NULL (default), version information is omitted. |
detect_outliers |
Logical. When TRUE, runs z-score outlier detection on numeric columns and includes results in the output. Defaults to FALSE. |
tolerance |
Numeric tolerance value for floating-point comparisons (default 0). When tolerance > 0, numeric values are considered equal if their absolute difference is within the tolerance threshold. Character and factor columns always use exact matching regardless of tolerance. |
where |
Optional filter expression as a string (e.g., "AESEV == 'SEVERE'"). Applied to both datasets before comparison. Equivalent to a WHERE clause. |
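The semantics described for tolerance and where can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: values_equal() and apply_where() are hypothetical helper names, the tolerance boundary is assumed to be inclusive (<=), two missing values are assumed to count as equal, and the filter string is assumed to be evaluated as an R expression against the data frame's columns.

```r
# Hypothetical helper (not part of the package): element-wise equality
# following the documented tolerance rule.
values_equal <- function(x, y, tolerance = 0) {
  if (is.numeric(x) && is.numeric(y)) {
    # Numeric: equal when the absolute difference is within the tolerance
    # (assumption: the boundary is inclusive)
    same <- abs(x - y) <= tolerance
  } else {
    # Character and factor: exact matching, tolerance is ignored
    same <- as.character(x) == as.character(y)
  }
  # Assumption: NA vs NA counts as equal, NA vs a value counts as different
  both_na <- is.na(x) & is.na(y)
  same[is.na(same)] <- FALSE
  same | both_na
}

# Hypothetical helper (not part of the package): apply a filter string
# to a data frame, similar to a WHERE clause.
apply_where <- function(df, where = NULL) {
  if (is.null(where)) return(df)
  keep <- eval(parse(text = where), envir = df, enclos = parent.frame())
  # Rows where the condition is NA are dropped
  df[!is.na(keep) & keep, , drop = FALSE]
}

values_equal(c(1, 2, 3), c(1.0005, 2.1, 3), tolerance = 0.001)
# TRUE FALSE TRUE

ae <- data.frame(
  USUBJID = c("SUBJ001", "SUBJ002", "SUBJ003"),
  AESEV = c("MILD", "SEVERE", "SEVERE"),
  stringsAsFactors = FALSE
)
apply_where(ae, "AESEV == 'SEVERE'")
```

In this sketch the same filter would be applied to each dataset separately before any comparison, which mirrors the documented behavior of where.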
A list containing:
domain |
Character: detected or supplied CDISC domain |
standard |
Character: detected or supplied CDISC standard (SDTM/ADaM) |
nrow_df1 |
Integer: number of rows in df1 |
ncol_df1 |
Integer: number of columns in df1 |
nrow_df2 |
Integer: number of rows in df2 |
ncol_df2 |
Integer: number of columns in df2 |
id_vars |
Character vector of ID variables used for matching (NULL if positional matching was used) |
comparison |
Result of |
variable_comparison |
Result of |
metadata_comparison |
List of metadata differences: type_mismatches, label_mismatches, length_mismatches, format_mismatches, column ordering |
observation_comparison |
Result of |
unified_comparison |
Data frame combining attribute and value differences per variable. Columns: variable, attribute, base_value, compare_value, and optionally id columns and row when value differences exist |
unmatched_rows |
List with df1_only and df2_only data frames of rows that could not be matched by id_vars (NULL when id_vars is not used) |
cdisc_validation_df1 |
CDISC validation results for df1 |
cdisc_validation_df2 |
CDISC validation results for df2 |
cdisc_conformance_comparison |
Data frame showing which CDISC issues are unique to df1, unique to df2, or common to both |
outlier_notes |
Data frame of z-score outliers (|z| > 3) found in numeric columns of either dataset (NULL when detect_outliers is FALSE) |
cdisc_version |
List of CDISC version information extracted from TS
domain (NULL when ts_data is not provided). See |
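The outlier_notes element is described as z-score outliers with |z| > 3. A minimal base R sketch of that rule is shown below. It is an assumption-laden illustration, not the package's code: find_z_outliers() is a hypothetical name, the output columns (variable, row, value, z) are invented for the example, and the sample standard deviation is assumed.

```r
# Hypothetical helper (not part of the package): flag values whose
# z-score exceeds the threshold in any numeric column.
find_z_outliers <- function(df, threshold = 3) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  notes <- lapply(num_cols, function(col) {
    x <- df[[col]]
    s <- stats::sd(x, na.rm = TRUE)
    # A constant or all-missing column has no defined z-scores
    if (is.na(s) || s == 0) return(NULL)
    z <- (x - mean(x, na.rm = TRUE)) / s
    hit <- which(!is.na(z) & abs(z) > threshold)
    if (length(hit) == 0) return(NULL)
    data.frame(variable = col, row = hit, value = x[hit], z = z[hit],
               stringsAsFactors = FALSE)
  })
  # Returns NULL when nothing is flagged
  do.call(rbind, notes)
}

# Illustrative data: 19 plausible values and one implausible one
vs <- data.frame(
  USUBJID = sprintf("SUBJ%03d", 1:20),
  VSSTRESN = c(rep(c(68, 70, 72), length.out = 19), 700),
  stringsAsFactors = FALSE
)
find_z_outliers(vs)
```

With a sample standard deviation, a single extreme value among n observations can reach a z-score of at most (n - 1) / sqrt(n), so a threshold of 3 cannot flag anything in columns with fewer than about 11 values.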
# Create sample SDTM DM domains
dm1 <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = c("SUBJ001", "SUBJ002"),
  DMSEQ = c(1, 1),
  RACE = c("WHITE", "BLACK OR AFRICAN AMERICAN"),
  stringsAsFactors = FALSE
)
dm2 <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = c("SUBJ001", "SUBJ003"),
  DMSEQ = c(1, 1),
  RACE = c("WHITE", "ASIAN"),
  ETHNIC = c("NOT HISPANIC", "NOT HISPANIC"),
  stringsAsFactors = FALSE
)
# Positional matching (default)
result <- cdisc_compare(dm1, dm2, domain = "DM", standard = "SDTM")
# Key-based matching by ID variables
result <- cdisc_compare(dm1, dm2, domain = "DM", id_vars = c("USUBJID"))
names(result)
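With key-based matching, rows whose ID values appear in only one dataset cannot be paired. The base R sketch below shows which rows of the sample data fall into that category when matching on USUBJID. It illustrates the idea behind the unmatched_rows element only; it is not the package's code, and the composite-key approach and separator are assumptions made for the example.

```r
# Reduced copies of the sample data, so the sketch runs on its own
dm1 <- data.frame(
  USUBJID = c("SUBJ001", "SUBJ002"),
  RACE = c("WHITE", "BLACK OR AFRICAN AMERICAN"),
  stringsAsFactors = FALSE
)
dm2 <- data.frame(
  USUBJID = c("SUBJ001", "SUBJ003"),
  RACE = c("WHITE", "ASIAN"),
  stringsAsFactors = FALSE
)

id_vars <- "USUBJID"

# Build one composite key per row so multi-column keys also work
# (assumption: the separator does not occur in the key values)
key1 <- do.call(paste, c(dm1[id_vars], sep = "\037"))
key2 <- do.call(paste, c(dm2[id_vars], sep = "\037"))

# Rows present in only one of the two datasets
df1_only <- dm1[!key1 %in% key2, , drop = FALSE]
df2_only <- dm2[!key2 %in% key1, , drop = FALSE]

df1_only$USUBJID  # "SUBJ002"
df2_only$USUBJID  # "SUBJ003"
```

Only SUBJ001 is present in both datasets, so it is the only subject whose values can be compared row against row under key-based matching.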