conformance_check: Perform Conformance Check on Data Based on Defined Rules

View source: R/conformance_check.R

conformance_check    R Documentation

Description

This function evaluates a source dataframe ('S_data') against a set of rules defined in a metadata dataframe ('M_data'). By default it uses the rule functions shipped with the package; a custom rule file can be supplied through 'rule_file' instead.

Usage

conformance_check(
  S_data,
  M_data,
  rule_file = NULL,
  na_as_error = FALSE,
  var_select = "all"
)

Arguments

S_data

A dataframe containing the source data to be checked.

M_data

A metadata dataframe that specifies the rules. It must contain the columns 'VARIABLE', 'Conformance_Rule', and 'Value'.
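
A quick pre-flight check of the required metadata columns can be sketched in base R. The helper 'missing_metadata_columns' below is hypothetical and is not part of 'DQA'; it only illustrates the column requirement stated above.

```r
# Hypothetical helper (not part of DQA): report which required
# metadata columns are missing from a candidate M_data dataframe.
missing_metadata_columns <- function(M_data) {
  required <- c("VARIABLE", "Conformance_Rule", "Value")
  setdiff(required, names(M_data))
}

good <- data.frame(
  VARIABLE = "national_id",
  Conformance_Rule = "length_check",
  Value = "10",
  stringsAsFactors = FALSE
)
bad <- good[, c("VARIABLE", "Value")]  # drop one required column

missing_metadata_columns(good)  # character(0)
missing_metadata_columns(bad)   # "Conformance_Rule"
```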

rule_file

The path to a custom R file where rule functions are defined. If 'NULL' (default), the standard rule definitions file included with the 'DQA' package is used. Instructions for using this file are documented under 'conformance_rules'.

na_as_error

A logical value. If 'TRUE', 'NA' values in the source data are treated as errors (non-conformant). If 'FALSE' (default), they are ignored.
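
The effect of this flag can be illustrated with a small base R sketch. The function 'length_conforms' is a hypothetical stand-in for a length rule, not the 'DQA' implementation, and it models "ignored" as counting an 'NA' as conformant, which is an assumption; the package may instead exclude such values from its counts.

```r
# Illustrative only: a hypothetical length rule, not the DQA implementation.
# Returns TRUE for conformant values; NA handling follows na_as_error.
length_conforms <- function(x, n, na_as_error = FALSE) {
  ok <- nchar(as.character(x)) == n
  ok[is.na(x)] <- !na_as_error  # NA passes unless it is treated as an error
  ok
}

ids <- c("1234567890", "123", NA)

sum(!length_conforms(ids, 10))                      # 1 non-conformant value
sum(!length_conforms(ids, 10, na_as_error = TRUE))  # 2 non-conformant values
```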

var_select

Character or integer vector of variables to check. Accepts variable names, column numbers, or a mix. Default is "all" (check all variables in M_data).
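
One way a mixed selection could be resolved to variable names is sketched below. The helper 'resolve_var_select' is hypothetical, and the sketch assumes that column numbers index the columns of the source data; the package may resolve them differently.

```r
# Hypothetical helper (not part of DQA): turn a var_select value into names.
# Assumption: numeric entries are positions in the source data's columns.
resolve_var_select <- function(var_select, data_names) {
  if (identical(var_select, "all")) return(data_names)
  sel <- as.character(var_select)       # a mixed vector is already character
  is_num <- grepl("^[0-9]+$", sel)      # entries that look like column numbers
  sel[is_num] <- data_names[as.integer(sel[is_num])]
  unique(sel)
}

cols <- c("id", "national_id", "gender", "age")

resolve_var_select(c("gender", 2), cols)  # "gender" "national_id"
resolve_var_select(3:4, cols)             # "gender" "age"
resolve_var_select("all", cols)           # all four names
```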

Details

The metadata ('M_data') for conformance_check must include:

  • **VARIABLE:** The name of the column in 'S_data' to which the rule applies.

  • **Conformance_Rule:** The name of the rule function to execute for the VARIABLE (must be defined in the rule file).

  • **Value:** Rule parameters, such as the allowed length of values, the allowed category values, or the column names required for computational checks.
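
To make the role of 'Value' concrete, the base R sketch below shows one plausible reading of two kinds of parameter: a category list such as "1 | 2" and a computational expression such as "part_a + part_b". This is an illustration under assumptions, not the rule code shipped with 'DQA'; the dataframe 'toy' and the variable names 'bad_gender' and 'bad_total' are made up for the example.

```r
# Illustrative only: one plausible reading of the 'Value' column,
# not the DQA rule implementations.
toy <- data.frame(
  gender = c(1, 2, 1, 3, 2),
  part_a = c(10, 15, 20, 25, 50),
  part_b = c(5, 10, 15, 20, 45),
  total_parts = c(15, 25, 35, 45, 94)
)

# Category rule: split "1 | 2" into the allowed values "1" and "2".
allowed <- trimws(strsplit("1 | 2", "|", fixed = TRUE)[[1]])
bad_gender <- which(!(as.character(toy$gender) %in% allowed))

# Computational rule: evaluate "part_a + part_b" using the dataframe columns.
expected <- eval(parse(text = "part_a + part_b"), envir = toy)
bad_total <- which(toy$total_parts != expected)

bad_gender  # 4 (gender value 3 is not allowed)
bad_total   # 5 (94 is not 50 + 45)
```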

Value

A dataframe containing the results of the conformance check for each rule.

Examples

# 1. Create sample source data (S_data)
S_data <- data.frame(
  id = 1:10,
  national_id = c("1234567890", "0987654321", "123", NA, "1112223334",
                  "1234567890", "5556667778", "9998887770", "12345", "4445556667"),
  gender = c(1, 2, 1, 3, 2, 1, NA, 2, 1, 2), # 1=Male, 2=Female, 3=Error
  age = c(25, 40, 150, 33, -5, 65, 45, 29, 70, 55),
  part_a = c(10, 15, 20, 25, 30, 35, 40, 45, 50, 55),
  part_b = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50),
  total_parts = c(15, 25, 35, 45, 55, 65, 75, 85, 94, 105), # one error in row 9
  stringsAsFactors = FALSE
)

# 2. Create sample metadata (M_data)
M_data <- data.frame(
  VARIABLE = c(
    "national_id",
    "national_id",
    "gender",
    "total_parts"
  ),
  Conformance_Rule = c(
    "length_check",
    "unique_check",
    "category_check",
    "arithmetic_check"
  ),
  Value = c(
    "10",                  # national_id length must be 10
    "",                    # unique
    "1 | 2",               # Allowed values for gender
    "part_a + part_b"      # Computational rule for total_parts
  ),
  stringsAsFactors = FALSE
)

# 3. Run the conformance check using the package's default rules
library(DQA)  # conformance_check() is provided by the 'DQA' package
conformance_results <- conformance_check(S_data = S_data, M_data = M_data)
print(conformance_results)


DQA documentation built on April 20, 2026, 9:06 a.m.