View source: R/conformance_check.R
| conformance_check | R Documentation |
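This function evaluates a source dataframe ('S_data') against a set of rules defined in a metadata dataframe ('M_data'). By default it applies the rule functions shipped with the 'DQA' package. Alternatively, the rule functions can be loaded from a user-provided R file through the 'rule_file' argument.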
conformance_check(
S_data,
M_data,
rule_file = NULL,
na_as_error = FALSE,
var_select = "all"
)
| Argument | Description |
| --- | --- |
| S_data | A dataframe containing the source data to be checked. |
| M_data | A metadata dataframe that specifies the rules. It must contain the columns 'VARIABLE', 'Conformance_Rule', and 'Value'. |
| rule_file | The path to a custom R file in which rule functions are defined. If 'NULL' (default), the standard rule definitions file included with the 'DQA' package is used. Instructions for writing this file are documented under the topic 'conformance_rules'. |
| na_as_error | A logical value. If 'TRUE', 'NA' values in the source data are treated as errors (non-conformant). If 'FALSE' (default), they are ignored. |
| var_select | A character or integer vector of the variables to check, given as variable names, column numbers, or a mix of both. The default, "all", checks every variable listed in 'M_data'. |
The metadata ('M_data') passed to conformance_check must include the following columns:
**VARIABLE:** The name of the column in 'S_data' to which the rule applies.
**Conformance_Rule:** The name of the rule function to execute for the VARIABLE (must be defined in the rule file).
**Value:** The rule's parameters, such as the allowed length of values, the allowed category values, or the expression over other columns that a computational check evaluates (for example, "part_a + part_b").
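The three columns above suggest a simple dispatch model: for each row of 'M_data', look up the function named in 'Conformance_Rule' and apply it to the 'VARIABLE' column, using 'Value' as its parameter. The base-R sketch below illustrates that model. It is not the DQA implementation: the rule signature function(x, value, data), the helper run_rules(), and its output columns (n_checked, n_errors) are hypothetical, and the real rule file documented under 'conformance_rules' may define rules and results differently.

```r
# Hypothetical sketch of rule-based conformance checking in base R.
# NOT the DQA implementation: the rule signature function(x, value, data)
# and the helper run_rules() are assumptions made for illustration.
# Each rule returns TRUE (conformant), FALSE (non-conformant) or NA
# (the input value is missing).

# Value holds the required number of characters, e.g. "10".
length_check <- function(x, value, data) {
  x <- as.character(x)
  ifelse(is.na(x), NA, nchar(x) == as.integer(value))
}

# Value is unused; every occurrence of a repeated value fails.
unique_check <- function(x, value, data) {
  repeated <- duplicated(x) | duplicated(x, fromLast = TRUE)
  ifelse(is.na(x), NA, !repeated)
}

# Value lists the allowed categories separated by "|", e.g. "1 | 2".
category_check <- function(x, value, data) {
  allowed <- trimws(strsplit(value, "|", fixed = TRUE)[[1]])
  ifelse(is.na(x), NA, as.character(x) %in% allowed)
}

# Value is an R expression over other columns, e.g. "part_a + part_b".
arithmetic_check <- function(x, value, data) {
  expected <- eval(parse(text = value), envir = data)
  ifelse(is.na(x) | is.na(expected), NA, abs(x - expected) < 1e-8)
}

# Hypothetical dispatcher: one output row per rule in M_data.
run_rules <- function(S_data, M_data, na_as_error = FALSE,
                      rule_env = parent.frame()) {
  force(rule_env)  # capture the caller's environment before lapply() runs
  rows <- lapply(seq_len(nrow(M_data)), function(i) {
    var  <- M_data$VARIABLE[i]
    rule <- M_data$Conformance_Rule[i]
    fun  <- get(rule, envir = rule_env, mode = "function")  # rule looked up by name
    pass <- fun(S_data[[var]], M_data$Value[i], S_data)
    # na_as_error = TRUE: missing values count as non-conformant;
    # otherwise they stay NA and are left out of both counts.
    if (na_as_error) pass[is.na(pass)] <- FALSE
    data.frame(VARIABLE = var, Conformance_Rule = rule,
               n_checked = sum(!is.na(pass)),
               n_errors = sum(!pass, na.rm = TRUE),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With the sample S_data and M_data from the examples, run_rules(S_data, M_data) reports n_errors of 2, 2, 1 and 1 for the four rules. Because arithmetic_check uses eval(parse()), it runs arbitrary R code taken from the metadata, so a design like this is only appropriate for trusted 'M_data'.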
A dataframe containing the results of the conformance check for each rule.
# 1. Create sample source data (S_data)
S_data <- data.frame(
id = 1:10,
national_id = c("1234567890", "0987654321", "123", NA, "1112223334",
"1234567890", "5556667778", "9998887770", "12345", "4445556667"),
gender = c(1, 2, 1, 3, 2, 1, NA, 2, 1, 2), # 1 = Male, 2 = Female; 3 is an invalid code
age = c(25, 40, 150, 33, -5, 65, 45, 29, 70, 55),
part_a = c(10, 15, 20, 25, 30, 35, 40, 45, 50, 55),
part_b = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50),
total_parts = c(15, 25, 35, 45, 55, 65, 75, 85, 94, 105), # row 9 is wrong: 50 + 45 = 95, not 94
stringsAsFactors = FALSE
)
# 2. Create sample metadata (M_data)
M_data <- data.frame(
VARIABLE = c(
"national_id",
"national_id",
"gender",
"total_parts"
),
Conformance_Rule = c(
"length_check",
"unique_check",
"category_check",
"arithmetic_check"
),
Value = c(
"10", # national_id length must be 10
"", # no parameter needed: values must be unique
"1 | 2", # Allowed values for gender
"part_a + part_b" # Computational rule for total_parts
),
stringsAsFactors = FALSE
)
# 3. Run the conformance check using the package's default rules
# Load the 'DQA' package, which provides conformance_check()
library(DQA)
conformance_results <- conformance_check(S_data = S_data, M_data = M_data)
print(conformance_results)
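As a sanity check on the sample data, the rows each rule should flag can be computed with base R alone, without DQA. This sketch assumes that length_check compares character counts, that unique_check flags every occurrence of a repeated value, that category_check accepts only the values listed in 'Value', and that arithmetic_check compares total_parts with part_a + part_b. NA values are skipped, matching na_as_error = FALSE. How conformance_check itself reports these failures depends on the package and is not reproduced here.

```r
# Columns copied from the example S_data above.
national_id <- c("1234567890", "0987654321", "123", NA, "1112223334",
                 "1234567890", "5556667778", "9998887770", "12345", "4445556667")
gender      <- c(1, 2, 1, 3, 2, 1, NA, 2, 1, 2)
part_a      <- c(10, 15, 20, 25, 30, 35, 40, 45, 50, 55)
part_b      <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50)
total_parts <- c(15, 25, 35, 45, 55, 65, 75, 85, 94, 105)

# Assumed rule semantics; NA values are excluded (na_as_error = FALSE).
bad_length   <- which(!is.na(national_id) & nchar(national_id) != 10)  # rows 3 and 9
repeated_ids <- national_id[duplicated(national_id)]
bad_unique   <- which(national_id %in% repeated_ids)                   # rows 1 and 6
bad_category <- which(!is.na(gender) & !(gender %in% c(1, 2)))         # row 4
bad_arith    <- which(abs(total_parts - (part_a + part_b)) > 1e-8)     # row 9

print(list(length_check = bad_length, unique_check = bad_unique,
           category_check = bad_category, arithmetic_check = bad_arith))
```

The age column contains implausible values (150 and -5), but because M_data defines no rule for age, conformance_check does not examine that column in this example.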