Create a data quality profile (main function)

Description

Tests a database (a data.frame) against a set of rules, one per line, taken from a 'data dictionary file'. The returned object summarizes each rule: the variable (column) it applies to, the rule itself, any comment following the rule, whether the rule executed successfully, the total number of rule violations (if any), and the record ids of any non-compliant records. Rules that cannot be executed for any reason are marked as 'failed'.
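To illustrate the kind of per-rule summary described above, here is a minimal base R sketch. It is not the package's implementation: the toy table, the rule strings, and the helper check_rule are hypothetical and exist only for this illustration. It evaluates each rule against the table, counts violations, collects record ids, and marks a rule that cannot be executed instead of stopping.

```r
# Toy table and rules written as R expressions over its columns (all hypothetical).
toy <- data.frame(id = 1:4, age = c(25, -3, 40, 130))
rules <- c("age >= 0", "age <= 120", "weight_kg > 0")  # 'weight_kg' is not a column

# Hypothetical helper: evaluate one rule and return a one-row summary.
check_rule <- function(rule, tbl) {
  # A rule that raises an error yields NULL rather than aborting the run.
  res <- tryCatch(eval(parse(text = rule), envir = tbl),
                  error = function(e) NULL)
  if (is.null(res)) {
    return(data.frame(rule = rule, executed = FALSE,
                      violations = NA_integer_, ids = NA_character_,
                      stringsAsFactors = FALSE))
  }
  bad <- which(!res)  # positions of records that break the rule
  data.frame(rule = rule, executed = TRUE,
             violations = length(bad),
             ids = paste(tbl$id[bad], collapse = ","),
             stringsAsFactors = FALSE)
}

# One summary row per rule.
summary_tbl <- do.call(rbind, lapply(rules, check_rule, tbl = toy))
print(summary_tbl)
```

The third rule refers to a column that does not exist, so it is reported as not executed while the other two rules are still evaluated.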

Usage

datadict.profile(atable, adictionary)

Arguments

atable

a data.frame

adictionary

a list of rules in rule format

Details

The rule file must be a simple list with one rule per line. Functions can be used in rules, but because they are applied to a vector (the column), they should be wrapped in an sapply statement (see the example rule file). Rules may be separated by empty lines or by comment lines starting with #. A comment that follows a rule on the same line is displayed in the summary table and should be kept short. Each rule must test only one variable and one aspect at a time.
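The point about sapply can be shown with plain base R. The sketch below is not datacheck rule syntax: the function is_valid_code and the toy column are hypothetical. It shows a check written for a single value being applied element by element to a whole column.

```r
# Hypothetical scalar check: it uses if/else, so it handles one value at a time.
is_valid_code <- function(x) {
  if (is.na(x)) {
    FALSE
  } else {
    nchar(x) == 3
  }
}

# Toy column standing in for one variable of a table.
codes <- c("ABC", "DE", NA, "XYZ")

# sapply() calls the scalar function once per element and
# returns one logical value per record.
ok <- sapply(codes, is_valid_code, USE.NAMES = FALSE)

# Positions of the records that fail the check.
violations <- which(!ok)
print(ok)
print(violations)
```

Calling is_valid_code(codes) directly would not give one result per record, because if() expects a single condition. Wrapping the call in sapply is what makes the check column-wise.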

Value

a data.profile object or NA

Author(s)

Reinhard Simon

See Also

Other datadict: as.rules; as_rules; datadict_profile; has.ruleErrors; has_rule_errors; is.datadict.profile; is_datadict_profile; prep4rep; read.rules; read_rules

Examples

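library(stringr)
library(datacheck)  # provides read_rules(), datadict_profile(), is_datadict_profile()

# Locate the example data files shipped with the package
atable <- system.file("examples/db.csv", package = "datacheck")
arule <- system.file("examples/rules1.R", package = "datacheck")
aloctn <- system.file("examples/location.csv", package = "datacheck")  # for use in is.oneOf

# File names without their directory part
ctable <- basename(atable)
crule <- basename(arule)
cloctn <- basename(aloctn)

# Work in a temporary directory and remember the original one
cwd <- tempdir()
owd <- getwd()
setwd(cwd)

# Copy the example files into the temporary working directory
file.copy(atable, ctable)
file.copy(arule, crule)
file.copy(aloctn, cloctn)

# Read the table and the rules
at <- read.csv(ctable, stringsAsFactors = FALSE)
ad <- read_rules(crule)

# Build the data quality profile
db <- datadict_profile(at, ad)

# Check that the result is a profile object
is_datadict_profile(db)

# Print the profile
db

# Restore the original working directory
setwd(owd)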
