knitr::opts_chunk$set(echo = TRUE)

library(pointblank)

Printing {pointblank} Objects in an R Markdown Document

ptblank_agent

Let's walk through a data quality analysis of an extremely small table. It's actually called small_table and we can find it as a dataset in this package.

small_table

We ought to think about what's tolerable in terms of data quality so let's designate proportional failure thresholds to the warn, stop, and notify states using action_levels().

al <- 
  action_levels(
      warn_at = 0.10,
      stop_at = 0.25,
    notify_at = 0.35
  )

Now create a pointblank agent object and give it the al object (which serves as a default for all validation steps which can be overridden). The static thresholds provided by al will make the reporting a bit more useful.

agent <- 
  create_agent(
    read_fn = ~ small_table,
    tbl_name = "small_table",
    label = "An example.",
    actions = al
  )

Then, as with any agent object, we can add steps to the validation plan by using as many validation functions as we want. Then, we use interrogate() to physically perform the validations and gather intel.

agent <-
  agent %>% 
  col_exists(vars(date, date_time)) %>%
  col_vals_regex(
    vars(b),
    regex = "[0-9]-[a-z]{3}-[0-9]{3}"
  ) %>%
  rows_distinct() %>%
  col_vals_gt(vars(d), value = 100) %>%
  col_vals_lte(vars(c), value = 5) %>%
  col_vals_equal(
    vars(d), value = vars(d),
    na_pass = TRUE
  ) %>%
  col_vals_between(
    vars(c),
    left = vars(a), right = vars(d),
    na_pass = TRUE
  ) %>%
  interrogate()

class(agent)

Calling agent will print the agent report with the default options.

agent

ptblank_agent_report

We can get a gt_tbl object directly with get_agent_report(agent).

agent_report <- get_agent_report(agent, title = "Validation Report")

class(agent_report)

Calling agent_report will print the customized agent report.

agent_report

ptblank_multiagent

Let's walk through several theoretical data quality analyses of small_table.

We'll begin by setting failure limits and signal conditions, we designate proportional failure thresholds to the warn, stop, and notify states using action_levels().

al <- 
  action_levels(
    warn_at = 0.05,
    stop_at = 0.10,
    notify_at = 0.20
  )

We will create four different agents and have slightly different validation steps in each of them; in the first, agent_1, eight different validation steps are created and the agent will interrogate the small_table.

agent_1 <-
  create_agent(
    read_fn = ~ small_table,
    tbl_name = "small_table",
    label = "`get_multiagent_report()`",
    actions = al
  ) %>%
  col_vals_gt(
    vars(date_time),
    value = vars(date),
    na_pass = TRUE
  ) %>%
  col_vals_gt(
    vars(b), 
    value = vars(g),
    na_pass = TRUE
  ) %>%
  rows_distinct() %>%
  col_vals_equal(
    vars(d), 
    value = vars(d),
    na_pass = TRUE
  ) %>%
  col_vals_between(
    vars(c), 
    left = vars(a), right = vars(d)
  ) %>%
  col_vals_not_between(
    vars(c),
    left = 10, right = 20,
    na_pass = TRUE
  ) %>%
  rows_distinct(vars(d, e, f)) %>%
  col_is_integer(vars(a)) %>%
  interrogate()

The second agent, agent_2, retains all of the steps of agent_1 and adds two more (the last of which is inactive).

agent_2 <- 
  agent_1 %>%
  col_exists(vars(date, date_time)) %>%
  col_vals_regex(
    vars(b), 
    regex = "[0-9]-[a-z]{3}-[0-9]{3}",
    active = FALSE
  ) %>%
  interrogate()

The third agent, agent_3, adds a single validation step, removes the fifth one, and deactivates the first.

agent_3 <- 
  agent_2 %>%
  col_vals_in_set(
    vars(f),
    set = c("low", "mid", "high")
  ) %>%
  remove_steps(i = 5) %>%
  deactivate_steps(i = 1) %>%
  interrogate()

The fourth and final agent, agent_4, reactivates steps 1 and 10, and removes the sixth step.

agent_4 <-
  agent_3 %>%
  activate_steps(i = 1) %>%
  activate_steps(i = 10) %>%
  remove_steps(i = 6) %>%
  interrogate()

While all the agents are slightly different from each other, we can still get a combined report of them by creating a 'multiagent'.

multiagent <-
  create_multiagent(
    agent_1, agent_2, agent_3, agent_4
  )

class(multiagent)

Calling multiagent prints the multiagent report with the default options.

multiagent

ptblank_multiagent_report.long

We can use get_multiagent_report() to customize the report. By default (with display_mode = "long"), it gives you a long report with agent reports being stacked.

long_report <- 
  get_multiagent_report(multiagent, title = "Several Validation Reports")

class(long_report)

Calling long_report will print the customized multiagent report.

long_report

ptblank_multiagent_report.wide

We can opt for a wide display of he reporting info, and this is great when reporting on multiple validations of the same target table.

wide_report <- 
  get_multiagent_report(multiagent, display_mode = "wide")

class(wide_report)

Calling wide_report will print the customized multiagent report.

wide_report

ptblank_informant

Generate a basic informant object with the small_table dataset.

informant <- create_informant(small_table)

class(informant)

Calling informant prints the informant report with the default options.

informant

We can get a customized version of the report by using get_informant_report()

informant_report <- 
  get_informant_report(informant, title = "Data Dictionary", lang = "de")

class(informant_report)

Calling informant_report will print the customized informant report.

informant_report

ptblank_tbl_scan

A table can be scanned, giving us an HTML-based information summary.

table_scan <- scan_data(small_table, sections = "OVS")

class(table_scan)

Calling table_scan prints the report with the options provided earlier.

table_scan


rich-iannone/pointblank documentation built on May 10, 2024, 12:46 p.m.