x_write_disk: Write an _agent_, _informant_, _multiagent_, or table scan to...

View source: R/object_ops.R

x_write_diskR Documentation

Write an agent, informant, multiagent, or table scan to disk

Description

Writing an agent, informant, multiagent, or even a table scan to disk with x_write_disk() can be useful for keeping data validation intel or table information close at hand for later retrieval (with x_read_disk()). By default, any data table that the agent or informant may have held before being committed to disk will be expunged (not applicable to any table scan since they never hold a table object). This behavior can be changed by setting keep_tbl to TRUE but this only works in the case where the table is not of the tbl_dbi or the tbl_spark class.

Usage

x_write_disk(
  x,
  filename,
  path = NULL,
  keep_tbl = FALSE,
  keep_extracts = FALSE,
  quiet = FALSE
)

Arguments

x

One of several types of objects

⁠<object>⁠ // required

An agent object of class ptblank_agent, an informant of class ptblank_informant, or an table scan of class ptblank_tbl_scan.

filename

File name

⁠scalar<character>⁠ // required

The filename to create on disk for the agent, informant, or table scan.

path

File path

⁠scalar<character>⁠ // default: NULL (optional)

An optional path to which the file should be saved (this is automatically combined with filename).

keep_tbl

Keep data table inside object

⁠scalar<logical>⁠ // default: FALSE

An option to keep a data table that is associated with the agent or informant (which is the case when the agent, for example, is created using ⁠create_agent(tbl = <data table, ...)⁠). The default is FALSE where the data table is removed before writing to disk. For database tables of the class tbl_dbi and for Spark DataFrames (tbl_spark) the table is always removed (even if keep_tbl is set to TRUE).

keep_extracts

Keep data extracts inside object

⁠scalar<logical>⁠ // default: FALSE

An option to keep any collected extract data for failing rows. Only applies to agent objects. By default, this is FALSE (i.e., extract data is removed).

quiet

Inform (or not) upon file writing

⁠scalar<logical>⁠ // default: FALSE

Should the function not inform when the file is written?

Details

It is recommended to set up a table-prep formula so that the agent and informant can access refreshed data after being read from disk through x_read_disk(). This can be done initially with the tbl argument of create_agent()/create_informant() by passing in a table-prep formula or a function that can obtain the target table when invoked. Alternatively, we can use the set_tbl() with a similarly crafted tbl expression to ensure that an agent or informant can retrieve a table at a later time.

Value

Invisibly returns TRUE if the file has been written.

Examples

A: Writing an agent to disk

Let's go through the process of (1) developing an agent with a validation plan (to be used for the data quality analysis of the small_table dataset), (2) interrogating the agent with the interrogate() function, and (3) writing the agent and all its intel to a file.

Creating an action_levels object is a common workflow step when creating a pointblank agent. We designate failure thresholds to the warn, stop, and notify states using action_levels().

al <- 
  action_levels(
    warn_at = 0.10,
    stop_at = 0.25,
    notify_at = 0.35
  )

Now, let's create a pointblank agent object and give it the al object (which serves as a default for all validation steps which can be overridden). The data will be referenced in the tbl argument with a leading ~.

agent <- 
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "`x_write_disk()`",
    actions = al
  )

Then, as with any agent object, we can add steps to the validation plan by using as many validation functions as we want. After that, use interrogate().

agent <-
  agent %>% 
  col_exists(columns = c(date, date_time)) %>%
  col_vals_regex(
    columns = b,
    regex = "[0-9]-[a-z]{3}-[0-9]{3}"
  ) %>%
  rows_distinct() %>%
  col_vals_gt(columns = d, value = 100) %>%
  col_vals_lte(columns = c, value = 5) %>%
  interrogate()

The agent can be written to a file with the x_write_disk() function.

x_write_disk(
  agent,
  filename = "agent-small_table.rds"
)

We can read the file back as an agent with the x_read_disk() function and we'll get all of the intel along with the restored agent.

If you're consistently writing agent reports when periodically checking data, we could make use of the affix_date() or affix_datetime() depending on the granularity you need. Here's an example that writes the file with the format: "<filename>-YYYY-mm-dd_HH-MM-SS.rds".

x_write_disk(
  agent,
  filename = affix_datetime(
    "agent-small_table.rds"
  )
)

B: Writing an informant to disk

Let's go through the process of (1) creating an informant object that minimally describes the small_table dataset, (2) ensuring that data is captured from the target table using the incorporate() function, and (3) writing the informant to a file.

Create a pointblank informant object with create_informant() and the small_table dataset. Use incorporate() so that info snippets are integrated into the text.

informant <- 
  create_informant(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "`x_write_disk()`"
  ) %>%
  info_snippet(
    snippet_name = "high_a",
    fn = snip_highest(column = "a")
  ) %>%
  info_snippet(
    snippet_name = "low_a",
    fn = snip_lowest(column = "a")
  ) %>%
  info_columns(
    columns = a,
    info = "From {low_a} to {high_a}."
  ) %>%
  info_columns(
    columns = starts_with("date"),
    info = "Time-based values."
  ) %>%
  info_columns(
    columns = date,
    info = "The date part of `date_time`."
  ) %>%
  incorporate()

The informant can be written to a file with x_write_disk(). Let's do this with affix_date() so that the filename has a datestamp.

x_write_disk(
  informant,
  filename = affix_date(
    "informant-small_table.rds"
  )
)

We can read the file back into a new informant object (in the same state as when it was saved) by using x_read_disk().

C: Writing a multiagent to disk

Let's create one more pointblank agent object, provide it with some validation steps, and interrogate().

agent_b <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "`x_write_disk()`",
    actions = al
  ) %>%
  col_vals_gt(
    columns = b,
    value = g,
    na_pass = TRUE,
    label = "b > g"
  ) %>%
  col_is_character(
    columns = c(b, f),
    label = "Verifying character-type columns" 
  ) %>%
  interrogate()

Now we can combine the earlier agent object with the newer agent_b to create a multiagent.

multiagent <- create_multiagent(agent, agent_b)

The multiagent can be written to a file with the x_write_disk() function.

x_write_disk(
  multiagent,
  filename = "multiagent-small_table.rds"
)

We can read the file back as a multiagent with the x_read_disk() function and we'll get all of the constituent agents and their associated intel back as well.

D: Writing a table scan to disk

We can get a report that describes all of the data in the storms dataset.

tbl_scan <- scan_data(tbl = dplyr::storms)

The table scan object can be written to a file with x_write_disk().

x_write_disk(
  tbl_scan,
  filename = "tbl_scan-storms.rds"
)

Function ID

9-1

See Also

Other Object Ops: activate_steps(), deactivate_steps(), export_report(), remove_steps(), set_tbl(), x_read_disk()


pointblank documentation built on Oct. 30, 2024, 9:29 a.m.