
#  This file is part of the 'rstudio/pointblank' project.
#  Copyright (c) 2017-2024 pointblank authors
#  For full copyright and license information, please look at
#  https://rstudio.github.io/pointblank/LICENSE.html

#' Collect data extracts from a validation step
#' @description
#' In an agent-based workflow (i.e., initiating with [create_agent()]), after
#' interrogation with [interrogate()], we can extract the row data that didn't
#' pass row-based validation steps with the `get_data_extracts()` function.
#' There is one discrete extract per row-based validation step and the amount of
#' data available in a particular extract depends on both the fraction of test
#' units that didn't pass the validation step and the level of sampling or
#' explicit collection from that set of units. These extracts can be collected
#' programmatically through `get_data_extracts()` but they may also be
#' downloaded as CSV files from the HTML report generated by the agent's print
#' method or through the use of [get_agent_report()].
#'
#' The availability of data extracts for each row-based validation step depends
#' on whether `extract_failed` is set to `TRUE` within the [interrogate()] call
#' (it is by default). The amount of *fail* rows extracted depends on the
#' collection parameters in [interrogate()], and the default behavior is to
#' collect up to the first 5000 *fail* rows.
#'
#' Row-based validation steps are based on those validation functions of the
#' form `col_vals_*()` and also include [conjointly()] and [rows_distinct()].
#' Only functions from that combined set of validation functions can yield data
#' extracts.
#' @param agent *The pointblank agent object*
#'   `obj:<ptblank_agent>` // **required**
#'   A **pointblank** *agent* object that is commonly created through the use of
#'   the [create_agent()] function. It should have had [interrogate()] called on
#'   it, such that the validation steps were carried out and any sample rows
#'   from non-passing validations could potentially be available in the object.
#' @param i *A validation step number*
#'   `scalar<integer>` // *default:* `NULL` (`optional`)
#'   The validation step number, which is assigned to each validation step by
#'   **pointblank** in the order of definition. If `NULL` (the default), all
#'   data extract tables will be provided in a list object.
#' @return A list of tables if `i` is not provided, or a standalone table if
#'   `i` is given.
#' @section Examples:
#' Create a series of two validation steps focused on testing row values for
#' part of the `small_table` object. Use [interrogate()] right after that.
#' ```r
#' agent <-
#'   create_agent(
#'     tbl = small_table %>%
#'       dplyr::select(a:f),
#'     label = "`get_data_extracts()`"
#'   ) %>%
#'   col_vals_gt(d, value = 1000) %>%
#'   col_vals_between(
#'     columns = c,
#'     left = vars(a), right = vars(d),
#'     na_pass = TRUE
#'   ) %>%
#'   interrogate()
#' ```
#' Using `get_data_extracts()` with its defaults returns a list of tables,
#' where each table is named after the validation step that has an extract
#' available.
#' ```r
#' agent %>% get_data_extracts()
#' ```
#' \preformatted{## $`1`
#' ## # A tibble: 6 × 6
#' ##       a b             c     d e     f    
#' ##   <int> <chr>     <dbl> <dbl> <lgl> <chr>
#' ## 1     8 3-ldm-038     7  284. TRUE  low  
#' ## 2     7 1-knw-093     3  843. TRUE  high 
#' ## 3     3 5-bce-642     9  838. FALSE high 
#' ## 4     3 5-bce-642     9  838. FALSE high 
#' ## 5     4 2-dmx-010     7  834. TRUE  low  
#' ## 6     2 7-dmx-010     8  108. FALSE low  
#' ## 
#' ## $`2`
#' ## # A tibble: 4 × 6
#' ##       a b             c     d e     f    
#' ##   <int> <chr>     <dbl> <dbl> <lgl> <chr>
#' ## 1     6 8-kdg-938     3 2343. TRUE  high 
#' ## 2     8 3-ldm-038     7  284. TRUE  low  
#' ## 3     7 1-knw-093     3  843. TRUE  high 
#' ## 4     4 5-boe-639     2 1036. FALSE low}
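#' Because the list is named by step number, inspecting its names is a quick
#' way to see which steps produced an extract. The following is a minimal
#' sketch that assumes the `agent` created above; the `extracts` variable is
#' introduced here only for illustration.
#' ```r
#' extracts <- get_data_extracts(agent)
#'
#' # Step numbers (as character strings) that have an extract available
#' names(extracts)
#'
#' # Number of failing rows collected for each of those steps
#' vapply(extracts, nrow, integer(1))
#' ```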
#' We can get an extract for a specific step by specifying it in the `i`
#' argument. Let's get the failing rows from the first validation step (the
#' [col_vals_gt()] one).
#' ```r
#' agent %>% get_data_extracts(i = 1)
#' ```
#' \preformatted{## # A tibble: 6 × 6
#' ##       a b             c     d e     f    
#' ##   <int> <chr>     <dbl> <dbl> <lgl> <chr>
#' ## 1     8 3-ldm-038     7  284. TRUE  low  
#' ## 2     7 1-knw-093     3  843. TRUE  high 
#' ## 3     3 5-bce-642     9  838. FALSE high 
#' ## 4     3 5-bce-642     9  838. FALSE high 
#' ## 5     4 2-dmx-010     7  834. TRUE  low  
#' ## 6     2 7-dmx-010     8  108. FALSE low}
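#' The number of failing rows kept in each extract is governed by the
#' collection arguments of [interrogate()]. As a hedged sketch, assuming the
#' `get_first_n` argument of [interrogate()] and a single-step plan on the same
#' table (the name `agent_small` is hypothetical), the extract should hold no
#' more than two failing rows:
#' ```r
#' agent_small <-
#'   create_agent(
#'     tbl = small_table %>%
#'       dplyr::select(a:f)
#'   ) %>%
#'   col_vals_gt(d, value = 1000) %>%
#'   interrogate(get_first_n = 2)
#'
#' # Only the first two failing rows are collected for step 1
#' agent_small %>% get_data_extracts(i = 1)
#' ```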
#' @family Post-interrogation
#' @section Function ID:
#' 8-2
#' @export
get_data_extracts <- function(
    agent,
    i = NULL
) {

  # Stop function if the agent hasn't
  # yet performed an interrogation
  if (!inherits(agent, "has_intel")) {
    stop(
      "The `agent` has not yet performed an interrogation.",
      call. = FALSE
    )
  }

  # Get the number of validation steps
  validation_steps <- unique(agent$validation_set$i)

  # Return all available extracts as a list if no step is specified
  if (is.null(i)) {
    return(agent$extracts)
  }

  # Stop function if the `i`th step does not exist in `agent`
  if (!(i %in% seq(validation_steps))) {
    stop("The provided step number does not exist.", call. = FALSE)
  }

  # Get the names of the extracts
  extract_names <- names(agent$extracts)

  # Stop function if the `i`th step does not have an extract available
  if (!(as.character(i) %in% extract_names)) {
    stop(
      "The provided step number does not have an associated extract.",
      call. = FALSE
    )
  }

  # Get the data extract
  agent$extracts[[as.character(i)]]
}
