blueprint_macros: Macros for blueprint authoring

blueprint_macros    R Documentation

Macros for blueprint authoring

Description

blueprintr uses code inspection to identify and trace dataset dependencies. These macro functions signal a dependency to blueprintr and evaluate to symbols to be analyzed in the drake plan.

Usage

.TARGET(bp_name, .env = parent.frame())

.BLUEPRINT(bp_name, .env = parent.frame())

.META(bp_name, .env = parent.frame())

.SOURCE(dat_name)

mark_source(dat)

Arguments

bp_name

Character string of the blueprint's name

.env

The environment in which to evaluate the macro. For internal use only!

dat_name

Character string of an object's name, used exclusively for marking "sources"

dat

A data.frame-like object

Functions

  • .TARGET(): Gets the symbol of the built and checked data

  • .BLUEPRINT(): Gets the symbol of the blueprint reference in the plan

  • .META(): Gets the symbol of the metadata reference in the plan

  • .SOURCE(): Gets a symbol for an object intended to be a "data source"

  • mark_source(): Marks a data.frame-like object as a source table

When to use

Generally speaking, the .BLUEPRINT and .META macros should be used for check functions, which frequently require context, e.g. in the form of configuration from the blueprint or coding expectations from the metadata. .TARGET is primarily used in blueprint commands, but there could be situations where a check depends on the content of another dataset.
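A check may depend on the content of another dataset. As a minimal sketch, assuming a check is an ordinary R function that returns TRUE or FALSE for a dataset, the hypothetical ids_in_parent() below tests that every id in a dataset also exists in a parent table. How a check is registered on a blueprint, and how .TARGET("parent1") would be supplied as its parent argument, is not covered on this page, so local data frames stand in for the plan targets.

```r
# Hypothetical check function (not part of blueprintr): returns TRUE when every
# value of `id_col` in `df` also appears in the same column of `parent`.
ids_in_parent <- function(df, parent, id_col = "id") {
  all(df[[id_col]] %in% parent[[id_col]])
}

# Stand-in data; in a blueprint, `parent` would be the dataset referenced
# by .TARGET("parent1") instead of a local object.
parent1   <- data.frame(id = c(1, 2, 3))
child_ok  <- data.frame(id = c(1, 3))
child_bad <- data.frame(id = c(1, 4))

ids_in_parent(child_ok, parent1)   # TRUE: ids 1 and 3 both exist in parent1
ids_in_parent(child_bad, parent1)  # FALSE: id 4 is missing from parent1
```

Keeping the check as a plain function of its inputs means it can be tested in isolation before it is wired into a plan.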

It is important to note that the symbols generated by these macros are only understood in the context of a drake plan. The targets associated with the symbols are generated when blueprints are attached to a plan.

Sources

Sources provide a way to add variable UUIDs to objects that are not constructed with blueprints. This is often the case when the source table comes from an intermittent HTTP query or a file on disk. Blueprints have limited ability to configure the underlying target's behavior during the _initial phase, so it is often easier to do that kind of fetching and preprocessing before using blueprints. However, you lose the benefit of variable lineage when you don't use blueprints. "Sources" are simply data.frame-like objects that carry a ".uuid" attribute on each variable, so that variable lineage can cover the full data lifetime. Use blueprintr::mark_source() to add the UUID attributes, and then use .SOURCE() in the blueprints so that lineage can be captured.
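To make the idea of "a .uuid attribute on each variable" concrete, here is a base-R sketch. The helpers random_id() and add_variable_uuids() are hypothetical and are NOT how blueprintr implements mark_source(); random_id() produces a random hexadecimal string rather than a standards-compliant UUID. The commented lines at the end show the documented calls you would use instead.

```r
# Hypothetical ID generator: a random 32-character hexadecimal string.
# Not an RFC 4122 UUID; a real implementation would use a UUID library.
random_id <- function() {
  paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
}

# Hypothetical sketch of the idea behind mark_source(): attach a ".uuid"
# attribute to every variable (column) so lineage can start at the source.
add_variable_uuids <- function(dat) {
  for (nm in names(dat)) {
    attr(dat[[nm]], ".uuid") <- random_id()
  }
  dat
}

# Stand-in for a table fetched from disk or an HTTP query
raw_survey <- data.frame(id = 1:3, score = c(10, 20, NA))
marked <- add_variable_uuids(raw_survey)

# One identifier per column (values are random on every run)
sapply(marked, function(col) attr(col, ".uuid"))

# In an actual workflow, use the documented blueprintr calls instead:
#   raw_survey <- blueprintr::mark_source(raw_survey)
#   blueprint("clean_survey", description = "Survey without missing scores",
#             command = .SOURCE("raw_survey") %>% dplyr::filter(!is.na(score)))
```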

Examples

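library(blueprintr)
library(dplyr) # provides %>%, left_join(), and filter()

# Each macro evaluates to a symbol that refers to a target in the drake plan
.TARGET("example_dataset")
.BLUEPRINT("example_dataset")
.META("example_dataset")

# .TARGET() signals that "test_bp" depends on "parent1" and "parent2"
blueprint(
  "test_bp",
  description = "Blueprint with dependencies",
  command =
    .TARGET("parent1") %>%
      left_join(.TARGET("parent2"), by = "id") %>%
      filter(!is.na(id))
)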

nyuglobalties/blueprintr documentation built on July 16, 2024, 10:27 a.m.