R/developer.R

#' @name developer_functions
#'
#' @title Developer functions for creating recipes steps
#'
#' @description
#'
#' This page provides a comprehensive list of the exported functions for
#' creating recipes steps and guidance on how to use them.
#'
#' @details
#'
#' # Creating steps
#'
#' [add_step()] and [add_check()] are required when creating a new step. The
#' output of [add_step()] should be the return value of all steps and should
#' have the following format:
#'
#' ```r
#' step_example <- function(recipe,
#'                          ...,
#'                          role = NA,
#'                          trained = FALSE,
#'                          skip = FALSE,
#'                          id = rand_id("example")) {
#'   add_step(
#'     recipe,
#'     step_example_new(
#'       terms = enquos(...),
#'       role = role,
#'       trained = trained,
#'       skip = skip,
#'       id = id
#'     )
#'   )
#' }
#' ```
#'
#' [rand_id()] should be used in the arguments of `step_example()` to specify
#' the argument, as we see in the above example.
#'
#' [recipes_pkg_check()] should be used in `step_example()` functions together
#' with [required_pkgs()] to alert users that certain other packages are
#' required. The standard way of using this function is the following format:
#'
#' ```r
#' recipes_pkg_check(required_pkgs.step_example())
#' ```
#'
#' [step()] and [check()] are used within the `step_*_new()` function that you
#' use in your new step. It will be used in the following way:
#'
#' ```r
#' step_example_new <- function(terms, role, trained, skip, id) {
#'   step(
#'     subclass = "example",
#'     terms = terms,
#'     role = role,
#'     trained = trained,
#'     skip = skip,
#'     id = id
#'   )
#' }
#' ```
#'
#' [recipes_eval_select()] is used within `prep.step_*()` functions, and are
#' used to turn the `terms` object into a character vector of the selected
#' variables.
#'
#' It will most likely be used like so:
#'
#' ```r
#' col_names <- recipes_eval_select(x$terms, training, info)
#' ```
#'
#' [check_type()] can be used within `prep.step_*()` functions to check that the
#' variables passed in are the right types. We recommend that you use the
#' `types` argument as it offers higher flexibility and it matches the types
#' defined by [.get_data_types()]. When using `types` we find it better to be
#' explicit, e.g. writing `types = c("double", "integer")` instead of `types =
#' "numeric"`, as it produces cleaner error messages.
#'
#' It should be used like so:
#'
#' ```r
#' check_type(training[, col_names], types = c("double", "integer"))
#' ```
#'
#' [check_new_data()] should be used within `bake.step_*()`. This function is
#' used to make check that the required columns are present in the data. It
#' should be one of the first lines inside the function.
#'
#' It should be used like so:
#'
#' ```r
#' check_new_data(names(object$columns), object, new_data)
#' ```
#'
#' [check_name()] should be used in `bake.step_*()` functions for steps that add
#' new columns to the data set. The function throws an error if the column names
#' already exist in the data set. It should be called before adding the new
#' columns to the data set.
#'
#' [get_keep_original_cols()] and [remove_original_cols()] are used within steps
#' with the `keep_original_cols` argument. [get_keep_original_cols()] is used in
#' `prep.step_*()` functions for steps that were created before the
#' `keep_original_cols` argument was added, and acts as a way to throw a warning
#' that the user should regenerate the recipe. [remove_original_cols()] should
#' be used in `bake.step_*()` functions to remove the original columns. It is
#' worth noting that [remove_original_cols()] can remove multiple columns at
#' once and when possible should be put outside `for` loops.
#'
#' ```r
#' new_data <- remove_original_cols(new_data, object, names_of_original_cols)
#' ```
#'
#' [recipes_remove_cols()] should be used in `prep.step_*()` functions, and is
#' used to remove columns from the data set, either by using the
#' `object$removals` field or by using the `col_names` argument.
#'
#' [get_case_weights()] and [are_weights_used()] are functions that help you
#' extract case weights and help determine if they are used or not within the
#' step. They will typically be used within the `prep.step_*()` functions if the
#' step in question supports case weights.
#'
#' [print_step()] is used inside `print.step_*()` functions. This function is
#' replacing the internally deprecated [printer()] function.
#'
#' [sel2char()] is mostly used within `tidy.step_*()` functions to turn
#' selections into character vectors.
#'
#' [names0()] creates a series of `num` names with a common prefix. The names
#' are numbered with leading zeros (e.g. `prefix01`-`prefix10` instead of
#' `prefix1`-`prefix10`). This is useful for many types of steps that produce
#' new columns.
#'
#' # Interacting with recipe objects
#'
#' [detect_step()] returns a logical indicator to determine if a given step or
#' check is included in a recipe.
#'
#' [fully_trained()] returns a logical indicator if the recipe is fully trained.
#' The function [is_trained()] can be used to check in any individual steps are
#' trained or not.
#'
#' [.get_data_types()] is an S3 method that is used for [selections]. This method
#' can be extended to work with column types not supported by recipes.
#'
#' [recipes_extension_check()] is recommended to be used by package authors to
#' make sure that all steps have `prep.step_*()`, `bake.step_*()`,
#' `print.step_*()`, `tidy.step_*()`, and `required_pkgs.step_*()` methods. It
#' should be used as a test, preferably like this:
#'
#' ```r
#' test_that("recipes_extension_check", {
#'   expect_snapshot(
#'     recipes::recipes_extension_check(
#'       pkg = "pkgname"
#'     )
#'   )
#' })
#' ```
NULL

Try the recipes package in your browser

Any scripts or data that you put into this service are public.

recipes documentation built on Aug. 26, 2023, 1:08 a.m.