shrink: Subset only required columns


shrink    R Documentation

Subset only required columns

Description

shrink() subsets data to only contain the required columns specified by the prototype, ptype.
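The core idea can be illustrated with a minimal base-R sketch. `shrink_sketch()` is a hypothetical helper written for illustration only; it is not part of hardhat. Unlike the real shrink(), it does not convert the result to a tibble and does not reproduce hardhat's error messages. It only shows the two steps: check that the prototype's columns exist, then keep just those columns.

```r
# shrink_sketch() is hypothetical and only approximates shrink():
# it checks for required columns and subsets, nothing more
shrink_sketch <- function(data, ptype) {
  # Only the column names of the prototype are used
  required <- colnames(ptype)
  missing <- setdiff(required, colnames(data))
  if (length(missing) > 0) {
    stop("The following required columns are missing: ",
         paste0("'", missing, "'", collapse = ", "), call. = FALSE)
  }
  # Keep only the required columns, in the prototype's order
  data[required]
}

ptype <- iris[0, c("Species", "Sepal.Width")]
out <- shrink_sketch(iris[101:150, ], ptype)
names(out)
```

Dropping a required column (for example, `iris[, -5]`, which removes Species) makes the sketch stop with an error, mirroring the check shown in the Examples section.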

Usage

shrink(data, ptype, ..., call = current_env())

Arguments

data

A data frame containing the data to subset.

ptype

A data frame prototype containing the required columns.
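In practice the prototype usually comes from mold(), as in the Examples section. Judging from the description above, shrink() appears to need only the prototype's column names, so a zero-row slice of the training data should also work as a prototype. The sketch below assumes that, and shows that extra columns are dropped, that the kept columns follow the prototype's order, and that the result is a tibble, as stated under Value.

```r
library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# Zero-row slice of the training data used as a prototype;
# its rows are irrelevant, only its columns matter here
ptype <- train[0, c("Species", "Petal.Length")]

# Columns not in the prototype are dropped, and the remaining
# columns follow the prototype's order
out <- shrink(test, ptype)
names(out)
```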

...

These dots are for future extensions and must be empty.

call

The call used for errors and warnings.
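With the default `call = current_env()`, errors are reported as coming from shrink() itself. A function that wraps shrink() can pass its own environment so that errors point at the wrapper instead. The wrapper `prepare_new_data()` below is hypothetical and only illustrates this pattern; `current_env()` is the rlang function already used in the Usage signature.

```r
library(hardhat)
library(rlang)

# prepare_new_data() is a hypothetical wrapper, not part of hardhat.
# Passing its own environment as `call` should make errors raised
# by shrink() refer to prepare_new_data() rather than shrink()
prepare_new_data <- function(new_data, ptype) {
  shrink(new_data, ptype, call = current_env())
}

ptype <- iris[0, c("Species", "Sepal.Width")]

# Succeeds: both required columns are present
ok <- prepare_new_data(iris, ptype)

# Fails: Species is missing from the new data
bad <- iris[, c("Sepal.Length", "Sepal.Width")]
try(prepare_new_data(bad, ptype))
```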

Details

shrink() is called by forge() before scream() and before the actual processing is done.
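The ordering described above can be reproduced by hand: shrink() keeps the required columns, and scream() then checks their types (such as factor levels) against the prototype. This is a sketch of the validation steps only; forge() does more than this, applying the blueprint's processing afterwards (here, the indicator-variable encoding of Species implied by the formula).

```r
library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

blueprint <- mold(log(Sepal.Width) ~ Species, train)$blueprint
ptype_pred <- blueprint$ptypes$predictors

# Manual version of the validation forge() performs first:
# subset with shrink(), then type-check with scream()
validated <- scream(shrink(test, ptype_pred), ptype_pred)

# forge() runs these checks internally, then applies
# the blueprint's processing to the new data
processed <- forge(test, blueprint)
```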

Value

A tibble containing the required columns.

Examples

# ---------------------------------------------------------------------------
# Setup

library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# shrink()

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
shrink(test, ptype_pred)

# To extract the outcomes, just use the
# outcome prototype
shrink(test, ptype_out)

# shrink() makes sure that the columns
# required by `ptype` actually exist in the data
# and errors nicely when they don't
test2 <- subset(test, select = -Species)
try(shrink(test2, ptype_pred))

tidymodels/hardhat documentation built on Dec. 14, 2024, 11:11 a.m.