shrink: Subset only required columns



Description

shrink() subsets data to only contain the required columns specified by the prototype, ptype.

Usage

shrink(data, ptype)

Arguments

data

A data frame containing the data to subset.

ptype

A data frame prototype containing the required columns.

Details

shrink() is called by forge() before scream() and before any of the actual processing is done, so the later steps only see the columns required by ptype.
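The column check and subsetting described above can be pictured with a minimal base R sketch. This is not hardhat's implementation: `shrink_sketch()` is a hypothetical helper written only for illustration, it returns a plain data frame rather than a tibble, and its error message is made up.

```r
# Hypothetical helper, NOT hardhat's implementation: keep only the columns
# named in `ptype`, and error if any of them are missing from `data`.
shrink_sketch <- function(data, ptype) {
  required <- names(ptype)
  missing_cols <- setdiff(required, names(data))

  if (length(missing_cols) > 0) {
    stop(
      "The following required columns are missing: ",
      paste0("'", missing_cols, "'", collapse = ", "),
      call. = FALSE
    )
  }

  # drop = FALSE keeps a data frame even when a single column is selected
  data[, required, drop = FALSE]
}

# A zero-row data frame acts as the prototype: column names and types only
ptype <- iris[0, c("Species", "Sepal.Width")]

# Only the prototype's columns are kept; every row is preserved
out <- shrink_sketch(iris[101:150, ], ptype)
names(out)
nrow(out)

# A missing required column is reported as an error
res <- try(shrink_sketch(iris[, 1:4], ptype), silent = TRUE)
inherits(res, "try-error")
```

The sketch only compares column names. Checking that the column types and factor levels match the prototype is a separate concern, which is why forge() calls scream() after shrink().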

Value

A tibble containing the required columns.

Examples

# ---------------------------------------------------------------------------
# Setup

library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# shrink()

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
shrink(test, ptype_pred)

# To extract the outcomes, just use the
# outcome prototype
shrink(test, ptype_out)

# shrink() makes sure that the columns
# required by `ptype` actually exist in the data
# and errors nicely when they don't
test2 <- subset(test, select = -Species)
try(shrink(test2, ptype_pred))

DavisVaughan/hardhat documentation built on Oct. 5, 2021, 9:53 a.m.