factorize: Make harmonized factor variables.

Description

This function converts selected variables to factors and harmonizes the levels of a new dataset to match those of the dataset used for fitting. This matters when a factor variable in a test set contains levels that were not present for that variable during training. In that case, the unseen levels are replaced by the most frequent value found during training.
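The harmonization idea can be sketched in base R. This is only an illustration of the behaviour described above, not the package's implementation: the helpers `fit_levels` and `harmonize` are hypothetical names introduced here, and the choice to leave missing values untouched is an assumption.

```r
# Hypothetical helpers, not part of preprocessr: a base-R sketch of the idea.

# Learn the levels and the most frequent value of one training column.
fit_levels <- function(x) {
  counts <- table(x)
  list(
    levels = names(counts),
    top = names(counts)[which.max(counts)]  # first level with the highest count
  )
}

# Replace values unseen during training with the most frequent training
# value, then build a factor that uses the training levels.
harmonize <- function(x, fitted) {
  x <- as.character(x)
  # Assumption: NA values are left as NA rather than replaced.
  unseen <- !is.na(x) & !(x %in% fitted$levels)
  x[unseen] <- fitted$top
  factor(x, levels = fitted$levels)
}

train <- c("a1", "a1", "a1", "a2")
test <- c("a1", "a1", "a4", "a5")

fitted <- fit_levels(train)
result <- harmonize(test, fitted)
print(result)
```

In this sketch the unseen values "a4" and "a5" both become "a1", the most frequent training value, and the resulting factor keeps the training levels "a1" and "a2".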

Usage

1

Arguments

vars

A function or formula that returns the selected columns of a data.frame, or a character vector of column names.

Examples

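library(preprocessr)
# Training data: column A takes the values "a1" and "a2".
df <- data.frame(A = c("a1", "a1", "a1", "a2"), stringsAsFactors = FALSE)
# New data: "a4" and "a5" were not seen during training.
df2 <- data.frame(A = c("a1", "a1", "a4", "a5"), stringsAsFactors = FALSE)
prep <- factorize()
# Learn the factor levels from the training data.
prep$fit(df)
# Apply the learned levels to the new data.
prep$transform(df2)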

rtsho/preprocessr documentation built on May 29, 2019, 8:58 a.m.