Starting with version 1.4.8, the R version of vtreat includes an additional .fit_transform() interface. This is a back-port from the Python version of vtreat.

The idea is borrowed from scikit-learn's Pipeline interface. It works as follows.

We define mutable objects to manage the variable preparation. This is standard for Python, but a bit unusual for R. However, it has some notational advantages.

These objects define three primary methods:

$fit()
$transform()
$fit_transform()

They work as follows.

$fit(): Takes training data as an argument and learns the correct data preparation plan from the data. The plan is kept inside the object as a side-effect. The object itself is returned, allowing method chaining.
$transform(): Uses the in-object stored treatment plan to treat new data (given as an argument).
$fit_transform(): Performs the cross-validated work required to avoid nested-model bias. Nested-model bias is the overfit that arises when the same data is used to design the data transform and is then naively treated with that transform for down-stream modeling. For this reason $fit_transform() is not shorthand for $fit()$transform(); it is a different method that takes extra steps to make sure the fit and transform are jointly correct.
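
To make the difference between the three methods concrete, here is a minimal base-R sketch. It is a toy and not vtreat's implementation: make_toy_treatment, the "plan" (the per-level mean of y) and the fixed fold assignment are all hypothetical names and simplifications chosen for illustration. It assumes every level of x appears in each training split.

```r
# Toy stand-in for a treatment object (hypothetical; NOT vtreat's code).
# The "plan" is just the mean of y for each level of x.
make_toy_treatment <- function(n_folds = 3) {
  self <- new.env()
  self$plan <- NULL

  # Learn the per-level means of y from a data frame with columns x and y.
  learn_plan <- function(d) tapply(d$y, d$x, mean)

  # Apply a plan: replace each level of x with its learned mean.
  apply_plan <- function(plan, d) as.numeric(plan[as.character(d$x)])

  self$fit <- function(d) {
    self$plan <- learn_plan(d)  # side effect: the plan is stored in the object
    invisible(self)             # return the object itself to allow chaining
  }

  self$transform <- function(d) {
    data.frame(x_code = apply_plan(self$plan, d))
  }

  self$fit_transform <- function(d) {
    self$fit(d)                 # the stored plan still uses all rows
    fold <- rep_len(seq_len(n_folds), nrow(d))
    x_code <- numeric(nrow(d))
    for (k in seq_len(n_folds)) {
      held_out <- fold == k
      # Code the held-out rows with a plan that never saw them.
      plan_k <- learn_plan(d[!held_out, , drop = FALSE])
      x_code[held_out] <- apply_plan(plan_k, d[held_out, , drop = FALSE])
    }
    data.frame(x_code = x_code)
  }

  self
}

d <- data.frame(x = c("a", "a", "a", "b", "b", "b"),
                y = c(1, 2, 3, 10, 20, 30),
                stringsAsFactors = FALSE)

naive <- make_toy_treatment()$fit(d)$transform(d)  # each row coded using its own y
cross <- make_toy_treatment()$fit_transform(d)     # each row coded without its own y
print(cbind(d, naive = naive$x_code, cross = cross$x_code))
```

In the naive column every row's code was computed from a mean that includes that row's own outcome, which is the leak that nested-model bias refers to. In the cross column each code comes from rows in other folds only, so the two columns differ even though the stored plan is the same.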

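This corresponds to the classic R vtreat notations as follows (the * stands for the outcome-type letter in the function name, such as N for a numeric outcome or C for a binary classification outcome):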

plan$fit(d) ~ plan <- designTreatments*(d)
plan$transform(d) ~ prepare(plan, d)
plan$fit_transform(d) ~ mkCrossFrame*Experiment(d)$crossFrame
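
As an illustration of the classic side of this correspondence, the following sketch uses the numeric-outcome (N) variants. The data frame, its column names and the train/new split are synthetic assumptions made up for this example, and the vtreat package is assumed to be installed.

```r
# Classic vtreat notation for a numeric outcome (the N variants).
# The data below is synthetic and exists only for this illustration.
set.seed(2025)
d <- data.frame(
  x_num = rnorm(200),
  x_cat = sample(c("a", "b", "c"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)
d$y <- 2 * d$x_num + ifelse(d$x_cat == "a", 1, 0) + rnorm(200)
d_train <- d[1:150, ]
d_new <- d[151:200, ]
vars <- c("x_num", "x_cat")

# Counterpart of plan$fit(d): design a treatment plan from training data.
plan <- vtreat::designTreatmentsN(d_train, vars, "y", verbose = FALSE)

# Counterpart of plan$transform(d): apply the stored plan to new data.
new_treated <- vtreat::prepare(plan, d_new)

# Counterpart of plan$fit_transform(d): cross-validated training frame.
cfe <- vtreat::mkCrossFrameNExperiment(d_train, vars, "y", verbose = FALSE)
train_treated <- cfe$crossFrame  # use this to fit the down-stream model
cross_plan <- cfe$treatments     # use this with prepare() on future data
```

Note that prepare() is applied to d_new here, not to d_train: treating the design data with its own plan is the naive step that the cross-frame is meant to replace.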

Both notation systems are good: the R one is more "R-like" (using the usual immutable objects), and the .fit_transform() one is more Pythonic. We expect to teach and maintain both paradigms.

Examples of typical modeling tasks in both notations can be found here:

Regression: R notation, fit_prepare() notation.
Binary Classification: R notation, fit_prepare() notation.
Unsupervised Coding: R notation, fit_prepare() notation.
Multinomial Classification: R notation, fit_prepare() notation.

WinVector/vtreat documentation built on Jan. 12, 2025, 6:04 a.m.

WinVector/vtreat
A Statistically Sound 'data.frame' Processor/Conditioner

Examples/fit_transform/fit_transform_api.md