Examples/fit_transform/fit_prepare_api.md

Starting with version 1.6.0, the R version of vtreat exposes an additional fit_prepare interface, based on the API of the Python version of vtreat.

The idea is borrowed from scikit-learn's Pipeline. It works as follows.

For each of the common modeling tasks, the user constructs the appropriate, uninitialized data treatment object (a "spec"):

Each object defines three primary methods: fit(), prepare(), and fit_prepare().

They work as follows.

fit_prepare() performs the cross-validated work required to avoid nested-model bias. The nested-model bias we are working to avoid is overfitting caused by using data to design the data transform and then naively treating that same data with the transform for downstream modeling. Note that (except in the unsupervised case) fit_prepare(spec, d) is not shorthand for fit(spec, d) %.>% prepare(., d); it is a different method that takes extra steps to make sure the fit and treatment plan are jointly correct.
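The following is a minimal sketch of that workflow for a numeric outcome, not a definitive recipe. The data frame, its column names (x_num, x_cat, y), and the variable names are made up for illustration. The sketch also assumes that fit_prepare() returns a list whose elements are named treatments (the fitted treatment plan) and cross_frame (the cross-validated training frame); check the names against your installed version of vtreat.

```r
library(vtreat)

# Hypothetical training data: one numeric and one categorical input.
set.seed(2023)
n <- 200
d <- data.frame(
  x_num = rnorm(n),
  x_cat = sample(c("a", "b", "c", NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)
d$y <- d$x_num + (d$x_cat %in% "a") + rnorm(n)

# Build the uninitialized treatment object (the "spec").
spec <- NumericOutcomeTreatment(
  var_list = c("x_num", "x_cat"),  # columns to transform
  outcome_name = "y"               # outcome column
)

# Fit the plan and get cross-validated training data in one step.
# Assumption: the result is a list with elements `treatments` and `cross_frame`.
fp <- fit_prepare(spec, d)
treatment_plan <- fp$treatments
d_train <- fp$cross_frame  # use this, not prepare(treatment_plan, d), to train

# New or held-out data is treated with plain prepare().
d_new <- data.frame(
  x_num = c(0.5, NA),
  x_cat = c("a", "z"),  # "z" is a level not seen during fitting
  stringsAsFactors = FALSE
)
d_new_prepared <- prepare(treatment_plan, d_new)
```

The downstream model would be trained on d_train, while prepare() is reserved for data that played no part in designing the treatment plan.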

This corresponds to the classic R vtreat notations as follows:

We introduced this notation into the R version of vtreat for consistency of notation, to take advantage of the excellent Scikit-learn paradigm, and to compensate for some unfortunate name choices during the early development of vtreat in R. Both notations have the same underlying implementation, and we expect to teach and maintain both paradigms.

Examples of the typical modeling tasks in both notations can be found here:



WinVector/vtreat documentation built on Aug. 29, 2023, 4:49 a.m.