Ordering of steps

In the recipes package, there are no constraints on the order in which steps are added to a recipe; you are free to apply steps in whatever order suits your data preprocessing needs. However, the order does matter, because each step operates on the output of the steps before it, and there are some general suggestions that you should consider.

Transforming a variable
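
As one illustration of why order matters for a single variable, consider a transformation that is only defined for positive values, such as a log. The sketch below uses a small made-up data frame (`dat` and its column `x` are hypothetical) and compares taking the log before centering with centering first. It is meant as a minimal demonstration, not a recommendation for any particular data set.

```r
library(recipes)

# Hypothetical, strictly positive predictor
dat <- data.frame(x = c(1, 10, 100, 1000))

# Log first, then center: every value is still positive when log() is applied
log_then_center <- recipe(~ x, data = dat) |>
  step_log(x) |>
  step_center(x) |>
  prep() |>
  bake(new_data = NULL)

# Center first, then log: centering pushes some values below zero,
# so log() produces NaN for them (warnings are silenced for the demo)
center_then_log <- suppressWarnings(
  recipe(~ x, data = dat) |>
    step_center(x) |>
    step_log(x) |>
    prep() |>
    bake(new_data = NULL)
)

anyNA(log_then_center$x)
anyNA(center_then_log$x)
```

The same reasoning applies to any step that requires positive input: steps that can make values non-positive are safer placed after it.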

Handling levels in categorical data

The order of steps for handling categorical levels is important, because each step sets levels for the next step to use as input. These steps create factor output, even if the input is of character type.
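
A minimal sketch of this chaining, assuming a hypothetical character column `city`: `step_novel()` runs first and adds a level for previously unseen values, and `step_unknown()` then works with the level set that `step_novel()` produced. The data and the choice of these two steps are illustrative assumptions, not a prescribed sequence.

```r
library(recipes)

# Hypothetical training data with a character (not factor) predictor
train <- data.frame(
  y = c(1, 2, 3, 4),
  city = c("paris", "paris", "rome", "rome"),
  stringsAsFactors = FALSE
)

# New data containing an unseen value and a missing value
new_obs <- data.frame(
  y = c(5, 6, 7),
  city = c("rome", "oslo", NA),
  stringsAsFactors = FALSE
)

rec <- recipe(y ~ city, data = train) |>
  step_novel(city) |>    # unseen values get their own level
  step_unknown(city) |>  # missing values get their own level
  prep()

baked <- bake(rec, new_data = new_obs)

is.factor(baked$city)  # the output is a factor even though the input was character
levels(baked$city)
```

Inspecting `levels(baked$city)` after each step is a simple way to check what the next step will receive as input.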

Dummy variables

Recipes do not automatically create dummy variables (unlike most formula methods); a categorical predictor stays a single column until you add a step such as step_dummy().
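
The difference can be seen in a small sketch, assuming a hypothetical data frame with a factor column `color`. Without a dummy-variable step the factor passes through unchanged; with `step_dummy()` it is replaced by indicator columns.

```r
library(recipes)

# Hypothetical data with one factor predictor
dat <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.7),
  color = factor(c("red", "green", "blue", "red", "green", "blue"))
)

rec <- recipe(y ~ color, data = dat)

# No dummy step: `color` is still a single factor column
no_dummies <- rec |>
  prep() |>
  bake(new_data = NULL)

# Explicit dummy step: `color` is replaced by indicator columns
with_dummies <- rec |>
  step_dummy(color) |>
  prep() |>
  bake(new_data = NULL)

names(no_dummies)
names(with_dummies)
```

Because the indicator columns only exist after `step_dummy()`, any later step that needs numeric versions of the categories has to come after it in the recipe.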

Recommended preprocessing outline

While every individual project's needs are different, here is a suggested order of potential steps that should work for most problems:

  1. Impute
  2. Handle factor levels
  3. Individual transformations for skewness and other issues
  4. Discretize (if needed and if you have no other choice)
  5. Create dummy variables
  6. Create interactions
  7. Normalization steps (center, scale, range, etc.)
  8. Multivariate transformation (e.g., PCA, spatial sign)

Again, your mileage may vary for your particular problem.
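
The outline above can be sketched as a single recipe. Everything about the data here is hypothetical (the data frame `dat` and its columns `income`, `age`, and `region`), and the specific step chosen for each stage is only one reasonable option among several. Discretization is left out because the outline treats it as a last resort.

```r
library(recipes)

# Hypothetical data: a skewed numeric column with a missing value,
# a second numeric column, and a factor with two rare levels
dat <- data.frame(
  y = seq_len(30) / 10,
  income = c(NA, exp(seq(9, 12, length.out = 29))),
  age = rep(c(25, 40, 33, 58, 61), times = 6),
  region = factor(c(rep("north", 14), rep("south", 14), "east", "west"))
)

rec <- recipe(y ~ ., data = dat) |>
  step_impute_median(income) |>                    # 1. impute
  step_other(region, threshold = 0.05) |>          # 2. handle factor levels (pool rare ones)
  step_log(income) |>                              # 3. individual transformation for skewness
  # 4. discretize: skipped in this sketch
  step_dummy(all_nominal_predictors()) |>          # 5. create dummy variables
  step_interact(~ age:starts_with("region_")) |>   # 6. create interactions
  step_normalize(all_numeric_predictors()) |>      # 7. normalization
  step_pca(all_numeric_predictors(), num_comp = 2) # 8. multivariate transformation

baked <- rec |>
  prep() |>
  bake(new_data = NULL)

names(baked)
```

Note how later steps depend on earlier ones in this sketch: the interaction term selects the `region_` columns that only exist after `step_dummy()`, and the log is applied before normalization so that `income` is still positive at that point.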

recipes documentation built on Aug. 26, 2023, 1:08 a.m.