Function to design variable treatments for binary prediction of a
categorical outcome. The data frame is assumed to have only atomic columns,
except for dates (which are converted to numeric). Note: re-encoding high-cardinality
categorical variables can introduce undesirable nested model bias; for such data, consider
using `mkCrossFrameCExperiment`.

```r
designTreatmentsC(
  dframe,
  varlist,
  outcomename,
  outcometarget = TRUE,
  ...,
  weights = c(),
  minFraction = 0.02,
  smFactor = 0,
  rareCount = 0,
  rareSig = NULL,
  collarProb = 0,
  codeRestriction = NULL,
  customCoders = NULL,
  splitFunction = NULL,
  ncross = 3,
  forceSplit = FALSE,
  catScaling = TRUE,
  verbose = TRUE,
  parallelCluster = NULL,
  use_parallel = TRUE,
  missingness_imputation = NULL,
  imputation_map = NULL
)
```
`dframe`
Data frame to learn treatments from (training data); must have at least 1 row.

`varlist`
Names of columns to treat (effective variables).

`outcomename`
Name of the column holding the outcome variable. `dframe[[outcomename]]` must contain only finite, non-missing values.

`outcometarget`
Value/level of the outcome to be considered "success". `dframe[[outcomename]] == outcometarget` must hold for at least two rows, and `dframe[[outcomename]] != outcometarget` must hold for at least two rows.

`...`
Not used; declared to force later arguments to be bound by name.

`weights`
Optional training weights, one per row.

`minFraction`
Optional minimum frequency a categorical level must have to be converted to an indicator column.

`smFactor`
Optional smoothing factor for impact coding models.

`rareCount`
Optional integer; levels with this count or fewer may be pooled into a shared rare level. Defaults to 0 (off).

`rareSig`
Optional numeric; suppress levels from pooling at this significance value or greater. Defaults to NULL (off).

`collarProb`
What fraction of the data (pseudo-probability) to collar data at, if `doCollar` is set during `prepare`.

`codeRestriction`
What types of variables to produce (character array of level codes; NULL means no restriction).

`customCoders`
Map from code names to custom categorical variable encoding functions (see https://github.com/WinVector/vtreat/blob/master/extras/CustomLevelCoders.md).

`splitFunction`
(Optional) see `vtreat::buildEvalSets`.

`ncross`
Optional scalar >= 2; number of cross-validation splits used in rescoring complex variables.

`forceSplit`
Logical; if TRUE, force cross-validated significance calculations on all variables.

`catScaling`
Optional; if TRUE, use `glm()` link space for scaling; if FALSE, use `lm()`.

`verbose`
If TRUE, print progress.

`parallelCluster`
(Optional) a cluster object created by package parallel or package snow.

`use_parallel`
Logical; if TRUE, use parallel methods (when `parallelCluster` is set).

`missingness_imputation`
Function of signature `f(values: numeric, weights: numeric)`; a simple missing-value imputer.

`imputation_map`
Map from column names to functions of signature `f(values: numeric, weights: numeric)`; simple missing-value imputers.
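The level-handling options above (`minFraction`, `rareCount`, `smFactor`) can be pictured with the base-R sketch below. It is a simplified illustration written for this page, not vtreat's internal code: the helpers `indicatorLevels`, `poolRareLevels` and `smoothedImpact` are hypothetical, the `>=` comparison for `minFraction` is an assumption, and the smoothed log-odds formula is one common form of Bayesian impact coding rather than vtreat's exact estimator.

```r
# Hypothetical helpers (NOT vtreat functions) illustrating how a categorical
# variable x might be handled for a binary outcome y.

# Levels whose share of rows is at least minFraction (assumed >= comparison)
# would receive their own indicator column.
indicatorLevels <- function(x, minFraction = 0.02) {
  share <- table(x) / length(x)
  names(share)[share >= minFraction]
}

# Levels seen rareCount times or fewer are pooled into one shared level;
# rareCount = 0 leaves every observed level untouched.
poolRareLevels <- function(x, rareCount = 0, rareLabel = "rare") {
  counts <- table(x)
  rare <- names(counts)[counts <= rareCount]
  ifelse(x %in% rare, rareLabel, x)
}

# Impact code: per-level log-odds of y minus the global log-odds, with each
# level's rate shrunk toward the global rate by smFactor pseudo-observations.
smoothedImpact <- function(x, y, smFactor = 0, eps = 1e-3) {
  clamp <- function(p) min(max(p, eps), 1 - eps)  # keep qlogis() finite
  pGlobal <- clamp(mean(y))
  sapply(split(y, x), function(yi) {
    pLevel <- (sum(yi) + smFactor * pGlobal) / (length(yi) + smFactor)
    qlogis(clamp(pLevel)) - qlogis(pGlobal)
  })
}

indicatorLevels(c(rep("a", 60), rep("b", 39), "c"))  # "a" "b" ("c" has share 0.01)
poolRareLevels(c("a", "a", "a", "b", "c"), rareCount = 1)
# "a" "a" "a" "rare" "rare"
xs <- c("a", "a", "a", "b", "b", "b")
ys <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
smoothedImpact(xs, ys)                 # a = log(2), b = -log(2)
smoothedImpact(xs, ys, smFactor = 10)  # a = log(7/6), b = log(6/7): shrunk toward 0
```

Larger `smFactor` values pull every level's code toward zero, which is the usual motivation for smoothing: levels with few rows should not receive extreme codes.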

The main fields are mostly vectors with names (all with the same names in the same order):

- vars: (character array without names) names of variables (in the same order as the names on the other diagnostic vectors).
- varMoves: logical, TRUE if the variable varied during hold-out scoring; only variables that move will be in the treated frame.
- sig: an estimate of the significance of the effect.

See the vtreat vignette for a bit more detail and a worked example.

Columns that do not vary are not passed through.
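The `varMoves` and `sig` diagnostics are typically used together to decide which treated variables to keep. The base-R sketch below uses a small hand-made table in place of a real treatment plan: the variable names and values are invented for illustration, and the column names simply follow the field names listed above. The threshold of 1/(number of candidate variables) is a rule of thumb assumed here, not something this page specifies.

```r
# Toy diagnostics table; names and values are made up for illustration only.
scores <- data.frame(
  vars     = c("x_catB", "x_lev_x_a", "z_clean", "z_isBAD"),
  varMoves = c(TRUE, TRUE, TRUE, FALSE),
  sig      = c(0.001, 0.40, 0.02, 1.00),
  stringsAsFactors = FALSE
)

# Assumed pruning rule: keep variables that moved during hold-out scoring
# and whose significance is below 1 / (number of candidate variables).
threshold <- 1 / nrow(scores)  # 0.25 here
keep <- scores$vars[scores$varMoves & scores$sig < threshold]
print(keep)  # "x_catB" "z_clean"
```

`z_isBAD` is dropped because it never moved (mirroring the rule that non-varying columns are not passed through), and `x_lev_x_a` because its significance estimate is too weak.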

Note: re-encoding high-cardinality categorical variables on the training data can introduce nested model bias; consider using `mkCrossFrameCExperiment` instead.
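A minimal sketch of the cross-frame alternative, assuming the vtreat package is installed. The data are synthetic (a 50-level categorical `zip` and an outcome `yc` that does not depend on it) and chosen only so the example runs. `mkCrossFrameCExperiment` is assumed, as in the vtreat documentation, to return a list whose `treatments` element is the plan for new data and whose `crossFrame` element is the cross-validated treated training frame.

```r
library(vtreat)

set.seed(2024)
N <- 200
# Synthetic high-cardinality categorical (50 levels, about 4 rows each)
# and a binary outcome that is independent of it.
d <- data.frame(
  zip = sample(paste0("z", 1:50), N, replace = TRUE),
  stringsAsFactors = FALSE
)
d$yc <- runif(N) > 0.5

# Build the plan and a cross-validated training frame in one step, so that
# impact-coded columns for each row come from models fit on other folds.
cfe <- mkCrossFrameCExperiment(d, "zip", "yc", TRUE, verbose = FALSE)

treatments   <- cfe$treatments  # use with prepare() on new data
trainTreated <- cfe$crossFrame  # use to fit the downstream model
dim(trainTreated)
```

Fitting the downstream model on `crossFrame` rather than on `prepare(treatments, d)` is what avoids the nested model bias described above.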

Returns a treatment plan (for use with `prepare`).
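A minimal end-to-end sketch, assuming the vtreat package is installed: design treatments on a tiny training frame, inspect the per-variable diagnostics, then apply the plan to new data with `prepare`. The data are modeled on the small examples in the vtreat documentation; the `scoreFrame` element and its `varName`, `varMoves` and `sig` columns are assumed from current vtreat releases, and the median imputer is an illustrative choice for `missingness_imputation`, not a recommendation.

```r
library(vtreat)

# Tiny training frame: categorical x and numeric z (both with missing
# values) and a logical outcome y.
dTrain <- data.frame(
  x = c("a", "a", "a", "b", "b", NA, NA),
  z = c(1, 2, 3, 4, NA, 6, NA),
  y = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
# New data, including a level ("c") never seen during training.
dTest <- data.frame(
  x = c("a", "b", "c", NA),
  z = c(10, 20, 30, NA),
  stringsAsFactors = FALSE
)

# Design treatments for predicting y == TRUE from x and z, imputing
# missing numeric values with the median instead of the default.
treatments <- designTreatmentsC(
  dTrain, c("x", "z"), "y", TRUE,
  verbose = FALSE,
  missingness_imputation = function(values, weights) median(values, na.rm = TRUE)
)

# Per-variable diagnostics.
print(treatments$scoreFrame[, c("varName", "varMoves", "sig")])

# Apply the plan to new data; pruneSig = NULL keeps every treated variable.
dTestTreated <- prepare(treatments, dTest, pruneSig = NULL)
print(dTestTreated)
```

The treated frame has one row per row of `dTest` and only numeric columns, so it can be passed directly to most modeling functions.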

See also: `prepare.treatmentplan`, `designTreatmentsN`, `designTreatmentsZ`, `mkCrossFrameCExperiment`.

