View source: R/feature_generation.R
Generic function for creating statistics on the response column, based on custom columns.
pipe_create_stats(train,
  stat_cols = colnames(train)[purrr::map_lgl(train, is.character)],
  response, functions = list(mean = mean), interaction_level = 1,
  too_few_observations_cutoff = 30, quantile_trim_threshold = 0)
train | The train dataset, as a data.frame or data.table. Data.tables may be changed by reference.
stat_cols | A character vector of column names. Please ensure that you only choose names of non-numeric columns.
response | The column containing the response variable.
functions | A (named) list of functions used to generate statistics. Each function takes a vector and should return a scalar, e.g. mean / sd. If names are provided, the name will be prepended to the generated column. If they are not provided, gen<index of function>_ will be prepended.
interaction_level | An integer of 1 or higher. Determines whether statistics are gathered for single columns only (1) or also for combinations of columns (higher values).
too_few_observations_cutoff | An integer denoting the minimum number of observations required for a combination of values in stat_cols to be used. If not enough observations are present, the statistics will be generated on the entire response column. Default: 30.
quantile_trim_threshold | Determines the quantile to which the generated statistics are trimmed. For instance, when this is set to .1, the generated statistics will be capped at the 0.1 and 0.9 quantiles. Therefore, this should be a value between 0 and 0.5.
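The capping described for quantile_trim_threshold can be illustrated with a small base R sketch. The helper name trim_to_quantiles is hypothetical and not part of the package, and the package's actual implementation may differ; this only shows one way to cap values at a lower and upper quantile.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# Caps x at the `threshold` and `1 - threshold` quantiles.
trim_to_quantiles <- function(x, threshold) {
  stopifnot(threshold >= 0, threshold <= 0.5)
  bounds <- stats::quantile(x, probs = c(threshold, 1 - threshold), names = FALSE)
  # Raise values below the lower bound, then lower values above the upper bound.
  pmin(pmax(x, bounds[1]), bounds[2])
}

generated <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
trimmed <- trim_to_quantiles(generated, 0.1)
```

With a threshold of 0 the bounds are the minimum and maximum, so the values are returned unchanged, which matches the default shown in the usage.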
A list containing the transformed train dataset and a trained pipe.
This function will also generate default values for all generated columns, computed on the entire response column. This ensures that no NA values are present in generated columns when, for instance, a new category is detected or when values are cut off by too_few_observations_cutoff.
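The fallback behaviour described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the helper group_stat_with_fallback, the toy data, and the lookup step are all hypothetical, and only a single column (interaction level 1) is handled.

```r
# Hypothetical helper for illustration only; not the package's implementation.
group_stat_with_fallback <- function(data, stat_col, response, fun = mean,
                                     cutoff = 30) {
  values <- data[[response]]
  groups <- split(values, as.character(data[[stat_col]]))

  # Default value: the statistic computed on the entire response column.
  default <- fun(values)

  # Per-group statistic as a plain named numeric vector.
  stats <- vapply(groups, fun, numeric(1))

  # Groups with fewer observations than the cutoff fall back to the default.
  stats[lengths(groups) < cutoff] <- default

  list(stats = stats, default = default)
}

toy <- data.frame(
  category = c(rep("a", 4), rep("b", 4), "c"),
  y = c(1, 2, 3, 4, 10, 20, 30, 40, 100)
)
fit <- group_stat_with_fallback(toy, "category", "y", cutoff = 3)

# Look up statistics for new data; unseen categories receive the default,
# so the generated column contains no NA values.
new_categories <- c("a", "c", "d")
looked_up <- unname(fit$stats[new_categories])
looked_up[is.na(looked_up)] <- fit$default
```

Here category "c" has too few observations and "d" was never seen during training, so both end up with the statistic of the whole response column rather than NA.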