View source: R/feature_generation.R
Calculates statistics on the response variable, using custom functions, for each group defined by the columns in statistics_col.
create_stats(train, statistics_col, response, functions,
  too_few_observations_cutoff = 30, quantile_trim_threshold = 0)
train
The train dataset, as a data.table.
statistics_col
A character vector of column names. Please ensure that you only choose non-numeric columns or numeric columns with few distinct values. Combinations that generate too few (<30)
response
The column containing the response variable.
functions
A (named) list of functions used to generate statistics. Each function takes a vector and should return a scalar, e.g. mean or sd. If names are provided, each name is prepended to the generated column. If they are not provided, gen<index of function>_ is prepended.
too_few_observations_cutoff
An integer denoting the minimum number of observations required for a combination of values in statistics_col to be used. If a combination has too few observations, its statistics are generated on the entire response column instead. Default: 30.
quantile_trim_threshold
Determines the quantiles at which the generated statistics are trimmed. For instance, when this is set to 0.1, the generated statistics are capped at the 0.1 and 0.9 quantiles. It should therefore be a value between 0 and 0.5. Default: 0.
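The capping described for quantile_trim_threshold can be illustrated with a minimal base R sketch. The helper trim_to_quantiles and the example vector are hypothetical and are not part of the package; the package's own trimming may differ in detail (for example in which quantile type it uses).

```r
# Hypothetical helper, not the package's implementation: cap a numeric vector
# at its q and 1 - q quantiles.
trim_to_quantiles <- function(x, q = 0.1) {
  stopifnot(q >= 0, q <= 0.5)
  # Lower and upper bounds, using R's default quantile type.
  bounds <- quantile(x, probs = c(q, 1 - q), na.rm = TRUE, names = FALSE)
  # Values below the lower bound are raised to it, values above the upper
  # bound are lowered to it; everything in between is left unchanged.
  pmin(pmax(x, bounds[1]), bounds[2])
}

stats <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
trimmed <- trim_to_quantiles(stats, q = 0.1)
untrimmed <- trim_to_quantiles(stats, q = 0)
```

With q = 0 the bounds are the minimum and maximum of the vector, so nothing is changed, which matches the default of 0 in the usage above.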
This function also generates default values for all generated columns, computed on the entire response column. This ensures that no NA values are present in the generated columns.
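The interplay between the observation cutoff and the defaults can be sketched as follows. This is an illustration in base R under stated assumptions, not the package's code: the helper group_stats_with_default, the example data, and the shape of the returned list are all hypothetical, and the package itself operates on a data.table.

```r
# Hypothetical sketch: one statistic per group, falling back to the statistic
# of the entire response column when a group has fewer than `cutoff` rows.
group_stats_with_default <- function(data, group_col, response, fun, cutoff = 30) {
  default <- fun(data[[response]])  # statistic on the entire response column
  groups <- split(data[[response]], data[[group_col]])
  values <- vapply(groups, function(v) {
    if (length(v) < cutoff) default else fun(v)
  }, numeric(1))
  list(
    table = data.frame(group = names(values), value = unname(values),
                       stringsAsFactors = FALSE),
    default = default
  )
}

train <- data.frame(
  colour = c(rep("red", 40), rep("blue", 5)),
  price  = c(rep(10, 40), rep(100, 5))
)
result <- group_stats_with_default(train, "colour", "price", mean, cutoff = 30)

# Look up statistics for new data. Groups that were never seen in training
# would produce NA, so they receive the default instead.
new_colours <- c("red", "green")
looked_up <- result$table$value[match(new_colours, result$table$group)]
looked_up[is.na(looked_up)] <- result$default
```

Here "blue" has only 5 observations, so it receives the overall mean rather than its own, and the unseen value "green" is filled with the same default, which is how the generated columns stay free of NA values.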
A list containing the generated statistics tables and the defaults per column.