| categ_reducer | R Documentation |
This function reduces the number of distinct categorical values in a variable by replacing the filtered-out values with a single label. It is tidyverse-friendly and can be used in pipelines.
categ_reducer(
df,
var,
nmin = 0,
pmin = 0,
pcummax = 100,
top = NA,
pvalue_max = 1,
cor_var = "tag",
limit = 20,
other_label = "other",
...
)
df |
Data.frame. Data containing the categorical variable to reduce |
var |
Variable. Which variable do you wish to reduce? |
nmin |
Integer. Minimum number of times a value must be repeated to be kept |
pmin |
Numerical. Minimum percentage of times a value must be repeated to be kept |
pcummax |
Numerical. Maximum cumulative percentage covered by the most repeated values that are kept |
top |
Integer. Keep the n most frequently repeated values |
pvalue_max |
Numeric (0-1]. Maximum p-value a category may have to be kept |
cor_var |
Character. If pvalue_max < 1, you must specify the name of the column (numerical or binary) that var will be compared against. |
limit |
Integer. Limit one-hot encoding to the n most frequent values of each column. Set to |
other_label |
Character. Text used to replace the filtered values |
... |
Additional parameters. |
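The frequency-based arguments above (nmin, pmin, top, other_label) can be illustrated with a minimal base R sketch. This is not the package's implementation: reduce_levels() is a hypothetical helper written only for illustration, it works on a plain vector rather than a data.frame column, and it assumes that values failing any threshold are replaced by other_label while missing values are left untouched.

```r
# Hypothetical helper (not part of the package): lump infrequent values of a
# vector into a single label, mimicking the nmin / pmin / top arguments.
reduce_levels <- function(x, nmin = 0, pmin = 0, top = NA, other_label = "other") {
  x <- as.character(x)
  tab <- table(x)                                  # NA values are not counted
  counts <- sort(setNames(as.vector(tab), names(tab)), decreasing = TRUE)
  pct <- 100 * counts / sum(counts)                # share of each value, in percent
  keep <- names(counts)[counts >= nmin & pct >= pmin]
  if (!is.na(top)) keep <- head(keep, top)         # only the `top` most frequent
  # Assumption: NA stays NA; everything else not in `keep` is relabelled
  ifelse(is.na(x) | x %in% keep, x, other_label)
}

x <- c("a", "a", "a", "b", "b", "c", "d")
reduce_levels(x, top = 2)                          # keeps "a" and "b"
reduce_levels(x, nmin = 2, other_label = "rare")   # values seen once become "rare"
reduce_levels(x, pmin = 20)                        # values under 20% become "other"
```

The thresholds are combined with a logical AND in this sketch, so a value must satisfy every active condition to be kept; whether the package combines them the same way is not stated in this page.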
data.frame df on which var has been transformed
Other Data Wrangling:
balance_data(),
cleanText(),
date_cuts(),
date_feats(),
file_name(),
formatHTML(),
holidays(),
impute(),
left(),
normalize(),
num_abbr(),
ohe_commas(),
ohse(),
quants(),
removenacols(),
replaceall(),
replacefactor(),
textFeats(),
textTokenizer(),
vector2text(),
year_month(),
zerovar()
library(lares) # provides categ_reducer(), freqs() and the dft dataset
data(dft) # Titanic dataset
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
categ_reducer(dft, Ticket, pvalue_max = 0.05, cor_var = "Survived") %>% freqs(Ticket)