categ_reducer: Reduce categorical values

View source: R/other_functions.R

Description

This function lets the user reduce the number of distinct categorical values in a variable, replacing the filtered values with a single label. It is tidyverse-friendly, so it can be used in pipelines.

Usage

categ_reducer(
  df,
  var,
  nmin = 0,
  pmin = 0,
  pcummax = 100,
  top = NA,
  pvalue_max = 1,
  cor_var = "tag",
  limit = 20,
  other_label = "other",
  ...
)

Arguments

df

Data frame containing the categorical variable to reduce

var

Variable. Which variable do you wish to reduce?

nmin

Integer. Minimum number of times a value must be repeated

pmin

Numerical. Minimum percentage of times a value must be repeated

pcummax

Numerical. Top cumulative percentage of most repeated values

top

Integer. Keep the n most frequently repeated values

pvalue_max

Numeric in (0, 1]. Maximum p-value allowed for the categories that are kept

cor_var

Character. If pvalue_max < 1, you must define the name of the column (numerical or binary) that var will be compared with.

limit

Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.

other_label

Character. Text that replaces the filtered values

...

Additional parameters
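
To make the frequency arguments more concrete, here is a minimal base R sketch of what reducing by nmin and top could look like on a plain vector. It is not the lares implementation: reduce_levels is a hypothetical helper written only for this illustration, and it assumes that a value is kept when its count is at least nmin and that top is applied after that filter.

```r
# Hypothetical helper, not part of lares: keeps the values that pass the
# frequency thresholds and relabels every other value.
reduce_levels <- function(x, nmin = 0, top = NA, other_label = "other") {
  counts <- sort(table(x), decreasing = TRUE)        # frequency of each value
  keep <- names(counts)[as.vector(counts) >= nmin]   # assumed rule: count >= nmin
  if (!is.na(top)) keep <- head(keep, top)           # then keep the n most frequent
  ifelse(x %in% keep, as.character(x), other_label)  # relabel the rest
}

x <- c("S", "S", "S", "C", "C", "Q")
reduce_levels(x, top = 2)   # "S" "S" "S" "C" "C" "other"
reduce_levels(x, nmin = 3)  # "S" "S" "S" "other" "other" "other"
```

The real function works on a data frame column and also supports pmin, pcummax and pvalue_max, which this sketch leaves out.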

Value

data.frame df on which var has been transformed

See Also

Other Data Wrangling: balance_data(), cleanText(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), removenacols(), removenarows(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()

Examples

library(lares)
data(dft) # Titanic dataset
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
categ_reducer(dft, Ticket, pvalue_max = 0.05, cor_var = "Survived") %>% freqs(Ticket)

lares documentation built on June 9, 2021, 9:06 a.m.