mutate_other: Group infrequent entries into 'Other category'

View source: R/mutate_other.R

mutate_other    R Documentation

Group infrequent entries into 'Other category'

Description

Useful when you want to constrain the number of unique values in a column by keeping only the most common values.

Usage

mutate_other(
  .data,
  var,
  n = 5,
  count,
  by = NULL,
  var.weight = NULL,
  mass = NULL,
  copy = TRUE,
  other.category = "Other"
)

Arguments

.data

Data containing variable.

var

Variable containing infrequent entries, to be collapsed into "Other".

n

Maximum number of categories to keep above "Other": the n most frequent values of var are retained and the remaining values are collapsed.

count

Threshold on the number of observations per value of var: values whose count falls below the threshold are collapsed into "Other".

by

Extra variables to group by when calculating n or count.

var.weight

Variable to act as a weight: values of var for which the sum of this variable exceeds mass are kept; all others are set to other.category.

mass

Threshold for the sum of var.weight: any value of var whose aggregated var.weight exceeds mass is kept, and all other values are set to other.category. By default (mass = NULL), mass is taken to be -∞ and a warning is issued, because the function then has no effect. Set mass = -Inf explicitly if you really want this behaviour without the warning.

copy

Should .data be copied? Currently only TRUE is supported.

other.category

Value that infrequent entries are to be collapsed into. Defaults to "Other".
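
The n and count thresholds can be pictured with a small base R sketch. It only illustrates the idea and is not the hutils implementation: the vector x, the helpers keep_top_n and keep_by_count, and the choice to treat count as an inclusive lower bound are all assumptions made for this example, and the package may handle ties differently.

```r
# Illustrative sketch only; not the hutils source.
x <- c("a", "a", "a", "b", "b", "c", "d")

# Hypothetical helper: keep the n most frequent values, collapse the rest.
keep_top_n <- function(x, n, other.category = "Other") {
  freq <- sort(table(x), decreasing = TRUE)
  keep <- names(freq)[seq_len(min(n, length(freq)))]
  ifelse(x %in% keep, x, other.category)
}

# Hypothetical helper: keep values observed at least `count` times
# (the inclusive bound is an assumption).
keep_by_count <- function(x, count, other.category = "Other") {
  freq <- table(x)
  keep <- names(freq)[as.vector(freq >= count)]
  ifelse(x %in% keep, x, other.category)
}

top2 <- keep_top_n(x, n = 2)        # "a" and "b" survive
min3 <- keep_by_count(x, count = 3) # only "a" survives
```

In this sketch, n fixes how many categories survive regardless of their size, whereas count fixes how large a category must be and lets the number of survivors vary.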

Value

.data, with var modified so that all infrequent values are replaced by a single value (other.category).

Examples

library(data.table)
library(magrittr)

DT <- data.table(City = c("A", "A", "B", "B", "C", "D"),
                 value = c(1, 9, 4, 4, 5, 11))

DT %>%
  mutate_other("City", var.weight = "value", mass = 10) %>%
  .[]
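
To see which cities the call above would be expected to keep, the per-city totals of value can be worked out by hand. The base R sketch below assumes that "exceeds" means strictly greater than mass. The names total and collapsed are introduced here for illustration only, and the result should be checked against the package's actual output.

```r
# Hand calculation of the weighting rule; a sketch, not the hutils source.
city  <- c("A", "A", "B", "B", "C", "D")
value <- c(1, 9, 4, 4, 5, 11)
mass  <- 10

# Sum of the weight within each city, repeated on every row of that city:
# A = 1 + 9 = 10, B = 4 + 4 = 8, C = 5, D = 11
total <- ave(value, city, FUN = sum)

# Assumption: a city is kept only if its total is strictly greater than mass.
collapsed <- ifelse(total > mass, city, "Other")
```

Under this strict reading only "D" (total 11) is kept. "A" sits exactly on the threshold with a total of 10, so it is the case worth verifying when choosing mass.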
  

hutils documentation built on April 13, 2022, 5:23 p.m.