generalizeCategorical: Generalize a Categorical Column

View source: R/generalizeCategorical.R

Description

Generalizes a categorical column of a data.frame by merging its categories into a smaller set of new categories.

Usage

generalizeCategorical(x, col, newCategories, mapping)

Arguments

x

a data.frame

col

the column to generalize, given either as a numeric column index or as a character column name

newCategories

a vector of the new categories that replace the old ones

mapping

a numeric vector mapping old categories to new categories: element i gives the index in newCategories for the i-th value of unique(x[[col]]). See Details for more information.

Details

This function generalizes the categories of a column to a smaller set of categories by merging categories together. The initial column is assumed to be categorical. The function iterates through each of the original categories and replaces every instance of it in the column with the new category it is mapped to.

To find the original categories, call unique(x[[col]]). The order of that result is the order used for the mapping.

The mapping is a numeric vector. Each element of mapping carries two pieces of information:

1) The position of the element in mapping, which is the index of the old category in unique(x[[col]]).
2) The value of the element, which is the index of the new category in newCategories.

Together, these determine how old categories are mapped to new categories. Note that the length of mapping must equal the number of unique categories.
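The position/value rule described above can be traced by hand in base R. The following is only a minimal sketch of the mapping semantics on a toy vector; it does not call generalizeCategorical and is not taken from the package source, and the variable names status, old, and generalized are made up for illustration.

```r
# Illustrative base-R sketch of the mapping semantics (not the package's code).
status <- c("Married", "Single", "Single", "Divorced", "Married", "Divorced")

# Old categories in order of first appearance: "Married" "Single" "Divorced"
old <- unique(status)

newCategories <- c("Has Been Married", "Has Not Been Married")

# One entry per old category: position = index into `old`,
# value = index into `newCategories`.
mapping <- c(1, 2, 1)
stopifnot(length(mapping) == length(old))

# For each row, look up its old-category index, then its new-category index.
generalized <- newCategories[mapping[match(status, old)]]
generalized
```

Here "Married" and "Divorced" both carry the value 1 and so collapse into "Has Been Married", while "Single" carries the value 2 and becomes "Has Not Been Married".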

Value

A data.frame with the specified column generalized

See Also

Other generalize.functions: generalizeNumeric; generalize

Examples

maritalStatus <- c("Married", "Single", "Single", "Divorced", "Married", "Divorced")
race <- c("Caucasian", "Hispanic", "Black", "Asian", "Other", "Caucasian")
education <- c("High School or Less", "High School Grad", "Undergraduate", "Master's", "PhD", "Undergraduate")
data <- data.frame(maritalStatus, race, education)

# unique(maritalStatus) has 3 categories (Married, Single, Divorced), so mapping needs 3 entries
generalizeCategorical(data, 1, newCategories=c("Has Been Married","Has Not Been Married"), mapping=c(1,2,1))
# all 5 unique races collapse into a single category
generalizeCategorical(data, 2, newCategories=c("Human"), mapping=c(1,1,1,1,1))
# 5 unique education levels collapse into 3
generalizeCategorical(data, 3, newCategories=c("High School", "College", "Advanced Degree"), mapping=c(1,1,2,3,3))

shuklak13/anonymizeR documentation built on May 29, 2019, 9:27 p.m.