View source: R/clean_categorical.R
clean_categorical | R Documentation |
Applies a dictionary of value-replacement pairs to clean and standardize the values of categorical variables. Optional text standardization accounts for minor differences in character case, spacing, and punctuation.
clean_categorical(
  x,
  dict_allowed,
  dict_clean = NULL,
  vars_id = NULL,
  col_allowed_var = "variable",
  col_allowed_value = "value",
  non_allowed_to_missing = TRUE,
  fn = std_text,
  na = ".na"
)
x |
A data frame with one or more columns to clean |
dict_allowed |
Dictionary of allowed values for each variable of interest. Must include columns for "variable" and "value" (the names of which can be modified with args col_allowed_var and col_allowed_value). |
dict_clean |
Optional dictionary of value-replacement pairs (e.g.
produced by |
vars_id |
Optional vector of one or more ID columns within x. If not specified, the cleaning dictionary contains one entry for each unique combination of variable and non-valid value. If specified, the cleaning dictionary contains one entry for each unique combination of variable, non-valid value, and ID variable. |
col_allowed_var |
Name of the column in dict_allowed that holds the variable names. Defaults to "variable". |
col_allowed_value |
Name of the column in dict_allowed that holds the allowed values. Defaults to "value". |
non_allowed_to_missing |
Logical indicating whether values that remain non-allowed, even after cleaning and standardization, should be replaced with NA. Defaults to TRUE. If no cleaning dictionary (dict_clean) is provided, columns are simply standardized to match the allowed values specified in dict_allowed. |
fn |
Function to standardize raw values in both the dataset and dictionary prior to comparing, to account for minor variation in character case, spacing, punctuation, etc. Defaults to std_text. |
na |
Keyword to use within column "replacement" for values that should be converted to NA. Defaults to ".na". |
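The arguments above can be illustrated with a minimal base-R sketch of dictionary-based cleaning. It is not the package's implementation. The toy data, the helper names std_simple and clean_one, and the "variable"/"value" column names of the cleaning dictionary are all assumptions made for illustration. Only the "replacement" column and the ".na" keyword are taken from the argument descriptions.

```r
# Hypothetical toy data; none of these objects ship with the package
x <- data.frame(
  id = 1:5,
  sex = c("Male", "male ", "F", "femal", "unknown"),
  stringsAsFactors = FALSE
)

# Allowed values: one row per variable/value pair
dict_allowed <- data.frame(
  variable = "sex",
  value = c("Male", "Female"),
  stringsAsFactors = FALSE
)

# Value-replacement pairs; ".na" marks values that should become NA
dict_clean <- data.frame(
  variable = "sex",
  value = c("F", "femal", "unknown"),
  replacement = c("Female", "Female", ".na"),
  stringsAsFactors = FALSE
)

# Stand-in standardizer (NOT the package's std_text): lowercase,
# strip punctuation, trim surrounding whitespace
std_simple <- function(v) trimws(gsub("[[:punct:]]+", "", tolower(v)))

# Clean one column: apply replacements, convert the NA keyword,
# then map each value onto the canonical allowed spelling
clean_one <- function(values, allowed, clean, fn = std_simple, na = ".na") {
  out <- as.character(values)
  i <- match(fn(out), fn(clean$value))
  hit <- !is.na(i)
  out[hit] <- clean$replacement[i[hit]]
  out[!is.na(out) & out == na] <- NA_character_
  # values matching no allowed value index as NA, i.e. become missing
  allowed[match(fn(out), fn(allowed))]
}

x$sex <- clean_one(
  x$sex,
  allowed = dict_allowed$value[dict_allowed$variable == "sex"],
  clean = dict_clean[dict_clean$variable == "sex", ]
)
x$sex
```

In this sketch, values that match no allowed value always become NA, which corresponds to the behaviour described for non_allowed_to_missing = TRUE.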
The original data frame x, but with cleaned versions of the categorical variables specified in argument dict_allowed.
# load example dataset, dictionary of allowed categorical values, and
# cleaning dictionary
data(ll1)
data(dict_categ1)
data(clean_categ1)
# dictionary-based corrections to categorical vars
clean_categorical(
  ll1,
  dict_allowed = dict_categ1,
  dict_clean = clean_categ1
)
# require exact matching, including character case
clean_categorical(
  ll1,
  dict_allowed = dict_categ1,
  dict_clean = clean_categ1,
  fn = identity
)
# apply standardization to dict_allowed but no additional dict-based cleaning
clean_categorical(
  ll1,
  dict_allowed = dict_categ1
)
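The effect of the fn argument can be previewed without the package by comparing matching rules on a plain character vector. The sketch below uses a hypothetical standardizer, std_case_only, and invented example values. It assumes that fn may be any function mapping a character vector to a character vector of the same length, which the fn = identity example suggests but the documentation above does not state in full.

```r
# Hypothetical standardizer: ignore character case only, keep
# whitespace and punctuation as they are
std_case_only <- function(v) tolower(as.character(v))

# Invented raw and allowed values
raw <- c("Confirmed", "CONFIRMED", "confirmed ", "Con-firmed")
allowed <- c("Confirmed", "Suspected")

# Which raw values match an allowed value under each rule?
match_case_only <- std_case_only(raw) %in% std_case_only(allowed)
match_exact <- identity(raw) %in% identity(allowed)

data.frame(raw, match_case_only, match_exact)
```

Under these assumptions, passing fn = std_case_only would sit between the default standardization and exact matching with fn = identity: differences in case are tolerated, but stray spaces and punctuation are not.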