View source: R/clean_variables.R
| clean_variables | R Documentation |
Clean variable labels and fix spelling according to a wordlist
clean_variables(
  x,
  sep = "_",
  wordlists = NULL,
  spelling_vars = 3,
  sort_by = NULL,
  protect = FALSE,
  classes = NULL,
  warn_spelling = FALSE
)
x |
a |
sep |
the separator used between words; defaults to the underscore ("_")
|
wordlists |
a data frame or named list of data frames with at least two
columns defining the word list to be used. If this is a data frame, a third
column must be present to split the wordlists by column in |
spelling_vars |
character or integer. If |
sort_by |
a character naming the column to be used for sorting the values in each data frame. If the incoming variables are factors, this determines how the resulting factors will be sorted. |
protect |
a logical or numeric vector defining the columns to protect
from any manipulation. Note: columns in |
classes |
a vector of class definitions for each of the columns. If this
is not provided, the classes will be read from the columns themselves.
Practically, this is used in |
warn_spelling |
if |
Zhian N. Kamvar
clean_variable_labels() to standardise text,
clean_variable_spelling() to correct spelling with a wordlist.
## make toy data
toy_data <- messy_data(20)
# location data with mis-spellings, French, and English.
messy_locations <- c("hopsital", "h\u00f4pital", "hospital",
                     "m\u00e9dical", "clinic",
                     "feild", "field")
toy_data$location <- sample(messy_locations, 20, replace = TRUE)
## show data
toy_data
# clean labels
clean_variables(toy_data) # by default, this is the same as clean_variable_labels()
# add a wordlist
wordlist <- data.frame(
  from = c("hopsital", "hopital", "medical", "feild"),
  to = c("hospital", "hospital", "clinic", "field"),
  variable = rep("location", 4),
  stringsAsFactors = FALSE
)
clean_variables(toy_data,
  wordlists = wordlist,
  spelling_vars = "variable"
)
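To illustrate what a wordlist with a third grouping column expresses, the following base R sketch splits the wordlist by its variable column and applies the from/to pairs to a single vector. It is only an approximation of the idea and not the package's implementation. The names `location`, `by_variable`, `rules`, `idx`, and `corrected` are hypothetical, and the input values are assumed to have already been standardised (lower case, accents removed).

```r
# Hypothetical base R sketch; not how clean_variables() is implemented.
wordlist <- data.frame(
  from = c("hopsital", "hopital", "medical", "feild"),
  to = c("hospital", "hospital", "clinic", "field"),
  variable = rep("location", 4),
  stringsAsFactors = FALSE
)

# Assumed input: values as they might look after label cleaning.
location <- c("hopsital", "hopital", "hospital", "medical",
              "clinic", "feild", "field")

# Split the single data frame into a named list, one element per variable.
by_variable <- split(wordlist[c("from", "to")], wordlist$variable)

# Look up each value; keep the original where the wordlist has no entry.
rules <- by_variable[["location"]]
idx <- match(location, rules$from)
corrected <- ifelse(is.na(idx), location, rules$to[idx])
corrected
```

Values without an entry in the from column ("hospital", "clinic", "field") pass through unchanged in this sketch; the package itself may handle unmatched values differently.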