Source: R/clean_variables.R
clean_variables | R Documentation
Clean variable labels and fix spelling according to a wordlist
clean_variables(
  x,
  sep = "_",
  wordlists = NULL,
  spelling_vars = 3,
  sort_by = NULL,
  protect = FALSE,
  classes = NULL,
  warn_spelling = FALSE
)
x | a data frame
sep | the separator used between words; defaults to the underscore ("_")
wordlists | a data frame or named list of data frames with at least two columns defining the wordlist to be used. If this is a data frame, a third column must be present to split the wordlists by column in …
spelling_vars | character or integer (default 3). If …
sort_by | a character string naming the column used to sort the values in each data frame. If the incoming variables are factors, this determines the order of the resulting factor levels.
protect | a logical or numeric vector defining the columns to protect from any manipulation. Note: columns in …
classes | a vector of class definitions for each of the columns. If this is not provided, the classes will be read from the columns themselves. Practically, this is used in …
warn_spelling | if …
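To illustrate how wordlists and spelling_vars relate, here is a minimal base-R sketch (not linelist's internal code) that splits one three-column wordlist into a named list of two-column wordlists, one per data column, and applies each by exact matching. The "sex" rows, the data frame dat, and the helper apply_wordlist() are hypothetical, and the assumption that list names are matched against column names of x is inferred from the example on this page rather than taken from the package source.

```r
# Base-R sketch only: this is NOT linelist's implementation.
# The "sex" rows are hypothetical, added so the split has two groups.
wordlist <- data.frame(
  from     = c("hopsital", "feild", "m", "f"),
  to       = c("hospital", "field", "male", "female"),
  variable = c("location", "location", "sex", "sex"),
  stringsAsFactors = FALSE
)

# spelling_vars can point at the splitting column by position (3) or by
# name ("variable"); both select the same column here.
by_position <- split(wordlist[c("from", "to")], wordlist[[3]])
by_name     <- split(wordlist[c("from", "to")], wordlist[["variable"]])

# Result: a named list of two-column wordlists, one per data column.
names(by_name)

# Hypothetical helper: replace exact matches of `from` with `to`,
# leaving values that are not in the wordlist unchanged.
apply_wordlist <- function(values, wl) {
  hit <- match(values, wl$from)
  values[!is.na(hit)] <- wl$to[hit[!is.na(hit)]]
  values
}

dat <- data.frame(
  location = c("feild", "clinic", "hopsital"),
  sex      = c("m", "f", "f"),
  stringsAsFactors = FALSE
)

# Apply each wordlist to the column whose name matches the list element.
for (v in names(by_name)) {
  dat[[v]] <- apply_wordlist(dat[[v]], by_name[[v]])
}
dat
```

After the loop, dat$location is c("field", "clinic", "hospital") and dat$sex is c("male", "female", "female"); "clinic" is left alone because it does not appear in the wordlist.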
Author: Zhian N. Kamvar
See also: clean_variable_labels() to standardise text; clean_variable_spelling() to correct spelling with a wordlist.
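As a rough illustration of what the sep argument controls during label standardisation, the sketch below lowercases labels, collapses each run of non-alphanumeric characters into a single separator, and trims the ends. The helper standardise_labels() is hypothetical and is not how clean_variable_labels() is implemented; in particular it does not transliterate accented characters, so values such as "hôpital" from the example would not be converted to "hopital".

```r
# Hypothetical helper: a rough base-R approximation of label
# standardisation. It does NOT transliterate accented characters.
standardise_labels <- function(x, sep = "_") {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", " ", x)   # each run of non-alphanumerics -> one space
  x <- trimws(x)                     # drop leading/trailing spaces
  gsub(" ", sep, x, fixed = TRUE)    # join the remaining words with sep
}

standardise_labels(c("  Hospital  ", "Medical Center", "FIELD-site"))
```

This returns "hospital", "medical_center" and "field_site"; with sep = "." the label "Medical Center" becomes "medical.center". Replacing the spaces with fixed = TRUE avoids treating sep as a regular expression, so separators such as "." behave literally.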
library("linelist")

## make toy data
toy_data <- messy_data(20)

# location data with misspellings, French, and English
messy_locations <- c("hopsital", "h\u00f4pital", "hospital",
                     "m\u00e9dical", "clinic",
                     "feild", "field")
toy_data$location <- sample(messy_locations, 20, replace = TRUE)

## show data
toy_data

# clean labels; by default, this is the same as clean_variable_labels()
clean_variables(toy_data)

# add a wordlist; it uses the accent-free spellings "hopital" and
# "medical", which is how the accented values appear after label cleaning
wordlist <- data.frame(
  from     = c("hopsital", "hopital", "medical", "feild"),
  to       = c("hospital", "hospital", "clinic", "field"),
  variable = rep("location", 4),
  stringsAsFactors = FALSE
)
clean_variables(toy_data,
                wordlists = wordlist,
                spelling_vars = "variable")