In reconhub/linelist: Tools to Import and Tidy Case Linelist Data

clean_variables

R Documentation

Clean variable labels and fix spelling according to a wordlist

Description

Clean variable labels and fix spelling according to a wordlist

Usage

clean_variables(
  x,
  sep = "_",
  wordlists = NULL,
  spelling_vars = 3,
  sort_by = NULL,
  protect = FALSE,
  classes = NULL,
  warn_spelling = FALSE
)

Arguments

`x`	a `data.frame`
`sep`	The separator used between words. Defaults to the underscore `_`.
`wordlists`	a data frame or named list of data frames with at least two columns defining the word list to be used. If this is a data frame, a third column must be present to split the wordlists by column in `x` (see `spelling_vars`).
`spelling_vars`	character or integer. If `wordlists` is a data frame, this names or indexes the column of `wordlists` that identifies which column in `x` each section of the `wordlists` data frame applies to. This defaults to `3`, indicating the third column is to be used.
`sort_by`	a character string naming the column to be used for sorting the values in each data frame. If the incoming variables are factors, this determines how the resulting factors will be sorted.
`protect`	a logical or numeric vector defining the columns to protect from any manipulation. Note: columns in `protect` will override any columns in either `force_Date` or `guess_dates`.
`classes`	a vector of class definitions for each of the columns. If this is not provided, the classes will be read from the columns themselves. Practically, this is used in `clean_data()` to mark columns as protected.
`warn_spelling`	if `TRUE`, errors and warnings from `clean_spelling()` will be aggregated and presented for each column that issues them. The default value is `FALSE`, which means that all errors and warnings will be ignored.
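
The two accepted shapes of `wordlists` (a single data frame with a third grouping column, or a named list of two-column data frames) carry the same information. The following is a minimal sketch in base R of converting one into the other; it assumes, as the description of `wordlists` suggests but does not state outright, that the names of the list are matched against the column names of `x`. The object name `wordlist_list` is hypothetical.

```r
# Data frame form: the third column says which column of `x` each row applies to
wordlist <- data.frame(
  from     = c("hopsital", "hopital",  "medical", "feild"),
  to       = c("hospital", "hospital", "clinic",  "field"),
  variable = rep("location", 4),
  stringsAsFactors = FALSE
)

# Named list form: split() groups the rows by the third column, and each
# element keeps only the first two columns (value to replace, replacement)
wordlist_list <- split(wordlist[, c("from", "to")], wordlist$variable)

names(wordlist_list)    # one list element per variable in the third column
wordlist_list$location  # the two-column wordlist for the `location` column

# Assumption: with list names matched to columns of `x`, this call would mirror
# clean_variables(x, wordlists = wordlist, spelling_vars = "variable")
# clean_variables(x, wordlists = wordlist_list)
```

The data frame form is convenient when the wordlist is kept in a single spreadsheet; the list form may be easier to build when each variable has its own file.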

Author(s)

Zhian N. Kamvar

Examples


## make toy data (messy_data() is provided by the linelist package)
library(linelist)
toy_data <- messy_data(20)

# location data with mis-spellings, French, and English.
messy_locations <- c("hopsital", "h\u00f4pital", "hospital", 
                     "m\u00e9dical", "clinic", 
                     "feild", "field")
toy_data$location <- sample(messy_locations, 20, replace = TRUE)

## show data
toy_data

# clean labels
clean_variables(toy_data) # by default, it's the same as clean_variable_labels()

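# add a wordlist (the `from` values are written without accents, on the
# assumption that labels are cleaned before the wordlist is applied)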
wordlist <- data.frame(
  from  = c("hopsital", "hopital",  "medical", "feild"),
  to    = c("hospital", "hospital", "clinic",  "field"),
  variable = rep("location", 4),
  stringsAsFactors = FALSE
)

clean_variables(toy_data, 
                wordlists     = wordlist,
                spelling_vars = "variable"
               )

reconhub/linelist documentation built on Jan. 1, 2023, 9:39 p.m.