match_vec: Rename values in a vector based on a dictionary

View source: R/match_vec.R

Description

This function provides an interface for forcats::fct_recode(), forcats::fct_explicit_na(), and forcats::fct_relevel() in such a way that a data dictionary can be imported from a data frame.

Usage

match_vec(
  x = character(),
  dictionary = data.frame(),
  from = 1,
  to = 2,
  quiet = FALSE,
  warn_default = TRUE,
  anchor_regex = TRUE
)

Arguments

x

a character or factor vector

dictionary

a matrix or data frame defining mis-spelled words or keys in one column (from) and replacement values (to) in another column. There are keywords that can be appended to the from column for addressing default values and missing data.

from

a column name or position defining words or keys to be replaced

to

a column name or position defining replacement values

quiet

a logical indicating if warnings should be issued if no replacement is made; if TRUE, these warnings will be disabled

warn_default

a logical. When a .default keyword is set and warn_default = TRUE, a warning will be issued listing the variables that were changed to the default value. This can be used to update your dictionary.

anchor_regex

a logical. When TRUE (default), any regex within the .regex keyword is anchored
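
To make the idea of anchoring concrete, here is a small base R sketch. It assumes that anchoring means wrapping the pattern in ^ and $ so that the whole value must match; this page does not confirm that detail, and the code is an illustration rather than the package's internals.

```r
# Hypothetical illustration in base R; not the package's internal code.
pattern <- "[nN][oO]?"
values  <- c("No", "N", "Norway", "unknown")

# Unanchored: the pattern may match anywhere inside the value
unanchored <- grepl(pattern, values, perl = TRUE)

# Anchored (assumption: wrap in ^ and $): the entire value must match
anchored <- grepl(paste0("^", pattern, "$"), values, perl = TRUE)

data.frame(values, unanchored, anchored)
```

Without anchors, every value above contains a match; with anchors, only "No" and "N" do.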

Details

Keys (from column)

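The from column of the dictionary contains the keys that you want to match in your current data set. Keys are expected to match exactly, with the exception of three reserved keywords that start with a full stop:

.regex [pattern]: replaces any value matching [pattern], a regular expression (see Note)

.missing: replaces any missing values (see Note)

.default: replaces values that are not otherwise defined in the dictionary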

Values (to column)

The values will replace their respective keys exactly as they are presented.

There is currently one recognised keyword that can be placed in the to column of your dictionary:

.na: replaces the matched keys with missing data (NA)

Value

a vector of the same type as x with mis-spelled labels cleaned. Note that factors will be arranged by the order presented in the data dictionary; other levels will appear afterwards.
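
The level ordering described here can be pictured with a base R sketch. The dictionary and data below are made up for illustration, and the code only mimics the described ordering (dictionary values first, remaining levels afterwards); it is not how match_vec() is implemented.

```r
# Hypothetical base R illustration of the level ordering; not match_vec() itself.
dictionary <- data.frame(
  from = c("foubar", "fubar", "unk"),
  to   = c("foobar", "foobar", "unknown"),
  stringsAsFactors = FALSE
)
x <- factor(c("a", "fubar", "unk", "b", "foubar"))

# Recode labels through a lookup, leaving unmatched labels unchanged
idx     <- match(as.character(x), dictionary$from)
recoded <- ifelse(is.na(idx), as.character(x), dictionary$to[idx])

# Dictionary values first (in dictionary order), then the remaining levels
dict_levels  <- unique(dictionary$to)
other_levels <- setdiff(unique(recoded), dict_levels)
result <- factor(recoded, levels = c(dict_levels, other_levels))
levels(result)
```

In this sketch the levels come out as foobar, unknown, a, b.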

Note

If there are any missing values in the from column (keys), then they are automatically converted to the character "NA" with a warning. If you want to target missing data with your dictionary, use the .missing keyword. The .regex keyword uses gsub() with the perl = TRUE option for replacement.
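
As a rough picture of what a perl-flavoured replacement with gsub() looks like, the base R sketch below applies a pattern similar to the one in the Examples to a few keys. The anchors are written out explicitly here, and the vector is invented for illustration; this is not the package's internal code.

```r
# Base R sketch of a perl-flavoured replacement; not the package's internal code.
keys <- c("foubar", "foobr", "fubar", "unknown")

# perl = TRUE enables PCRE syntax such as the lazy quantifier `+?`
cleaned <- gsub("^f[ou][^m].+?r$", "foobar", keys, perl = TRUE)
cleaned
```

The three mis-spellings become "foobar", while "unknown" does not match and is left as it is.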

Author(s)

Zhian N. Kamvar

See Also

match_df() for an implementation that acts across multiple variables in a data frame.

Examples

corrections <- data.frame(
  bad = c("foubar", "foobr", "fubar", "unknown", ".missing"),
  good = c("foobar", "foobar", "foobar", ".na", "missing"),
  stringsAsFactors = FALSE
)
corrections

# create some fake data
my_data <- c(letters[1:5], sample(corrections$bad[-5], 10, replace = TRUE))
my_data[sample(6:15, 2)] <- NA  # with missing elements

match_vec(my_data, corrections)

# You can use regular expressions to simplify your list
corrections <- data.frame(
  bad =  c(".regex f[ou][^m].+?r$", "unknown", ".missing"),
  good = c("foobar",                ".na",     "missing"),
  stringsAsFactors = FALSE
)

# You can also set a default value
corrections_with_default <- rbind(corrections, c(bad = ".default", good = "unknown"))
corrections_with_default

# a warning will be issued about the data that were converted
match_vec(my_data, corrections_with_default)

# use warn_default = FALSE if you are absolutely sure you don't want the warning
match_vec(my_data, corrections_with_default, warn_default = FALSE)

# The function will give you a warning if the dictionary does not
# match the data
match_vec(letters, corrections)

# This can be used for translating survey output

words <- data.frame(
  option_code = c(".regex ^[yY][eE]?[sS]?",
    ".regex ^[nN][oO]?",
    ".regex ^[uU][nN]?[kK]?",
    ".missing"),
  option_name = c("Yes", "No", ".na", "Missing"),
  stringsAsFactors = FALSE
)
match_vec(c("Y", "Y", NA, "No", "U", "UNK", "N"), words)

matchmaker documentation built on Feb. 22, 2020, 1:11 a.m.