Recode variables

Description

Recodes the categories / values of a variable x into new category values.

Usage

rec(x, recodes, as.fac = FALSE, var.label = NULL, val.labels = NULL)

rec(x, as.fac = FALSE, var.label = NULL, val.labels = NULL) <- value

Arguments

x

Numeric, character or factor variable that should be recoded; or a data.frame or list of variables.

recodes

String with recode pairs of old and new values. See 'Details' for examples. rec_pattern is a convenient function to create recode strings for grouping variables.

as.fac

Logical, if TRUE, recoded variable is returned as factor. Default is FALSE, thus a numeric variable is returned.

var.label

Optional string, to set variable label attribute for the returned variable (see set_label). If NULL (default), variable label attribute of x will be used (if present). If empty, variable label attributes will be removed.

val.labels

Optional character vector, to set value label attributes of recoded variable (see set_labels). If NULL (default), no value labels will be set.

value

See recodes.

Details

The recodes string has the following syntax:

recode pairs

each recode pair has to be separated by a ;, e.g. recodes = "1=1; 2=4; 3=2; 4=3"

multiple values

multiple old values that should be recoded into a single new value may be separated by commas, e.g. "1,2=1; 3,4=2"

value range

a value range is indicated by a colon, e.g. "1:4=1; 5:8=2" (recodes all values from 1 to 4 into 1, and from 5 to 8 into 2)

"min" and "max"

minimum and maximum values are indicated by min (or lo) and max (or hi), e.g. "min:4=1; 5:max=2" (recodes all values from the minimum value of x to 4 into 1, and from 5 to the maximum value of x into 2)

"else"

all other values, which have not been specified yet, are indicated by else, e.g. "3=1; 1=2; else=3" (recodes 3 into 1, 1 into 2 and all other values into 3)

"copy"

the "else"-token can be combined with copy, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. "3=1; 1=2; else=copy" (recodes 3 into 1, 1 into 2 and all other values like 2, 4 or 5 etc. will not be recoded, but copied, see 'Examples')

NA's

NA values are allowed both as old and new value, e.g. "NA=1; 3:5=NA" (recodes all NA values of the old variable into 1, and all old values from 3 to 5 into NA in the new variable)

"rev"

"rev" is a special token that reverses the value order (see 'Examples')
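
To make the syntax rules above concrete, here is a minimal base-R sketch of how such a recode string could be interpreted for a numeric vector. The function name recode_sketch is hypothetical and is not part of the package; the code is not the implementation behind rec, it does not handle "rev", factors, character vectors or labels, and it assumes that "else" leaves missing values untouched, which the text above does not specify.

```r
# Hypothetical helper, NOT part of the package: interprets a recode
# string for a numeric vector, following the syntax rules listed above.
recode_sketch <- function(x, recodes) {
  out  <- rep(NA_real_, length(x))   # values that never match stay NA
  done <- rep(FALSE, length(x))      # positions that are already recoded
  pairs <- trimws(strsplit(recodes, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]

  # translate one token ("3", "min", "hi", "NA") into a number
  to_num <- function(tok) {
    tok <- trimws(tok)
    if (tok %in% c("min", "lo")) return(min(x, na.rm = TRUE))
    if (tok %in% c("max", "hi")) return(max(x, na.rm = TRUE))
    if (tok == "NA") return(NA_real_)
    as.numeric(tok)
  }

  for (p in pairs) {
    sides <- trimws(strsplit(p, "=", fixed = TRUE)[[1]])
    old <- sides[1]
    new <- sides[2]

    if (old == "else") {
      # assumption of this sketch: "else" only picks up non-missing values
      hit <- !done & !is.na(x)
    } else {
      hit <- rep(FALSE, length(x))
      for (tok in strsplit(old, ",", fixed = TRUE)[[1]]) {
        if (grepl(":", tok, fixed = TRUE)) {
          # value range, e.g. "1:4" or "min:4"
          rng <- vapply(strsplit(tok, ":", fixed = TRUE)[[1]], to_num,
                        numeric(1), USE.NAMES = FALSE)
          hit <- hit | (!is.na(x) & x >= rng[1] & x <= rng[2])
        } else {
          # single value, possibly NA
          val <- to_num(tok)
          hit <- hit | (if (is.na(val)) is.na(x) else !is.na(x) & x == val)
        }
      }
      hit <- hit & !done
    }

    # "copy" keeps the original values, anything else is a new value
    out[hit] <- if (new == "copy") x[hit] else to_num(new)
    done <- done | hit
  }
  out
}

x <- c(1, 2, 3, 4, 5, NA)
recode_sketch(x, "1,2=1; 3:4=2")          # 5 matches no pair and becomes NA
recode_sketch(x, "min:2=1; 3:max=2")
recode_sketch(x, "3=1; 1=2; else=copy")
recode_sketch(x, "NA=9; 3:5=NA; else=copy")
```

Because each pair is matched against the original values of x and already recoded positions are skipped, a pair such as "3=1; 1=2" swaps values rather than chaining them.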

Value

A numeric variable (or a factor, if as.fac = TRUE or if x was a character vector) with recoded category values, or a data frame or list-object with recoded categories for all variables.

Note

Please note the following behaviours of the function:

  • the "else"-token should always be the last argument in the recodes-string.

  • Non-matching values will be set to NA, unless captured by the "else"-token.

  • Tagged NA values (see tagged_na) and their value labels will be preserved when copying NA values to the recoded vector with "else=copy".

  • Variable label attributes (see, for instance, get_label) are preserved (unless changed via the var.label argument); value label attributes, however, are removed (except for "rev", where existing value labels are automatically reversed as well). Use the val.labels argument to add labels for recoded values.

  • If x is a data.frame or list of variables, all variables should have the same categories or value range (otherwise, as noted in the second bullet, NAs are produced).

See Also

set_na for setting NA values, replace_na to replace NAs with a specific value, recode_to for re-shifting value ranges and ref_lvl to change the reference level of (numeric) factors.

Examples

data(efc)
table(efc$e42dep, useNA = "always")

# replace NA with 5
table(rec(efc$e42dep, "1=1;2=2;3=3;4=4;NA=5"), useNA = "always")

# recode 1 to 2 into 1 and 3 to 4 into 2
table(rec(efc$e42dep, "1,2=1; 3,4=2"), useNA = "always")

# or:
# rec(efc$e42dep) <- "1,2=1; 3,4=2"
# table(efc$e42dep, useNA = "always")

# set value labels; the variable label is preserved automatically
library(dplyr)
efc %>%
  select(e42dep) %>%
  rec(recodes = "1,2=1; 3,4=2",
      val.labels = c("low dependency", "high dependency")) %>%
  str()

# recode 1 to 3 into 1 and 4 into 2
table(rec(efc$e42dep, "min:3=1; 4=2"), useNA = "always")

# recode 2 to 1 and all others into 2
table(rec(efc$e42dep, "2=1; else=2"), useNA = "always")

# reverse value order
table(rec(efc$e42dep, "rev"), useNA = "always")

# recode only selected values, copy remaining
table(efc$e15relat)
table(rec(efc$e15relat, "1,2,4=1; else=copy"))

# recode variables with the same categories in a data frame
head(efc[, 6:9])
head(rec(efc[, 6:9], "1=10;2=20;3=30;4=40"))

# recode a list of variables: create a dummy list of
# variables with the same value range
dummy <- list(efc$c82cop1, efc$c83cop2, efc$c84cop3)
# show original distribution
lapply(dummy, table, useNA = "always")
# show recodes
lapply(rec(dummy, "1,2=1; NA=9; else=copy"), table, useNA = "always")

# recode character vector
dummy <- c("M", "F", "F", "X")
rec(dummy, "M=Male; F=Female; X=Refused")

# recode non-numeric factors
data(iris)
rec(iris$Species, "setosa=huhu; else=copy")

# preserve tagged NAs
library(haven)
x <- labelled(c(1:3, tagged_na("a", "c", "z"), 4:1),
              c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
                "Refused" = tagged_na("a"), "Not home" = tagged_na("z")))
# get current value labels
x
# recode 2 into 5; tagged NA values are preserved
rec(x, "2=5;else=copy")
na_tag(rec(x, "2=5;else=copy"))
