rec: Recode variables

Description

Recodes the categories / values of a variable x into new category values.

Usage

rec(x, recodes, as.fac = FALSE, var.label = NULL, val.labels = NULL,
  suffix = "_r")

rec(x, as.fac = FALSE, var.label = NULL, val.labels = NULL,
  suffix = "_r") <- value

Arguments

x

A variable, data frame or list-object.

recodes

String with recode pairs of old and new values. See 'Details' for examples. rec_pattern is a convenient function to create recode strings for grouping variables.

as.fac

Logical, if TRUE, recoded variable is returned as factor. Default is FALSE, thus a numeric variable is returned.

var.label

Optional string, to set variable label attribute for the returned variable (see set_label). If NULL (default), variable label attribute of x will be used (if present). If empty, variable label attributes will be removed.

val.labels

Optional character vector, to set value label attributes of recoded variable (see set_labels). If NULL (default), no value labels will be set. Value labels can also be directly defined in the recodes-syntax, see 'Details'.

suffix

String value, will be appended to variable (column) names of x, if x is a data frame. If x is not a data frame, this argument will be ignored. The default value to suffix column names in a data frame depends on the function call:

  • recoded variables (rec()) will be suffixed with "_r"

  • dichotomized variables (dicho()) will be suffixed with "_d"

  • grouped variables (split_var()) will be suffixed with "_g"

value

See recodes.
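
The effect of the suffix argument on a data frame can be pictured with a short base-R sketch. It does not call rec(); the data frame df and the helper recode_pair are hypothetical stand-ins, and the sketch only assumes the naming behaviour described above, where each column name gets the suffix appended.

```r
# Hypothetical example data; not part of the package.
df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 3, 2, 1))
suffix <- "_r"

# Hypothetical helper that mimics the recode string "1,2=1; 3,4=2".
recode_pair <- function(v) {
  out <- rep(NA_real_, length(v))
  out[v %in% c(1, 2)] <- 1
  out[v %in% c(3, 4)] <- 2
  out
}

# Recode every column, then append the suffix to each column name.
recoded <- as.data.frame(lapply(df, recode_pair))
names(recoded) <- paste0(names(df), suffix)

print(names(recoded))
```

For a plain vector there are no column names to change, which matches the statement that suffix is ignored when x is not a data frame.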

Details

The recodes string has the following syntax:

recode pairs

each recode pair has to be separated by a ;, e.g. recodes = "1=1; 2=4; 3=2; 4=3"

multiple values

multiple old values that should be recoded into a single new value may be separated by commas, e.g. "1,2=1; 3,4=2"

value range

a value range is indicated by a colon, e.g. "1:4=1; 5:8=2" (recodes all values from 1 to 4 into 1, and from 5 to 8 into 2)

"min" and "max"

minimum and maximum values are indicated by min (or lo) and max (or hi), e.g. "min:4=1; 5:max=2" (recodes all values from the minimum value of x to 4 into 1, and from 5 to the maximum value of x into 2)

"else"

all other values, which have not been specified yet, are indicated by else, e.g. "3=1; 1=2; else=3" (recodes 3 into 1, 1 into 2 and all other values into 3)

"copy"

the "else"-token can be combined with copy, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. "3=1; 1=2; else=copy" (recodes 3 into 1, 1 into 2 and all other values like 2, 4 or 5 etc. will not be recoded, but copied, see 'Examples')

NA's

NA values are allowed both as old and new value, e.g. "NA=1; 3:5=NA" (recodes all NA into 1, and all values from 3 to 5 into NA in the new variable)

"rev"

"rev" is a special token that reverses the value order (see 'Examples')

direct value labelling

value labels for new values can be assigned inside the recode pattern by writing the value label in square brackets after defining the new value in a recode pair, e.g. "15:30=1 [young aged]; 31:55=2 [middle aged]; 56:max=3 [old aged]". See 'Examples'.
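
As a rough illustration of the tokens above, the following base-R sketch spells out the mappings that a few recode strings describe. It does not call rec() and is not how the function is implemented. The vector x is made-up data, and the "rev" line assumes consecutive integer codes.

```r
# Made-up input with one missing value.
x <- c(1, 2, 3, 4, 5, NA)

# "1,2=1; 3,4=2": several old values map to one new value.
# Values not covered by any pair (here 5) stay NA in this sketch.
multi <- rep(NA_real_, length(x))
multi[x %in% c(1, 2)] <- 1
multi[x %in% c(3, 4)] <- 2

# "min:2=1; 3:max=2": ranges bounded by the observed minimum and maximum.
lo <- min(x, na.rm = TRUE)
hi <- max(x, na.rm = TRUE)
ranged <- rep(NA_real_, length(x))
ranged[!is.na(x) & x >= lo & x <= 2] <- 1
ranged[!is.na(x) & x >= 3 & x <= hi] <- 2

# "3=1; 1=2; else=copy": recode two values, copy all others unchanged.
# Both lookups use the original x, so the pairs do not chain.
copied <- x
copied[x %in% 3] <- 1
copied[x %in% 1] <- 2

# "NA=9; else=copy": NA used as an old value.
na_filled <- x
na_filled[is.na(x)] <- 9

# "rev": reverse the value order (assumes consecutive integer codes).
reversed <- hi + lo - x

print(list(multi = multi, ranged = ranged, copied = copied,
           na_filled = na_filled, reversed = reversed))
```

Note that every pair is evaluated against the original values, which is why "3=1; 1=2" swaps nothing twice: the old 3 becomes 1 and the old 1 becomes 2.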

Value

A numeric variable (or a factor, if as.fac = TRUE or if x was a character vector) with recoded category values, or a data frame or list-object with recoded categories for all variables.

Note

Please note the following behaviours of the function:

See Also

set_na for setting NA values, replace_na to replace NA's with a specific value, recode_to for re-shifting value ranges and ref_lvl to change the reference level of (numeric) factors.

Examples

data(efc)
table(efc$e42dep, useNA = "always")

# replace NA with 5
table(rec(efc$e42dep, recodes = "1=1;2=2;3=3;4=4;NA=5"), useNA = "always")

# recode 1 to 2 into 1 and 3 to 4 into 2
table(rec(efc$e42dep, recodes = "1,2=1; 3,4=2"), useNA = "always")

# or:
# rec(efc$e42dep) <- "1,2=1; 3,4=2"
# table(efc$e42dep, useNA = "always")

# set value labels; the variable label is automatically preserved
library(dplyr)
efc %>%
  select(e42dep) %>%
  rec(recodes = "1,2=1; 3,4=2",
      val.labels = c("low dependency", "high dependency")) %>%
  str()

# recode 1 to 3 into 1 and 4 into 2
table(rec(efc$e42dep, recodes = "min:3=1; 4=2"), useNA = "always")

# recode 2 to 1 and all others into 2
table(rec(efc$e42dep, recodes = "2=1; else=2"), useNA = "always")

# reverse value order
table(rec(efc$e42dep, recodes = "rev"), useNA = "always")

# recode only selected values, copy remaining
table(efc$e15relat)
table(rec(efc$e15relat, recodes = "1,2,4=1; else=copy"))

# recode variables with same category in a data frame
head(efc[, 6:9])
head(rec(efc[, 6:9], recodes = "1=10;2=20;3=30;4=40"))

# recode variable and set value labels via recode-syntax
dummy <- rec(efc$c160age,
             recodes = "15:30=1 [young]; 31:55=2 [middle]; 56:max=3 [old]")
frq(dummy)

# recode list of variables. create dummy-list of
# variables with same value-range
dummy <- list(efc$c82cop1, efc$c83cop2, efc$c84cop3)
# show original distribution
lapply(dummy, table, useNA = "always")
# show recodes
lapply(rec(dummy, recodes = "1,2=1; NA=9; else=copy"), table, useNA = "always")

# recode character vector
dummy <- c("M", "F", "F", "X")
rec(dummy, recodes = "M=Male; F=Female; X=Refused")

# recode non-numeric factors
data(iris)
rec(iris$Species, "setosa=huhu; else=copy")

# preserve tagged NAs
library(haven)
x <- labelled(c(1:3, tagged_na("a", "c", "z"), 4:1),
              c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
                "Refused" = tagged_na("a"), "Not home" = tagged_na("z")))
# get current value labels
x
# recode 2 into 5; Values of tagged NAs are preserved
rec(x, recodes = "2=5;else=copy")
na_tag(rec(x, recodes = "2=5;else=copy"))
