Collapsing In crosswalkr: Rename and Encode Data Frames Using External Crosswalk Files

In some situations, you may want to use `encodefrom()` to collapse values, that is, to group unique raw values into a smaller set of clean values and labels. For example, say you have the following data set, which gives the census division number and name for a selection of states:

Data

|id|state|cendiv|cendiv_name|
|:-|:---:|:----:|:----------|
|1|AL|6|East South Central|
|2|AK|9|Pacific|
|3|AZ|8|Mountain|
|4|AR|7|West South Central|
|5|CA|9|Pacific|
|6|CO|8|Mountain|
|7|CT|1|New England|
|8|DE|5|South Atlantic|
|10|FL|5|South Atlantic|
|12|HI|9|Pacific|
|14|IL|3|East North Central|
|15|IN|3|East North Central|
|16|IA|4|West North Central|
|31|NJ|2|Middle Atlantic|
|33|NY|2|Middle Atlantic|

Instead of using the nine census divisions, you would rather group states by census region. You have the following crosswalk, which maps each division to one of four regions:

Crosswalk

|cendiv|cenreg|cenregnm|
|:----:|:----:|:-------|
|1|1|Northeast|
|2|1|Northeast|
|3|2|Midwest|
|4|2|Midwest|
|5|3|South|
|6|3|South|
|7|3|South|
|8|4|West|
|9|4|West|

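You can use `encodefrom()` to collapse categories as you move from raw to clean values as long as:

1. `raw` values are unique in the crosswalk
2. `clean` and `label` columns have a 1:1 match

The crosswalk above meets both conditions: each of the nine `cendiv` values appears once, and each `cenreg` code pairs with exactly one `cenregnm` label.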
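The two conditions can be checked before encoding. The following is a minimal base R sketch, not part of crosswalkr: it rebuilds the crosswalk table shown above so that it runs on its own, and the names `cw_check`, `pairs`, `raw_unique`, and `one_to_one` are illustrative only.

```r
## crosswalk table from above, rebuilt so this block runs on its own
cw_check <- data.frame(cendiv = 1:9,
                       cenreg = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
                       cenregnm = c("Northeast", "Northeast", "Midwest",
                                    "Midwest", "South", "South", "South",
                                    "West", "West"),
                       stringsAsFactors = FALSE)

## condition 1: every raw value appears only once
raw_unique <- anyDuplicated(cw_check$cendiv) == 0

## condition 2: each clean value pairs with exactly one label and vice versa
pairs <- unique(cw_check[, c("cenreg", "cenregnm")])
one_to_one <- anyDuplicated(pairs$cenreg) == 0 &&
    anyDuplicated(pairs$cenregnm) == 0

## stop early if either condition fails
stopifnot(raw_unique, one_to_one)
```

If a raw value were listed twice, or if one region code carried two different names, `stopifnot()` would raise an error before any encoding is attempted.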

```r
library(crosswalkr)
library(dplyr)
library(haven)
```
```r
## data
df <- tibble(id = c(1:8,10,12,14:16,31,33),
             state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
                       'IL','IN','IA','NJ','NY'),
             cendiv = c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2),
             cendiv_name = c('East South Central','Pacific','Mountain',
                             'West South Central','Pacific','Mountain',
                             'New England','South Atlantic','South Atlantic',
                             'Pacific','East North Central',
                             'East North Central','West North Central',
                             'Middle Atlantic','Middle Atlantic'))

## crosswalk
cw <- tibble(cendiv = 1:9,
             cenreg = c(1,1,2,2,3,3,3,4,4),
             cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                          'South','South','South','West','West'))
```
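```r
## encode new column:
## look up each cendiv value in the crosswalk's raw column (cendiv) and
## return its clean value (cenreg) with its label (cenregnm)
df <- df %>%
    mutate(cenreg = encodefrom(., var = cendiv, cw_file = cw, raw = cendiv,
                               clean = cenreg, label = cenregnm))
```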
```r
df
```
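To see what the collapse does, a rough base R cross-check can help. The sketch below does not call `encodefrom()`. It rebuilds the raw `cendiv` values and the crosswalk from the example and uses `match()` to perform the same many-to-one lookup. The names `cendiv_raw`, `cw_lookup`, `idx`, `cenreg_check`, and `cenregnm_check` are illustrative only.

```r
## raw division codes from the example data, rebuilt so this block runs alone
cendiv_raw <- c(6, 9, 8, 7, 9, 8, 1, 5, 5, 9, 3, 3, 4, 2, 2)

## crosswalk from the example
cw_lookup <- data.frame(cendiv = 1:9,
                        cenreg = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
                        cenregnm = c("Northeast", "Northeast", "Midwest",
                                     "Midwest", "South", "South", "South",
                                     "West", "West"),
                        stringsAsFactors = FALSE)

## find each raw value's row in the crosswalk, then pull the region columns
idx <- match(cendiv_raw, cw_lookup$cendiv)
cenreg_check <- cw_lookup$cenreg[idx]
cenregnm_check <- cw_lookup$cenregnm[idx]

## count states per region after the collapse
table(cenregnm_check)
```

The region codes in `cenreg_check` should line up row by row with the values in the new `cenreg` column of `df`.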


crosswalkr documentation built on Jan. 8, 2020, 5:07 p.m.