Collapsing

In some situations, you may want to use encodefrom() to collapse values, that is, to group unique raw values into a smaller set of clean values and labels. For example, say you have the following data set, which gives each state's census division number and name:

Data

|id|state|cendiv|cendiv_name|
|:-|:---:|:----:|:----------|
|1|AL|6|East South Central|
|2|AK|9|Pacific|
|3|AZ|8|Mountain|
|4|AR|7|West South Central|
|5|CA|9|Pacific|
|6|CO|8|Mountain|
|7|CT|1|New England|
|8|DE|5|South Atlantic|
|10|FL|5|South Atlantic|
|12|HI|9|Pacific|
|14|IL|3|East North Central|
|15|IN|3|East North Central|
|16|IA|4|West North Central|
|31|NJ|2|Middle Atlantic|
|33|NY|2|Middle Atlantic|

Instead of using the nine census divisions, you would rather group states by census region. You have the following crosswalk, which maps each division to its region:

Crosswalk

|cendiv|cenreg|cenregnm|
|:----:|:----:|:-------|
|1|1|Northeast|
|2|1|Northeast|
|3|2|Midwest|
|4|2|Midwest|
|5|3|South|
|6|3|South|
|7|3|South|
|8|4|West|
|9|4|West|

As long as

  1. raw values are unique in the crosswalk, and
  2. clean and label columns have a 1:1 match (each clean value pairs with exactly one label),

you can use encodefrom() to collapse categories as you move from raw to clean values.
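Before encoding, it can help to confirm that a crosswalk meets both conditions. The following is a minimal sketch in base R, not part of crosswalkr; the names raw_unique, pairs, and one_to_one are made up for illustration, and the crosswalk is rebuilt as a plain data frame so the snippet stands on its own.

```r
## crosswalk from the example, as a base R data frame
cw <- data.frame(cendiv = 1:9,
                 cenreg = c(1,1,2,2,3,3,3,4,4),
                 cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                              'South','South','South','West','West'),
                 stringsAsFactors = FALSE)

## condition 1: each raw value appears only once
raw_unique <- anyDuplicated(cw$cendiv) == 0

## condition 2: each clean value pairs with exactly one label, and vice versa
pairs <- unique(cw[, c('cenreg', 'cenregnm')])
one_to_one <- anyDuplicated(pairs$cenreg) == 0 &&
    anyDuplicated(pairs$cenregnm) == 0

raw_unique
one_to_one
```

If either check is FALSE, the crosswalk likely needs cleaning before it is passed to encodefrom().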

library(crosswalkr)
library(dplyr)
library(haven)
## data
df <- tibble(id = c(1:8,10,12,14:16,31,33),
             state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
                       'IL','IN','IA','NJ','NY'),
             cendiv = c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2),
             cendiv_name = c('East South Central','Pacific','Mountain',
                             'West South Central','Pacific','Mountain','New England',
                             'South Atlantic','South Atlantic','Pacific',
                             'East North Central','East North Central',
                             'West North Central','Middle Atlantic','Middle Atlantic'))

## crosswalk
cw <- tibble(cendiv = 1:9,
             cenreg = c(1,1,2,2,3,3,3,4,4),
             cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                          'South','South','South','West','West'))
## encode new column
df <- df %>%
    mutate(cenreg = encodefrom(., var = cendiv, cw_file = cw, raw = cendiv,
                               clean = cenreg, label = cenregnm))
df
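To see what the collapse amounts to, the same mapping can be sketched by hand with base R's match(). This is only an illustration of the raw-to-clean lookup under the two conditions above; it is not how encodefrom() is implemented, and it does not attach value labels to the result. The variable idx is a hypothetical helper name.

```r
## raw division codes from the example data
cendiv <- c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2)

## crosswalk as a base R data frame
cw <- data.frame(cendiv = 1:9,
                 cenreg = c(1,1,2,2,3,3,3,4,4),
                 cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                              'South','South','South','West','West'),
                 stringsAsFactors = FALSE)

## find each raw value's row in the crosswalk,
## then pull the matching clean value and label
idx <- match(cendiv, cw$cendiv)
cenreg <- cw$cenreg[idx]
cenregnm <- cw$cenregnm[idx]

## count states per region after collapsing divisions
table(cenregnm)
```

Because raw values are unique in the crosswalk, match() finds exactly one row per division, so several divisions can safely share one region code.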



crosswalkr documentation built on Jan. 8, 2020, 5:07 p.m.