cat2cat_agg: Manual mapping for an aggregated panel dataset


cat2cat_agg    R Documentation

Manual mapping for an aggregated panel dataset


Manual mapping of an inconsistently coded categorical variable according to user-provided mappings (equations).


cat2cat_agg(
  data = list(old = NULL, new = NULL, cat_var_old = NULL, cat_var_new = NULL, time_var =
    NULL, freq_var = NULL),
  ...
)



data - list with the named fields 'old', 'new', 'cat_var_old', 'cat_var_new', 'time_var', 'freq_var' ('cat_var' is deprecated).


... - mapping equations where the direction is set with any of '>', '<', '%>%', '%<%'.


data argument - list with fields

  • "old" data.frame older time point in the panel

  • "new" data.frame more recent time point in the panel

  • "cat_var" character - deprecated - name of the categorical variable

  • "cat_var_old" character name of the categorical variable in the old period

  • "cat_var_new" character name of the categorical variable in the new period

  • "time_var" character name of time variable

  • "freq_var" character name of frequency variable
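
The fields above can be assembled as in the following minimal sketch. The toy data frames and the check_fields helper are hypothetical illustrations and are not part of cat2cat; the helper only confirms that the named columns exist before the list is passed on.

```r
# Toy aggregated panel (hypothetical values), one data.frame per time point
agg_old <- data.frame(
  v_date = "2020-04-01",
  vertical = c("Automotive", "Home"),
  counts = c(100, 50),
  stringsAsFactors = FALSE
)
agg_new <- data.frame(
  v_date = "2020-05-01",
  vertical = c("Automotive1", "Automotive2", "Home"),
  counts = c(30, 90, 60),
  stringsAsFactors = FALSE
)

# The list layout follows the fields documented above
data_arg <- list(
  old = agg_old,
  new = agg_new,
  cat_var_old = "vertical",
  cat_var_new = "vertical",
  time_var = "v_date",
  freq_var = "counts"
)

# Hypothetical helper (not a cat2cat function): check that the columns
# named in the list are present in the corresponding data.frame
check_fields <- function(d) {
  all(c(d$cat_var_old, d$time_var, d$freq_var) %in% names(d$old)) &&
    all(c(d$cat_var_new, d$time_var, d$freq_var) %in% names(d$new))
}
fields_ok <- check_fields(data_arg)
```

A check of this kind is optional; it simply surfaces a misspelled column name early.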


'named list' with 2 fields, old and new, each a 'data.frame'. Additional columns are added to each. New columns are used rather than additional metadata because the returned datasets are new ones in which observations may be replicated. For transparency, the probability and the number of replications are stored with each observation in the 'data.frame'.
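
The idea of replicated observations carrying a probability can be illustrated with a base R sketch. It assumes that the weights are proportional to the frequencies of the candidate categories in the other period; this is an illustration under that assumption, not the package's implementation. The toy values and the column name n_rep are hypothetical; prop_c2c follows the column name used in the example on this page.

```r
# Toy data (hypothetical): one old category maps to two new categories
old <- data.frame(vertical = "Automotive", counts = 100,
                  stringsAsFactors = FALSE)
new <- data.frame(vertical = c("Automotive1", "Automotive2"),
                  counts = c(30, 90),
                  stringsAsFactors = FALSE)

targets <- c("Automotive1", "Automotive2")

# Assumed weighting: share of each candidate category in the new period
freqs <- new$counts[match(targets, new$vertical)]
prop <- freqs / sum(freqs)

# The single old row is replicated once per candidate category;
# each replicate keeps the original counts and carries its weight
replicated <- data.frame(
  vertical = targets,
  counts = old$counts,
  prop_c2c = prop,
  n_rep = length(targets),  # hypothetical column name
  stringsAsFactors = FALSE
)

# Weighting the replicates recovers the original total
weighted_counts <- replicated$counts * replicated$prop_c2c
total <- sum(weighted_counts)
```

Because the weights of the replicates sum to one, aggregating with counts * prop_c2c leaves the period total unchanged.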


All mapping equations have to be valid ones.


library("cat2cat")
data("verticals", package = "cat2cat")
agg_old <- verticals[verticals$v_date == "2020-04-01", ]
agg_new <- verticals[verticals$v_date == "2020-05-01", ]

# cat2cat_agg - can map in both directions at once
# although usually we want to have the old or the new representation

agg <- cat2cat_agg(
  data = list(
    old = agg_old,
    new = agg_new,
    cat_var_old = "vertical",
    cat_var_new = "vertical",
    time_var = "v_date",
    freq_var = "counts"
  ),
  Automotive %<% c(Automotive1, Automotive2),
  c(Kids1, Kids2) %>% c(Kids),
  Home %>% c(Home, Supermarket)
)

## possible processing
library("dplyr")
agg %>%
  bind_rows() %>%
  group_by(v_date, vertical) %>%
  summarise(
    sales = sum(sales * prop_c2c),
    counts = sum(counts * prop_c2c),
    v_date = first(v_date)
  )

Polkas/catTOcat documentation built on Feb. 8, 2023, 3:21 p.m.