cat2cat_agg: Manual mapping for an aggregated panel dataset

cat2cat_agg R Documentation

Manual mapping for an aggregated panel dataset

Description

Manual mapping of an inconsistently coded categorical variable according to user-provided mappings (equations).

Usage

cat2cat_agg(
  data = list(old = NULL, new = NULL, cat_var_old = NULL, cat_var_new = NULL, time_var =
    NULL, freq_var = NULL),
  ...
)

Arguments

data

list with named fields 'old', 'new', 'cat_var_old', 'cat_var_new', 'time_var', 'freq_var'. The single field 'cat_var' is deprecated; see Details.

...

mapping equations, where the direction is set with any of '>', '<', '%>%', '%<%'.
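As a rough sketch of the operator choices listed above, the call below repeats the mapping from the Examples section but writes the equations with the plain '<' and '>' forms. It assumes these behave the same as '%<%' and '%>%', which this page implies but does not state outright. The name agg_alt and the requireNamespace() guard are illustrative only.

```r
# Sketch only: the mapping runs when the cat2cat package is installed.
agg_alt <- NULL
if (requireNamespace("cat2cat", quietly = TRUE)) {
  # Load the example dataset into the current environment.
  utils::data("verticals", package = "cat2cat", envir = environment())
  agg_old <- verticals[verticals$v_date == "2020-04-01", ]
  agg_new <- verticals[verticals$v_date == "2020-05-01", ]

  agg_alt <- cat2cat::cat2cat_agg(
    data = list(
      old = agg_old,
      new = agg_new,
      cat_var_old = "vertical",
      cat_var_new = "vertical",
      time_var = "v_date",
      freq_var = "counts"
    ),
    # Assumption: '<' and '>' are interchangeable with '%<%' and '%>%'.
    Automotive < c(Automotive1, Automotive2),
    c(Kids1, Kids2) > c(Kids),
    Home > c(Home, Supermarket)
  )
  # Inspect the top-level structure of the result.
  str(agg_alt, max.level = 1)
}
```

The plain operators avoid any visual clash with the pipe when dplyr or magrittr is attached. The '%>%' and '%<%' forms may read more clearly as arrows.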

Details

data argument - list with fields

"old"

data.frame - older time point in the panel

"new"

data.frame - more recent time point in the panel

"cat_var"

character - deprecated - name of the categorical variable

"cat_var_old"

character name of the categorical variable in the old period

"cat_var_new"

character name of the categorical variable in the new period

"time_var"

character name of time variable

"freq_var"

character name of frequency variable

Value

Named list with 2 fields, 'old' and 'new', each a data.frame. Additional columns are added to each data.frame rather than attached as metadata, because the returned datasets are new ones in which observations may be replicated. For transparency, the probability and the number of replications are stored with each observation in the data.frame.
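To illustrate why the probability column matters, here is a minimal base R sketch on hypothetical toy data (not output of cat2cat_agg). It assumes a replicated observation carries its original count on every replica, and that the prop_c2c weights of those replicas sum to 1. Under that assumption, weighting before aggregating keeps the overall total unchanged.

```r
# Hypothetical toy data: one original row (count 100) replicated into the
# candidate categories "A1" and "A2", plus an unreplicated row "B".
replicated <- data.frame(
  vertical = c("A1", "A2", "B"),
  counts   = c(100, 100, 40),  # original count repeated on each replica
  prop_c2c = c(0.25, 0.75, 1)  # assumed weights; replicas sum to 1
)

# Weight each replica by its probability before aggregating.
replicated$weighted_counts <- replicated$counts * replicated$prop_c2c

totals <- aggregate(weighted_counts ~ vertical, data = replicated, FUN = sum)
print(totals)

# The weighted total matches the total of the original rows (100 + 40).
stopifnot(isTRUE(all.equal(sum(totals$weighted_counts), 140)))
```

Summing counts without the weight would count the replicated observation twice, which is why the example in this page multiplies by prop_c2c before summarising.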

Note

All mapping equations have to be valid ones.

Examples

data("verticals", package = "cat2cat")
agg_old <- verticals[verticals$v_date == "2020-04-01", ]
agg_new <- verticals[verticals$v_date == "2020-05-01", ]

# cat2cat_agg - can map in both directions at once
# although usually we want to have the old or the new representation

agg <- cat2cat_agg(
  data = list(
    old = agg_old,
    new = agg_new,
    cat_var_old = "vertical",
    cat_var_new = "vertical",
    time_var = "v_date",
    freq_var = "counts"
  ),
  Automotive %<% c(Automotive1, Automotive2),
  c(Kids1, Kids2) %>% c(Kids),
  Home %>% c(Home, Supermarket)
)

## possible processing
library("dplyr")
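agg %>%
  bind_rows() %>%
  group_by(v_date, vertical) %>%
  # weight each replicated row by its mapping probability before summing;
  # v_date is a grouping key, so it is already kept in the result
  summarise(
    sales = sum(sales * prop_c2c),
    counts = sum(counts * prop_c2c)
  )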

Polkas/catTOcat documentation built on Jan. 26, 2024, 7:10 a.m.