from_5_to_3: Function reducing from 5 to 3 categorical variables

View source: R/sp_from_5_to_3.R

from_5_to_3    R Documentation

Function reducing from 5 to 3 categorical variables

Description

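Reduces a table with 5 categorical variables to a list of tables with 3 categorical variables. The reduction runs in two steps (from 5 to 4, then from 4 to 3 dimensions), each of which merges two variables into one.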

Usage

from_5_to_3(
  dfs,
  dfs_name,
  totcode,
  hrcfiles = NULL,
  sep_dir = FALSE,
  hrc_dir = "hrc_alt",
  v1 = NULL,
  v2 = NULL,
  v3 = NULL,
  v4 = NULL,
  sep = "_",
  maximize_nb_tabs = FALSE,
  verbose = FALSE
)

Arguments

dfs

data.frame with 5 categorical variables (n categorical variables, n >= 3, in the general case)

dfs_name

name of the data.frame in the list provided by the user

totcode

named vector giving the total code of each categorical variable (the names are the variable names)
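
As an illustration of the expected shape of totcode, here is a minimal base-R sketch. It assumes that every categorical column needs an entry and that each total code should occur in its column; the check itself is a precaution written for this example, not something the package is documented to perform.

```r
# Toy table with two categorical variables (assumed for illustration)
data <- expand.grid(
  SEX = c("Total", "F", "M"),
  AGE = c("Total", "AGE1", "AGE2"),
  stringsAsFactors = FALSE
)

# Named vector: names are the variables, values are their total codes
totcode <- c(SEX = "Total", AGE = "Total")

# Every categorical variable has a total code
covered <- all(names(data) %in% names(totcode))

# Each total code actually occurs in the corresponding column
present <- vapply(
  names(totcode),
  function(v) totcode[[v]] %in% data[[v]],
  logical(1)
)
```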

hrcfiles

named vector indicating the hrc files of hierarchical variables among the categorical variables of dfs
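
The following base-R sketch shows what such a named vector and the file it points to might look like. It assumes the layout produced by the code in the Examples section: the total is omitted and each level below the top-level nodes is prefixed with one "@". The temporary path is hypothetical.

```r
# GEO hierarchy from the Examples section, written by hand
hrc_lines <- c("GA", "@GA1", "@GA2", "GB", "@GB1", "@GB2")

# Hypothetical location for the file
hrc_path <- file.path(tempdir(), "hrc_GEO.hrc")
writeLines(hrc_lines, hrc_path)

# Names are the hierarchical variables, values are the hrc file paths
hrcfiles <- c(GEO = hrc_path)

# Depth of each node = number of leading "@" characters
depth <- nchar(hrc_lines) - nchar(sub("^@+", "", hrc_lines))
```

Variables without a hierarchy (AGE and ECO in the Examples section) are simply left out of hrcfiles.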

sep_dir

if TRUE, forces the hrc files to be written in a separate folder; defaults to FALSE

hrc_dir

folder in which to write the hrc files when writing to a new folder is forced or when no folder is specified in hrcfiles

v1

forces the choice of the first variable to merge when reducing from 5 to 4 dimensions; unspecified by default (NULL)

v2

forces the choice of the second variable to merge when reducing from 5 to 4 dimensions; unspecified by default (NULL)

v3

forces the choice of the first variable to merge when reducing from 4 to 3 dimensions; unspecified by default (NULL)

v4

forces the choice of the second variable to merge when reducing from 4 to 3 dimensions; unspecified by default (NULL)

sep

separator used during concatenation of variables
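
To make the role of sep concrete, here is a small base-R sketch of merging two variables by concatenation. It is only an illustration and not the package's implementation: the merged column name ACT_AGE and the safety check on the separator are assumptions made for this example.

```r
# Toy table with two categorical variables to merge
df <- data.frame(
  ACT = c("Total", "A", "A"),
  AGE = c("Total", "Total", "AGE1"),
  VALUE = 1:3,
  stringsAsFactors = FALSE
)
sep <- "_"

# Precaution (assumed): the separator should not already occur in the labels,
# otherwise the merged labels could be ambiguous
sep_is_safe <- !any(grepl(sep, c(df$ACT, df$AGE), fixed = TRUE))

# Merge the two variables into one and drop the originals
df$ACT_AGE <- paste(df$ACT, df$AGE, sep = sep)
df$ACT <- NULL
df$AGE <- NULL
```

After the merge the table has one categorical dimension fewer, with labels such as "A_AGE1".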

maximize_nb_tabs

specifies whether to merge in priority the hierarchical variables with the most nodes (TRUE), which generates more but smaller tables, or the non-hierarchical variables with the fewest modalities (FALSE), which generates fewer tables
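
The FALSE branch can be pictured with the base-R sketch below, which ranks the non-hierarchical variables by their number of modalities. This is a rough illustration under stated assumptions (a toy table, and ACT as the only hierarchical variable), not the selection code used by the package.

```r
# Toy table (assumed for illustration)
data <- expand.grid(
  ACT = c("Total", "A", "B"),
  AGE = c("Total", "AGE1", "AGE2", "AGE3"),
  ECO = c("PIB", "Entreprises"),
  stringsAsFactors = FALSE
)

# Assumption: only ACT has an hrc file
hier_vars <- "ACT"

# Number of modalities (distinct values) per variable
n_mod <- vapply(data, function(x) length(unique(x)), integer(1))

# Non-hierarchical variables, ordered from fewest to most modalities
non_hier <- setdiff(names(data), hier_vars)
candidates <- names(sort(n_mod[non_hier]))
```

Under these assumptions ECO comes before AGE in candidates, because it has fewer modalities.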

verbose

prints the successive steps of the function to report progress to the user; mainly useful for the general function gen_tabs_5_4_to_3()

Value

a list containing the following components:

  • tabs: named list of dataframes with 3 dimensions (n-2 dimensions in the general case) endowed with nested hierarchies

  • hrcs5_4: named list of hrc specific to the variable created via the merge when reducing from 5 to 4 dimensions

  • hrcs4_3: named list of hrc specific to the variable created via the merge when reducing from 4 to 3 dimensions

  • alt_tot5_4: named list of totals when reducing from 5 to 4 dimensions

  • alt_tot4_3: named list of totals when reducing from 4 to 3 dimensions

  • vars: named list of vectors representing the merged variables during the two steps of dimension reduction

Examples

library(dplyr)
data <- expand.grid(
  ACT = c("Total", "A", "B", "A1", "A2", "B1", "B2"),
  GEO = c("Total", "GA", "GB", "GA1", "GA2", "GB1", "GB2"),
  SEX = c("Total", "F", "M","F1","F2","M1","M2"),
  AGE = c("Total", "AGE1", "AGE2", "AGE11", "AGE12", "AGE21", "AGE22"),
  ECO = c("PIB","Ménages","Entreprises"),
  stringsAsFactors = FALSE,
  KEEP.OUT.ATTRS = FALSE
) %>%
  as.data.frame()

data <- data %>% mutate(VALUE = 1:n())

# Write the ACT hierarchy to an hrc file
hrc_act <- "hrc_ACT.hrc"
sdcHierarchies::hier_create(root = "Total", nodes = c("A","B")) %>%
  sdcHierarchies::hier_add(root = "A", nodes = c("A1","A2")) %>%
  sdcHierarchies::hier_add(root = "B", nodes = c("B1","B2")) %>%
  sdcHierarchies::hier_convert(as = "argus") %>%
  slice(-1) %>%  # drop the root (Total)
  mutate(levels = substring(paste0(level,name),3)) %>%  # strip the two leading "@"
  select(levels) %>%
  write.table(file = hrc_act, row.names = FALSE, col.names = FALSE, quote = FALSE)

hrc_geo <- "hrc_GEO.hrc"
sdcHierarchies::hier_create(root = "Total", nodes = c("GA","GB")) %>%
  sdcHierarchies::hier_add(root = "GA", nodes = c("GA1","GA2")) %>%
  sdcHierarchies::hier_add(root = "GB", nodes = c("GB1","GB2")) %>%
  sdcHierarchies::hier_convert(as = "argus") %>%
  slice(-1) %>%
  mutate(levels = substring(paste0(level,name),3)) %>%
  select(levels) %>%
  write.table(file = hrc_geo, row.names = FALSE, col.names = FALSE, quote = FALSE)

hrc_sex <- "hrc_SEX.hrc"
sdcHierarchies::hier_create(root = "Total", nodes = c("F","M")) %>%
  sdcHierarchies::hier_add(root = "F", nodes = c("F1","F2")) %>%
  sdcHierarchies::hier_add(root = "M", nodes = c("M1","M2")) %>%
  sdcHierarchies::hier_convert(as = "argus") %>%
  slice(-1) %>%
  mutate(levels = substring(paste0(level,name),3)) %>%
  select(levels) %>%
  write.table(file = hrc_sex, row.names = FALSE, col.names = FALSE, quote = FALSE)

# Results of the function, forcing the variables merged at each step
res1 <- from_5_to_3(
  dfs = data,
  dfs_name = "tab",
  totcode = c(SEX="Total",AGE="Total", GEO="Total", ACT="Total", ECO = "PIB"),
  hrcfiles = c(ACT = hrc_act, GEO = hrc_geo, SEX = hrc_sex),
  sep_dir = TRUE,
  hrc_dir = "output",
  v1 = "ACT",
  v2 = "AGE",
  v3 = "SEX",
  v4 = "ECO"
)

# Same call without v1 to v4, printing the progress
res2 <- from_5_to_3(
  dfs = data,
  dfs_name = "tab",
  totcode = c(SEX="Total",AGE="Total", GEO="Total", ACT="Total", ECO = "PIB"),
  hrcfiles = c(ACT = hrc_act, GEO = hrc_geo, SEX = hrc_sex),
  sep_dir = TRUE,
  hrc_dir = "output",
  verbose = TRUE
)

InseeFrLab/rtauargus documentation built on Feb. 25, 2025, 6:32 a.m.