run_weighted_count: Compute (weighted) counts or percentages from a list of data...
In JosepER/lissyrtools: Tools for LISSY Jobs

run_weighted_count

R Documentation

Compute (weighted) counts or percentages from a list of data frames

Description

This function calculates (weighted) category counts or percentages for a given categorical variable across a list of data frames (e.g., by country or year). Optionally, results can be grouped by another categorical variable.

Usage

run_weighted_count(
  data_list,
  var_name,
  wgt_name = NULL,
  na.rm = FALSE,
  by = NULL,
  percent = FALSE
)

Arguments

`data_list`	A named list of data frames (e.g., across countries or years).
`var_name`	A string specifying the name of the categorical variable for which counts or percentages are to be computed. This must be listed in `lissyrtools::lis_categorical_variables` or `lissyrtools::lws_wealth_categorical_variables`.
`wgt_name`	(Optional) A string specifying the name of the weight variable to apply. If `NULL`, unweighted counts are used.
`na.rm`	Logical; if `TRUE`, observations with missing values in `var_name` are removed before computing counts or percentages.
`by`	(Optional) A string giving the name of a categorical variable used to split the data within each data frame before computing statistics.
`percent`	Logical; if `TRUE`, the function returns percentages (weighted if `wgt_name` is supplied, otherwise unweighted). If `FALSE`, it returns unweighted category counts, and any `wgt_name` is ignored.
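To make the difference between unweighted counts and weighted percentages concrete, here is a minimal base R sketch on a toy list of data frames. It is not the package's implementation: the list names (`aa18`, `bb18`), the category labels, and the weight values are invented for illustration, and only the column names `educ` and `hpopwgt` are borrowed from the examples on this page.

```r
# Toy stand-in for a named list of data frames; all values are invented.
toy_list <- list(
  aa18 = data.frame(
    educ    = c("low", "medium", "medium", "high", NA),
    hpopwgt = c(1, 2, 1, 4, 2)
  ),
  bb18 = data.frame(
    educ    = c("low", "low", "high", "high"),
    hpopwgt = c(3, 1, 1, 1)
  )
)

# Unweighted counts per category, keeping NA as its own category.
unweighted_counts <- lapply(toy_list, function(df) {
  table(df$educ, useNA = "ifany")
})

# Weighted percentages: sum the weights within each category,
# then rescale so that the categories add up to 100.
weighted_percent <- lapply(toy_list, function(df) {
  df <- df[!is.na(df$educ), ]             # comparable to na.rm = TRUE
  w  <- tapply(df$hpopwgt, df$educ, sum)  # total weight per category
  100 * w / sum(w)
})

unweighted_counts$aa18
weighted_percent$aa18
```

In this sketch the weights only change the result when percentages are requested, which mirrors the documented behaviour that `wgt_name` has no effect when `percent = FALSE`.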

Details

Any data frame where the `by` variable contains only `NA`s is dropped, with a warning.
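The dropping rule can be sketched in base R as follows. This is an illustration of the idea only, not the package's code: the helper `drop_all_na()` is hypothetical, and the toy data and the warning text are invented.

```r
# Toy list: in `aa18` the grouping variable is entirely missing.
toy_list <- list(
  aa18 = data.frame(educ = c("low", "high"), area_c = c(NA, NA)),
  bb18 = data.frame(educ = c("low", "high"), area_c = c("urban", "rural"))
)

# Hypothetical helper: remove data frames whose `by` column is all NA
# and warn about which ones were removed.
drop_all_na <- function(data_list, by) {
  all_na <- vapply(
    data_list,
    function(df) all(is.na(df[[by]])),
    logical(1)
  )
  if (any(all_na)) {
    warning(
      "Dropping data frames where '", by, "' is entirely NA: ",
      paste(names(data_list)[all_na], collapse = ", ")
    )
  }
  data_list[!all_na]
}

# The warning is suppressed here only to keep the example output quiet.
kept <- suppressWarnings(drop_all_na(toy_list, by = "area_c"))
names(kept)
```

Checking for all-`NA` columns before computing anything avoids returning empty results for datasets in which the variable was never collected.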

Value

A named list.

If `by` is `NULL`: each list element is named by country and contains a named numeric vector, where the names are years and the values are counts or percentages.
If `by` is not `NULL`: each list element is named by `ccyy` (country-year) identifiers and contains a named numeric vector, where the names represent the `by`-categories (e.g., gender, region) and the values are the corresponding counts or percentages.
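A named list of named numeric vectors is often easier to analyse once stacked into a single data frame. The sketch below does this in base R for a hand-built object that only mimics the shape described above for the case where `by` is not `NULL`; the identifiers, category names, and numbers are invented and are not output of `run_weighted_count()`.

```r
# Hand-built stand-in for a result: one named numeric vector per ccyy.
result <- list(
  aa18 = c(female = 10, male = 12),
  bb18 = c(female = 5, male = 7)
)

# Stack the list into a long data frame with one row per ccyy and category.
long <- do.call(rbind, lapply(names(result), function(id) {
  data.frame(
    ccyy     = id,
    category = names(result[[id]]),
    value    = unname(result[[id]])
  )
}))

long
```

The long format makes it straightforward to filter, merge, or plot the results with standard data frame tools.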

Examples

## Not run:  
library(lissyrtools)

data <- lissyrtools::lissyuse(
 data = c("de", "es", "uk"),
 vars = c("dhi", "region_c", "area_c", "educ", "emp"),
 from = 2016
)


# Unweighted counts of `educ`, split by `emp`, for the datasets whose year suffix is "18".
run_weighted_count(
 data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
 var_name = "educ",
 by = "emp",
 percent = FALSE,
 na.rm = TRUE
)

# Set `percent = TRUE` to return percentages (weighted if `wgt_name` is given, otherwise unweighted).
run_weighted_count(
 data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
 var_name = "region_c",
 percent = TRUE,
 na.rm = TRUE
)

# Keep missing values (`na.rm = FALSE`) to check the share of missings.
run_weighted_count(
 data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
 var_name = "region_c",
 percent = TRUE,
 na.rm = FALSE
)


# When `percent = FALSE` and `wgt_name` is specified, the weight is ignored and an unweighted count is returned.
run_weighted_count(
 data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
 var_name = "region_c",
 wgt_name = "hpopwgt",
 percent = FALSE,
 na.rm = TRUE
)

# Datasets in which the `var_name` variable consists only of NAs are not considered.
run_weighted_count(
 data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
 var_name = "area_c",
 percent = FALSE,
 na.rm = TRUE
)

# The same logic applies to the `by` argument.
run_weighted_count(
 data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
 var_name = "educ",
 na.rm = TRUE,
 by = "area_c"
)


## End(Not run)

JosepER/lissyrtools documentation built on June 12, 2025, 12:11 p.m.