run_weighted_count: Compute (weighted) counts or percentages from a list of data...

View source: R/weighted_count.R

run_weighted_count    R Documentation

Compute (weighted) counts or percentages from a list of data frames

Description

This function calculates (weighted) category counts or percentages for a given categorical variable across a list of data frames (e.g., by country or year). Optionally, results can be grouped by another categorical variable.

Usage

run_weighted_count(
  data_list,
  var_name,
  wgt_name = NULL,
  na.rm = FALSE,
  by = NULL,
  percent = FALSE
)

Arguments

data_list

A named list of data frames (e.g., across countries or years).

var_name

A string specifying the name of the categorical variable for which counts or percentages are to be computed. This must be listed in lissyrtools::lis_categorical_variables or lissyrtools::lws_wealth_categorical_variables.

wgt_name

(Optional) A string specifying the name of the weight variable to apply. If NULL, unweighted counts are used.

na.rm

Logical; if TRUE, observations with missing values in var_name are removed before computing counts or percentages.

by

(Optional) A string giving the name of a categorical variable used to split the data within each data frame before computing statistics.

percent

Logical; if TRUE, the function returns percentages (weighted if wgt_name is given, unweighted otherwise). If FALSE, it returns unweighted category counts, and wgt_name is ignored.

Details

  • Any data frame where the by variable contains only NAs is dropped, with a warning.
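
The dropping rule above can be illustrated with a minimal base R sketch. It is not the package's implementation: `drop_all_na()` is a hypothetical helper, and `toy_list` with its dataset names and values is made up for illustration.

```r
# Toy stand-in for `data_list`; names and values are invented.
toy_list <- list(
  aa18 = data.frame(educ = c(1, 2, 2), area_c = c(NA, NA, NA)),
  bb18 = data.frame(educ = c(1, 1, 3), area_c = c(10, 20, 20))
)

# Hypothetical helper (not part of lissyrtools): remove every data frame
# whose `by` column is entirely NA, and warn about what was removed.
drop_all_na <- function(data_list, by) {
  all_na <- vapply(
    data_list,
    function(df) all(is.na(df[[by]])),
    logical(1)
  )
  if (any(all_na)) {
    warning(
      "Dropping data frames where '", by, "' is entirely NA: ",
      paste(names(data_list)[all_na], collapse = ", ")
    )
  }
  data_list[!all_na]
}

# Warning suppressed here only to keep the example output clean.
kept <- suppressWarnings(drop_all_na(toy_list, by = "area_c"))
names(kept)
```

In this toy input only `bb18` has non-missing `area_c` values, so it is the only element left in `kept`.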

Value

A named list.

  • If by is NULL: each list element is named by country and contains a named numeric vector, where the names are years and the values are counts or percentages.

  • If by is not NULL: each list element is named by ccyy (country-year) identifiers and contains a named numeric vector, where the names represent the by-categories (e.g., gender, region) and the values are the corresponding counts or percentages.
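
To make the meaning of a weighted percentage concrete, here is a minimal base R sketch for a single data frame. It is an illustration under stated assumptions, not the code used by `run_weighted_count()`: `weighted_share()` is a hypothetical helper, the toy data are invented, and labelling missing values as `"<NA>"` is a choice made only for this example.

```r
# Toy data: one categorical variable and one weight column (values invented).
df <- data.frame(
  educ    = c("low", "low", "high", NA),
  hpopwgt = c(1, 3, 4, 2)
)

# Hypothetical helper (not part of lissyrtools): share of the total weight
# that falls in each category of `x`, expressed in percent.
weighted_share <- function(x, w = NULL, na.rm = FALSE) {
  x <- as.character(x)
  if (is.null(w)) w <- rep(1, length(x))  # unweighted: every row counts once
  if (na.rm) {
    keep <- !is.na(x)
    x <- x[keep]
    w <- w[keep]
  } else {
    x[is.na(x)] <- "<NA>"                 # keep missings as their own category
  }
  totals <- vapply(split(w, x), sum, numeric(1))
  100 * totals / sum(w)
}

with_na    <- weighted_share(df$educ, df$hpopwgt, na.rm = FALSE)
without_na <- weighted_share(df$educ, df$hpopwgt, na.rm = TRUE)
```

With `na.rm = FALSE` the missing row keeps its weight in the denominator, so the shares of the observed categories shrink accordingly; with `na.rm = TRUE` the shares are computed over non-missing rows only.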

Examples

## Not run:  
library(lissyrtools)

data <- lissyrtools::lissyuse(
  data = c("de", "es", "uk"),
  vars = c("dhi", "region_c", "area_c", "educ", "emp"),
  from = 2016
)


# Counts of `educ` split by `emp`, restricted to datasets whose name has "18" in positions 3-4.
run_weighted_count(
  data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
  var_name = "educ",
  by = "emp",
  percent = FALSE,
  na.rm = TRUE
)

# Set `percent = TRUE` to output percentages (weighted if `wgt_name` is given, unweighted otherwise).
run_weighted_count(
  data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
  var_name = "region_c",
  percent = TRUE,
  na.rm = TRUE
)

# With `na.rm = FALSE`, missing values are not removed, so the share of missings can be checked.
run_weighted_count(
  data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
  var_name = "region_c",
  percent = TRUE,
  na.rm = FALSE
)


# When `percent = FALSE`, a specified `wgt_name` is ignored and unweighted counts are returned.
run_weighted_count(
  data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
  var_name = "region_c",
  wgt_name = "hpopwgt",
  percent = FALSE,
  na.rm = TRUE
)

# Datasets in which the `var_name` variable contains only NAs are not considered.
run_weighted_count(
  data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
  var_name = "area_c",
  percent = FALSE,
  na.rm = TRUE
)

# The same logic applies to the `by` argument.
run_weighted_count(
  data[names(data)[stringr::str_sub(names(data), 3, 4) == "18"]],
  var_name = "educ",
  na.rm = TRUE,
  by = "area_c"
)


## End(Not run)

JosepER/lissyrtools documentation built on June 12, 2025, 12:11 p.m.