mask_counts_2: Perform threshold-based cell masking with primary and...

View source: R/mask_counts_2.R

mask_counts_2    R Documentation

Perform threshold-based cell masking with primary and secondary masking (Algorithm 2 - A2)

Description

This function masks small values in a numeric vector relative to a specified threshold. It applies primary and secondary masking so that small counts can neither be read directly nor deduced from totals.

Usage

mask_counts_2(x, threshold = 11, zero_masking = FALSE)

Arguments

x

Numeric vector to mask.

threshold

Positive numeric value for the threshold; non-zero values below it are masked. Default is 11.

zero_masking

Logical; if TRUE and x contains zeros, a zero is used as the secondary cell when secondary masking is needed. Default is FALSE.

Details

The function operates in two main steps:

  • Primary Masking: Values greater than 0 but less than the threshold are masked by replacing them with "<" followed by the threshold (for example, "<11" with the default threshold).

  • Secondary Masking: Applied when an additional cell must be masked so that the primary-masked values cannot be deduced from totals. Secondary masking is triggered when any of the following conditions holds:

    • Condition A: Exactly one cell is primary-masked, and at least one other value meets or exceeds the threshold.

    • Condition B: Two or more counts of 1 are primary-masked, and at least one other value meets or exceeds the threshold.

    • Condition C: The threshold is 11, two or more counts of 10 are primary-masked, and at least one other value meets or exceeds the threshold.

If any of these conditions holds, one secondary cell is masked:

  • When zero_masking = TRUE and zeros are present, one zero is randomly selected and masked as <threshold.

  • When zero_masking = FALSE (or no zeros are present), the function masks the largest count instead, i.e. the maximum value in x.
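The decision rules above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not countmaskr's implementation: the helper names sketch_primary_idx(), sketch_needs_secondary() and sketch_secondary_idx() are hypothetical, NA values are simply ignored, and ties for the largest count go to the first occurrence (the behaviour of which.max()).

```r
# Hypothetical helpers sketching the documented rules; NOT countmaskr's internal code.

# Primary cells: values strictly between 0 and the threshold (NA ignored)
sketch_primary_idx <- function(x, threshold = 11) {
  which(!is.na(x) & x > 0 & x < threshold)
}

# TRUE if any of Conditions A, B or C holds
sketch_needs_secondary <- function(x, threshold = 11) {
  small <- x[sketch_primary_idx(x, threshold)]
  # At least one value meets or exceeds the threshold
  has_large <- any(!is.na(x) & x >= threshold)
  cond_a <- length(small) == 1 && has_large
  cond_b <- sum(small == 1) >= 2 && has_large
  cond_c <- threshold == 11 && sum(small == 10) >= 2 && has_large
  cond_a || cond_b || cond_c
}

# Index of the secondary cell: a random zero if allowed and present,
# otherwise the largest value (which.max() skips NA)
sketch_secondary_idx <- function(x, threshold = 11, zero_masking = FALSE) {
  zeros <- which(!is.na(x) & x == 0)
  if (zero_masking && length(zeros) > 0) {
    # Index via sample.int() so a single zero is not misread by sample()
    return(zeros[sample.int(length(zeros), 1)])
  }
  which.max(x)
}

x1 <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)
sketch_needs_secondary(x1)                      # TRUE (Condition A)
sketch_secondary_idx(x1)                        # 7, the count 1213
sketch_secondary_idx(x1, zero_masking = TRUE)   # 8, the only zero
```

Sampling through sample.int() avoids a known pitfall of sample(): called on a single number n, it samples from 1:n rather than returning n.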

Formula for the mask value: when the largest count is selected as the secondary cell, its mask_value is calculated as:

$$\text{mask\_value} = \text{selected\_value} - (\text{threshold} - \text{totals\_of\_small\_cells})$$

In words, the formula takes the difference between the threshold and the sum of all small cells (those masked in the primary step) and subtracts it from the selected maximum value. The adjusted mask_value is meant to protect the small cells while keeping the masked output consistent with the totals.
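As a check of the arithmetic, take x1 from the Examples with the default threshold of 11: the only primary cell is 5 and the largest count is 1213, so mask_value = 1213 - (11 - 5) = 1207. The R sketch below evaluates the formula with a hypothetical helper, sketch_mask_value(). It reproduces only the arithmetic and makes no assumption about how countmaskr formats the secondary cell in its output.

```r
# Hypothetical helper (not part of countmaskr) evaluating the documented formula:
#   mask_value = selected_value - (threshold - totals_of_small_cells)
sketch_mask_value <- function(selected_value, small_cells, threshold = 11) {
  # Sum of the cells masked in the primary step
  totals_of_small_cells <- sum(small_cells)
  selected_value - (threshold - totals_of_small_cells)
}

# x1 from Examples: one primary cell (5), largest count 1213
sketch_mask_value(1213, small_cells = 5)       # 1207
# Condition B case: two counts of 1 beside a count of 50
sketch_mask_value(50, small_cells = c(1, 1))   # 41
```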

Value

A character vector with masked cells, retaining NA as NA_character_.

Examples

x1 <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)

mask_counts_2(x1)

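# The native pipe |> requires R >= 4.1
if (requireNamespace("dplyr", quietly = TRUE) && requireNamespace("tidyr", quietly = TRUE)) {
  # Load the example dataset shipped with countmaskr
  data("countmaskr_data", package = "countmaskr")
  countmaskr_data |>
    # Drop the id and age columns
    dplyr::select(-c(id, age)) |>
    # Reshape to long format: one row per (variable, level) observation
    tidyr::gather(block, Characteristics) |>
    # Count observations for each variable/level combination
    dplyr::group_by(block, Characteristics) |>
    dplyr::summarise(N = dplyr::n()) |>
    dplyr::ungroup() |>
    # Apply primary and secondary masking to the counts
    dplyr::mutate(N_masked = mask_counts_2(N))
}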

countmaskr documentation built on April 10, 2026, 5:07 p.m.