View source: R/perturb_counts.R
| perturb_counts | R Documentation |
The perturb_counts function perturbs counts in a numeric vector containing small cells, specifically when only one primary cell is present and secondary cells need to be masked, following Algorithm 3 (A3). The function adjusts the counts by distributing noise to non-primary cells while preserving the overall distribution as much as possible.
perturb_counts(x, threshold = 10)
| x | Numeric vector of length N containing counts. |
| threshold | Numeric value specifying the threshold for small cells (primary cells). Defaults to 10. |
Perturbation Process Overview:
The function performs perturbation through the following steps:
Identification of Small Cells: Cells with counts greater than 0 and less than the specified threshold are identified as small cells (primary cells).
\text{Small Cells} = \{ i \mid 0 < x_i < \text{threshold} \}
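A minimal sketch of this identification step in R. It assumes that NA entries are never flagged as small cells, which the description above does not state. The names is_small and small_idx are illustrative and are not taken from the package source.

```r
# Example counts (the first example vector from this page)
x <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)
threshold <- 10

# A cell is "small" when 0 < x_i < threshold.
# Assumption: NA entries are excluded rather than flagged.
is_small <- !is.na(x) & x > 0 & x < threshold
small_idx <- which(is_small)
print(small_idx)  # 1: only the first cell (count 5) is small
```

Zero counts are not small cells under this definition, so the 0 entry is left alone.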
Adjustment of Small Cells: The counts of small cells are set to the threshold value.
x'_i = \left\{
\begin{array}{ll}
\text{threshold} & \text{if } x_i \text{ is a small cell} \\
x_i & \text{otherwise}
\end{array}
\right.
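Calculation of Total Noise: The total noise to be distributed is the original total minus the adjusted total. Because small cells were raised to the threshold, this value is zero or negative, so distributing it lowers the counts of the non-small cells.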
\text{Total Noise} = \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} x'_i
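The adjustment and the total-noise formula can be sketched as follows. This is an illustration under the assumption that NA entries are ignored when summing; the variable names are hypothetical.

```r
x <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)
threshold <- 10
is_small <- !is.na(x) & x > 0 & x < threshold

# Raise every small cell to the threshold; leave other cells unchanged
x_adj <- ifelse(is_small, threshold, x)

# Total noise = original total - adjusted total
# Assumption: NA entries are dropped from both sums
total_noise <- sum(x, na.rm = TRUE) - sum(x_adj, na.rm = TRUE)
print(total_noise)  # -5: the small cell gained 5, so 5 must be removed elsewhere
```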
Distribution of Noise to Non-Small Cells: The total noise is proportionally distributed to the non-small cells based on their original counts.
Weights Calculation:
w_i = \frac{x_i}{\sum_{j \in \text{Non-Small Cells}} x_j}
Noise Allocation:
\text{Noise}_i = w_i \times \text{Total Noise}
Adjusted Counts:
x''_i = x'_i + \text{Noise}_i
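A sketch of the proportional distribution, before any rounding. It assumes that "non-small cells" means non-NA cells with counts at or above the threshold, and that all other cells receive a weight of zero; neither detail is confirmed by the description above.

```r
x <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)
threshold <- 10
is_small <- !is.na(x) & x > 0 & x < threshold
is_large <- !is.na(x) & x >= threshold  # assumption: the "non-small" cells

x_adj <- ifelse(is_small, threshold, x)
total_noise <- sum(x, na.rm = TRUE) - sum(x_adj, na.rm = TRUE)

# Weights: each non-small cell's share of the non-small total
w <- numeric(length(x))
w[is_large] <- x[is_large] / sum(x[is_large])

# Allocate the noise and apply it to the adjusted counts
noise <- w * total_noise
x_pert <- x_adj + noise

# The grand total is preserved (up to floating-point error)
print(all.equal(sum(x_pert, na.rm = TRUE), sum(x, na.rm = TRUE)))
```

Because the weights sum to 1, the allocated noise sums exactly to the total noise, which is what keeps the grand total unchanged at this stage.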
Rounding Adjusted Counts: The adjusted counts are rounded to the nearest integer.
x'''_i = \text{round}(x''_i)
Adjustment for Rounding Discrepancies: Any remaining noise due to rounding discrepancies is adjusted by iteratively adding or subtracting 1 from the largest counts until the total counts are balanced, ensuring that no count falls below the threshold.
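The rounding and the discrepancy correction might look like the sketch below. The loop is one plausible reading of "iteratively adding or subtracting 1 from the largest counts" and is not taken from the package source; it only touches cells that were at or above the threshold to begin with (an assumption). Note that R's round() rounds halves to the even digit.

```r
x <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)
threshold <- 10
is_small <- !is.na(x) & x > 0 & x < threshold
is_large <- !is.na(x) & x >= threshold

# Unrounded perturbed counts, as in the preceding steps
x_adj <- ifelse(is_small, threshold, x)
total_noise <- sum(x, na.rm = TRUE) - sum(x_adj, na.rm = TRUE)
w <- numeric(length(x))
w[is_large] <- x[is_large] / sum(x[is_large])
x_round <- round(x_adj + w * total_noise)

# Remaining discrepancy between original and rounded totals
remaining <- sum(x, na.rm = TRUE) - sum(x_round, na.rm = TRUE)
step <- sign(remaining)

while (remaining != 0) {
  # Eligible cells: non-small cells that stay at or above the threshold
  ok <- is_large & (x_round + step >= threshold)
  if (!any(ok)) break  # nothing can absorb the remainder
  i <- which(ok)[which.max(x_round[ok])]  # largest eligible count
  x_round[i] <- x_round[i] + step
  remaining <- remaining - step
}

print(x_round)
print(sum(x_round, na.rm = TRUE) == sum(x, na.rm = TRUE))
```

If the loop exits with a nonzero remainder, the totals cannot be balanced, which corresponds to one of the fallback conditions for masking.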
Verification of Proportions: The function checks whether the proportions of the non-small cells remain consistent before and after perturbation. If the proportions differ, the function falls back to masking the counts with mask_counts().
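One way such a proportion check could be written is sketched below. The description does not say how "consistent" is measured, so the tolerance tol and the comparison by maximum absolute difference are assumptions, and the before/after values are illustrative.

```r
# Non-small cells before and after perturbation (illustrative values)
before <- c(11, 43, 55, 65, 121, 1213)
after  <- c(11, 43, 55, 65, 121, 1208)

# Hypothetical tolerance; the actual criterion is not documented here
tol <- 0.005

# Each cell's share of the non-small total
p_before <- before / sum(before)
p_after  <- after / sum(after)

# Consistent if no share moved by more than the tolerance
proportions_ok <- max(abs(p_before - p_after)) < tol
print(proportions_ok)
```

A stricter or looser tolerance changes how often the function would fall back to masking, so the value chosen here should not be read as the package's behavior.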
Coercion to Mask Counts:
The function falls back to masking counts in the following scenarios:
Multiple Small Cells Detected: If more than one small cell is identified, perturbation may not be necessary unless it is specifically intended. The function will still proceed with perturbation but recommends using threshold-based suppression.
Insufficient Available Counts: If the non-small cells do not have enough counts to absorb the total noise without any count falling below the threshold, perturbation would lead to information loss.
Proportions Changed After Perturbation: If perturbation alters the original proportions of the non-small cells, perturbation would lead to information loss.
All Counts Below Threshold: If all counts in the vector are below the specified threshold, no meaningful perturbation is possible. In this case, the function falls back to mask_counts() as a more secure alternative.
In these cases, the function calls mask_counts() to apply threshold-based cell suppression as a more secure alternative.
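The conditions that can be checked before perturbing are sketched below. fallback_reasons() is a hypothetical helper, not part of the package, and the definition of available capacity (the amount each non-small cell sits above the threshold) is an assumption.

```r
# Hypothetical helper: flags conditions under which masking may be preferred
fallback_reasons <- function(x, threshold = 10) {
  ok <- !is.na(x)
  is_small <- ok & x > 0 & x < threshold
  is_large <- ok & x >= threshold

  # Amount needed to lift all small cells to the threshold
  needed <- sum(threshold - x[is_small])
  # Assumption: a non-small cell can give up only what it holds above the threshold
  capacity <- sum(x[is_large] - threshold)

  c(
    multiple_small = sum(is_small) > 1,
    all_below      = !any(is_large),
    insufficient   = needed > capacity
  )
}

print(fallback_reasons(c(5, 11, 43, 55, 65, 121, 1213, 0, NA)))
print(fallback_reasons(c(1, 1, 1, 55, 65, 121, 1213, 0, NA)))
print(fallback_reasons(c(3, 4, 0)))
```

The proportion check is not included here because it can only be evaluated after perturbation has been attempted.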
A character vector with perturbed counts formatted with digit precision and thousands separator. If perturbation is not feasible, the function returns counts masked using mask_counts().
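As an illustration of what a character vector with a thousands separator looks like, the sketch below uses base R's format(). Whether the function itself uses format(), and how it renders NA, is not stated here, so treat this only as an approximation of the output shape.

```r
# Illustrative perturbed counts (numeric)
counts <- c(10, 11, 121, 1208, 0)

# big.mark inserts the thousands separator; trim = TRUE drops padding
formatted <- format(counts, big.mark = ",", trim = TRUE)
print(formatted)
print(is.character(formatted))
```

Because the result is character, it must be converted back (after removing the separators) before any further arithmetic.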
# Example vectors
x1 <- c(5, 11, 43, 55, 65, 121, 1213, 0, NA)
x2 <- c(1, 1, 1, 55, 65, 121, 1213, 0, NA)
x3 <- c(11, 10, 10, 55, 65, 121, 1213, 0, NA)
# Apply the function
lapply(list(x1, x2, x3), perturb_counts)
# Using the function within a data frame
library(dplyr) # provides %>%, select(), group_by(), summarise(), n(), mutate()
data("countmaskr_data")
aggregate_table <- countmaskr_data %>%
select(-c(id, age)) %>%
tidyr::gather(block, Characteristics) %>%
group_by(block, Characteristics) %>%
summarise(N = n()) %>%
ungroup()
aggregate_table %>%
group_by(block) %>%
mutate(N_masked = perturb_counts(N))