sdc_min_max: Calculate RDC rule-compliant extreme values

sdc_min_max    R Documentation

Description

Checks whether the calculation of extreme values complies with RDC rules. If so, the function returns averaged minimum and maximum values according to RDC rules.
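
To make "averaged" extreme values concrete, the following base R sketch averages the k smallest and k largest observations instead of reporting a single observation. It is only an illustration: the helper avg_extremes() and the choice k = 3 are assumptions made for this example, it performs none of the RDC checks, and it is not the package's implementation.

```r
# Hypothetical helper, not part of sdcLog: average the k smallest and
# the k largest non-missing values of a numeric vector.
avg_extremes <- function(x, k = 3) {
  x <- sort(x[!is.na(x)])      # drop NAs, sort ascending
  stopifnot(length(x) >= k)    # need at least k observations
  c(min = mean(head(x, k)), max = mean(tail(x, k)))
}

vals <- c(3, 8, 1, 12, 7, 20, 5, 9, 15, 2)
res <- avg_extremes(vals, k = 3)
print(res)
```

Because each reported value is a mean over several observations, no single observation is revealed directly.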

Usage

sdc_min_max(
  data,
  id_var = getOption("sdc.id_var"),
  val_var,
  by = NULL,
  max_obs = nrow(data),
  fill_id_var = FALSE
)

Arguments

data

data.frame from which the descriptive statistics are calculated.

id_var

character The name of the id variable. Defaults to getOption("sdc.id_var") so that you can provide options(sdc.id_var = "my_id_var") at the top of your script.
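
A minimal sketch of setting this default once per script, using only base R. The column name "my_id_var" is the placeholder from the description above.

```r
# Set the default id variable once; options() returns the previous
# values invisibly, so they can be restored later with options(old).
old <- options(sdc.id_var = "my_id_var")

# Any argument written as getOption("sdc.id_var") now resolves to this:
current_id_var <- getOption("sdc.id_var")
print(current_id_var)
```

With the option set, id_var can be omitted from later calls that use this default.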

val_var

character vector of value variables on which descriptive statistics are computed.

by

character vector of grouping variables.

max_obs

integer The maximum number of observations used to calculate the minimum and maximum. Defaults to nrow(data). This is not the number of distinct entities.

fill_id_var

logical Only for very specific use cases. For example:

  • id_var contains NA values which represent missing values in the sense that identifiers for the entity actually exist but are unknown (or were deleted for privacy reasons).

  • id_var contains NA values which result from the fact that an observation features more than one confidential identifier and not all of these identifiers are present in each observation. Examples of such identifiers are the role of a broker in a security transaction or the role of a collateral giver in a credit relationship.

If TRUE, NA values within id_var will internally be filled with <filled_[i]>, assuming that all NA values of id_var can be treated as different small entities for statistical disclosure control purposes. Set this to TRUE only if that assumption is reasonable.

Defaults to FALSE.
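
The effect of this filling can be pictured with a small base R sketch. It is illustrative only and not the package's internal code: the example ids are made up, and the index i is assumed here to be a simple running counter over the NA entries.

```r
# Made-up id column with two unknown identifiers.
ids <- c("bank_a", NA, "bank_b", NA, "bank_a")

# Replace each NA with its own placeholder, so that every missing id
# is treated as a separate entity.
na_pos <- which(is.na(ids))
filled <- ids
filled[na_pos] <- paste0("<filled_", seq_along(na_pos), ">")

n_distinct_before <- length(unique(ids[!is.na(ids)]))
n_distinct_after <- length(unique(filled))
print(filled)
```

In this sketch the number of distinct entities rises from two to four, which is why the option is only appropriate when each NA plausibly belongs to a different small entity.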

Value

A list of class sdc_min_max with detailed information about options, settings and the calculated extreme values (if possible).

Examples

sdc_min_max(sdc_min_max_DT, id_var = "id", val_var = "val_1")
sdc_min_max(sdc_min_max_DT, id_var = "id", val_var = "val_2")
sdc_min_max(sdc_min_max_DT, id_var = "id", val_var = "val_3", max_obs = 10)
sdc_min_max(sdc_min_max_DT, id_var = "id", val_var = "val_1", by = "year")
sdc_min_max(
  sdc_min_max_DT, id_var = "id", val_var = "val_1", by = c("sector", "year")
)


sdcLog documentation built on March 20, 2022, 1:06 a.m.