ck_cnt_measures: Utility measures for perturbed counts

View source: R/ck_cnt_measures.R

ck_cnt_measures    R Documentation

Utility measures for perturbed counts

Description

This function computes utility/information loss measures from two numeric vectors of counts: the original values and the perturbed values.

Usage

ck_cnt_measures(orig, pert, exclude_zeros = TRUE)

Arguments

orig

a numeric vector holding original values

pert

a numeric vector holding perturbed values

exclude_zeros

a scalar logical value; if TRUE (the default), only cells with counts > 0 are used when computing distances d1, d2 and d3. If this argument is FALSE, the complete vector is used.

Value

a list containing the following elements:

  • overview: a data.table with the following three columns:

    • noise: amount of noise computed as orig - pert

    • cnt: number of cells perturbed with the value given in column noise

    • pct: percentage of cells perturbed with the value given in column noise

  • measures: a data.table containing measures of the distribution of three different distances between original and perturbed values of the unweighted counts. Column what specifies the computed measure. The three distances considered are:

    • d1: absolute distance between original and perturbed values

    • d2: relative absolute distance between original and perturbed values

    • d3: absolute distance between square-roots of original and perturbed values

  • cumdistr_d1, cumdistr_d2 and cumdistr_d3: for each distance d1, d2 and d3, a data.table with the following three columns:

    • cat: a specific value (for d1) or interval (for distances d2 and d3)

    • cnt: number of records whose distance is smaller than or equal to the value in column cat

    • pct: proportion of records whose distance is smaller than or equal to the value in column cat

  • false_zero: number of cells that were originally non-zero but were perturbed to zero

  • false_nonzero: number of cells that were originally zero but were perturbed to a non-zero value

  • exclude_zeros: whether empty cells were excluded from the computation
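The three distances described above can be reproduced by hand in base R. This is a minimal sketch that follows only the verbal definitions on this page. Two points are assumptions: that d2 is taken relative to the original count, and that excluding zeros means dropping cells whose original count is 0. The package's internal implementation may differ.

```r
# Example vectors (the same ones used in the Examples section of this page)
orig <- c(1:10, 0, 0)
pert <- orig
pert[c(1, 5, 7)] <- c(0, 6, 9)

# Assumption: exclude_zeros = TRUE drops cells with an original count of 0
keep <- orig > 0
o <- orig[keep]
p <- pert[keep]

d1 <- abs(o - p)              # absolute distance
d2 <- abs(o - p) / o          # relative absolute distance (assumed: relative to orig)
d3 <- abs(sqrt(o) - sqrt(p))  # absolute distance between square roots

# Distribution summaries comparable in spirit to the `measures` element
summary(d1)
summary(d2)
summary(d3)
```

Dropping the zero cells before computing d2 also avoids a division by zero, which may be one reason the default is exclude_zeros = TRUE.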

Examples

orig <- c(1:10, 0, 0)
pert <- orig; pert[c(1, 5, 7)] <- c(0, 6, 9)

# ignore empty cells when computing measures `d1`, `d2`, `d3`
ck_cnt_measures(orig = orig, pert = pert, exclude_zeros = TRUE)

# use all cells
ck_cnt_measures(orig = orig, pert = pert, exclude_zeros = FALSE)

# for an application on a perturbed object, see ?cellkey_pkg
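To make the overview, false_zero and false_nonzero elements more concrete, the following base R sketch rebuilds them for the example vectors above. It is an illustration under stated assumptions, not the package's code: it uses a plain data.frame instead of a data.table, computes the noise table over all cells, and reports pct on a 0 to 100 scale.

```r
orig <- c(1:10, 0, 0)
pert <- orig
pert[c(1, 5, 7)] <- c(0, 6, 9)

# Noise as defined on this page: orig - pert
noise <- orig - pert
cnt <- table(noise)

# Hand-built counterpart of the `overview` element (assumed layout)
overview <- data.frame(
  noise = as.numeric(names(cnt)),
  cnt   = as.integer(cnt),
  pct   = as.numeric(100 * cnt / length(noise))  # assumption: 0-100 scale
)

# Cells that became zero, and zero cells that became non-zero
false_zero    <- sum(orig != 0 & pert == 0)
false_nonzero <- sum(orig == 0 & pert != 0)

overview
```

Comparing these values with the list returned by ck_cnt_measures() is a quick way to check which of the assumptions above match the actual implementation.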

sdcTools/cellKey documentation built on Dec. 5, 2023, 1:05 a.m.