sdcLog options
In sdcLog: Tools for Statistical Disclosure Control in Research Data Centers

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)

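# store the user's options so that they can be restored at the end
user_options <- options()

# control how data.table results are printed
options(datatable.print.class = FALSE)
options(datatable.print.keys = FALSE)
options(datatable.print.trunc.cols = FALSE)

# start from sdcLog's default thresholds
options(sdc.n_ids = 5L)
options(sdc.n_ids_dominance = 2L)
options(sdc.share_dominance = 0.85)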

You can set several options to

- adapt sdcLog to the policies at your research data center, and
- make sdcLog a little more convenient to use.

First, we create a tiny data.frame to demonstrate the effects of the options:

library(sdcLog)
df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))
df

Options to adapt sdcLog to the policies at your research data center

sdc.n_ids

By default, sdcLog expects at least five different entities behind each calculated number. The functions in sdcLog derive this number from getOption("sdc.n_ids", default = 5). That is, if the option sdc.n_ids is not set, it defaults to 5. Consider the following example:

sdc_descriptives(data = df, id_var = "id", val_var = "v1")

This can be adapted to the policy of your research data center by setting the option sdc.n_ids to the desired value. For example, if your policy allows results to be released when at least three different entities are behind each number, set

options(sdc.n_ids = 3)

Now, getOption("sdc.n_ids", default = 5) evaluates to 3, and warnings are issued only if there are fewer than three entities behind a result. Note that this is reflected in the first line of output from every function of sdcLog:

sdc_descriptives(data = df, id_var = "id", val_var = "v1")
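The fallback behaviour of getOption() can be checked with base R alone, without sdcLog. The sketch below temporarily unsets sdc.n_ids, reads it with the same default of 5, sets it to 3, reads it again, and finally restores whatever value was in place before, so running it does not change the rest of the session.

```r
# Temporarily unset sdc.n_ids; options() returns the previous value(s)
old <- options(sdc.n_ids = NULL)

n_unset <- getOption("sdc.n_ids", default = 5)  # option unset: fallback 5 is used
options(sdc.n_ids = 3)
n_set <- getOption("sdc.n_ids", default = 5)    # option set: 3 is used

# Restore the value that was in place before this block
options(old)

c(unset = n_unset, set = n_set)
```

Setting an option to NULL removes it, which is why restoring the stored list also works when the option was not set to begin with.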

sdc.n_ids_dominance

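The option sdc.n_ids_dominance determines how many of the largest entities enter the dominance check, in which their combined share of the total is compared with sdc.share_dominance (0.85 by default). The default value for sdc.n_ids_dominance is 2. In our example, this leads to a warning: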

sdc_descriptives(data = df, id_var = "id", val_var = "v2")
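To see where this warning comes from, the arithmetic can be reproduced in base R. The sketch assumes the rule as this vignette describes it: values are summed per entity, and the combined share of the n largest entities is compared with sdc.share_dominance. The helper top_n_share() is hypothetical and is not sdcLog's internal implementation.

```r
# Hypothetical helper (not sdcLog's code): share of the total that is
# accounted for by the n largest entities
top_n_share <- function(values, ids, n) {
  totals <- as.numeric(tapply(values, ids, sum))  # one total per entity
  totals <- sort(totals, decreasing = TRUE)       # largest entities first
  sum(head(totals, n)) / sum(totals)
}

df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))

share_two <- top_n_share(df$v2, df$id, n = 2)  # (4 + 2) / 7, about 0.857
share_one <- top_n_share(df$v2, df$id, n = 1)  # 4 / 7, about 0.571

share_two > 0.85  # TRUE: the two largest entities dominate
share_one > 0.85  # FALSE: the largest entity alone does not
```

With two entities the share exceeds 0.85, which explains the warning; with only the largest entity it stays well below 0.85, which is why the example passes once sdc.n_ids_dominance is set to 1.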

If your policy only requires that the largest entity alone account for a share of less than 0.85, set

options(sdc.n_ids_dominance = 1)

Then, there is no problem in the example:

sdc_descriptives(data = df, id_var = "id", val_var = "v2")

sdc.share_dominance

The last option of sdcLog that affects internal calculations is sdc.share_dominance. To demonstrate it, we first reset sdc.n_ids_dominance to its default value of 2.

options(sdc.n_ids_dominance = 2L)

Let's consider a policy that allows the two largest entities to account for a share of at most 0.8. To reflect this, set

options(sdc.share_dominance = 0.8)

Now the earlier example with v1, which passed under the default share of 0.85, triggers a warning:

sdc_descriptives(data = df, id_var = "id", val_var = "v1")
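Under the same assumption about the dominance rule (combined share of the two largest entities compared with the threshold), the arithmetic for v1 shows why the verdict flips between the two thresholds. This is a plain base R calculation, not a call into sdcLog.

```r
v1 <- c(1, 2, 3)  # one value per entity A, B, C, as in df$v1

# combined value of the two largest entities: 3 + 2 = 5
top_two <- sum(sort(v1, decreasing = TRUE)[1:2])
share <- top_two / sum(v1)  # 5 / 6, about 0.833

share > 0.85  # FALSE: no dominance problem under the default
share > 0.8   # TRUE: dominance problem under the stricter policy
```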

sdc.info_level

This option differs from the previous ones in that it does not affect the actual calculations. Instead, it determines the verbosity of the output of sdcLog functions. Possible values are 0, 1 (default), and 2. Before demonstrating the effects of sdc.info_level, we reset sdc.share_dominance to its default value of 0.85.

options(sdc.share_dominance = 0.85)

The example below shows how much information is printed to the console at each value of sdc.info_level:

for (i in 0:2) {
  options(sdc.info_level = i)
  cat("\nsdc.info_level: ", getOption("sdc.info_level"), "\n")
  print(sdc_descriptives(data = df, id_var = "id", val_var = "v1"))
}

At level 0, only options and settings are printed. Level 1 also prints a short message about the overall outcome of the checks. Level 2 additionally prints the results of the separate checks on distinct entities and dominance.

Option to make sdcLog more convenient

Usually, the ID variable does not change during the course of your analysis. Therefore, it is convenient to set

options(sdc.id_var = "id")

Then you do not have to specify id_var every time you use one of the sdc_* functions:

sdc_descriptives(data = df, val_var = "v1")

# reset the options that existed before this vignette ran to their earlier values
options(user_options)
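In R, this kind of convenience is usually implemented as a default argument that reads an option. That sdcLog's functions take id_var from getOption("sdc.id_var") when it is not supplied is an assumption here; the helper count_distinct_ids() below is hypothetical and only illustrates the pattern.

```r
# Hypothetical helper (not part of sdcLog): id_var falls back to the
# sdc.id_var option when it is not supplied
count_distinct_ids <- function(data, id_var = getOption("sdc.id_var")) {
  if (is.null(id_var)) {
    stop("Please supply id_var or set options(sdc.id_var = ...).")
  }
  length(unique(data[[id_var]]))
}

df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))

old <- options(sdc.id_var = "id")
n_ids <- count_distinct_ids(df)  # id_var taken from the option
options(old)                     # restore the previous setting
n_ids
```

Because default arguments are evaluated lazily, the option is read at call time, so changing sdc.id_var later affects every subsequent call.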

General remarks

Please note that these options affect all functions of sdcLog, not just sdc_descriptives().
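Because options are global, a stricter threshold set for one check stays in effect for every later call in the R session. One way to limit a change to a single piece of code is to keep the previous values returned by options() and restore them with on.exit(). The helper with_sdc_options() is hypothetical; the withr package provides a comparable with_options() function.

```r
# Hypothetical helper (not part of sdcLog): evaluate `code` with some options
# temporarily changed, restoring the previous values afterwards, even on error
with_sdc_options <- function(new_options, code) {
  old <- options(new_options)        # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  code                               # lazy evaluation: runs with new_options set
}

before <- getOption("sdc.n_ids")
inside <- with_sdc_options(list(sdc.n_ids = 10L), getOption("sdc.n_ids"))
after <- getOption("sdc.n_ids")

inside                   # 10L: the temporary value
identical(before, after) # TRUE: the previous setting is back in place
```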

sdcLog documentation built on March 20, 2022, 1:06 a.m.
