knitr::opts_chunk$set(collapse = TRUE, comment = "#")
# store the user's options so they can be restored at the end
user_options <- options()
options(datatable.print.class = FALSE)
options(datatable.print.keys = FALSE)
options(datatable.print.trunc.cols = FALSE)
options(sdc.n_ids = 5L)
options(sdc.n_ids_dominance = 2L)
options(sdc.share_dominance = 0.85)
You can set several options to adjust the behavior of sdcLog. First, we create a tiny data.frame to demonstrate the effects of the options:
library(sdcLog)
df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))
df
By default, sdcLog expects at least five different entities behind each calculated number. The functions in sdcLog derive this number from getOption("sdc.n_ids", default = 5). That is, if the option sdc.n_ids is not set, it defaults to 5. Consider the following example:
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
This can be adapted to the policy of your research data center by setting the option sdc.n_ids to the desired value. For example, if your policy allows results to be released if there are at least three different entities behind each number, set
options(sdc.n_ids = 3)
Now, getOption("sdc.n_ids", default = 5) evaluates to 3, and warnings are thrown only if there are fewer than three entities behind a result. Note that this is reflected in the first line of output from every function of sdcLog:
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
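The threshold lookup described above can be mimicked in base R. The helper check_distinct_ids() below is hypothetical and not part of sdcLog; it only shows how getOption() falls back to 5 while sdc.n_ids is unset and picks up the new value once the option is set.

```r
# Hypothetical helper, not sdcLog's internal code: TRUE if the number of
# distinct entities meets the threshold taken from the option "sdc.n_ids".
check_distinct_ids <- function(ids) {
  # getOption() falls back to 5 while "sdc.n_ids" is unset
  n_ids <- getOption("sdc.n_ids", default = 5)
  length(unique(ids)) >= n_ids
}

ids <- c("A", "B", "C")
old <- options(sdc.n_ids = NULL)  # unset the option, remember the old value
check_distinct_ids(ids)           # FALSE: 3 distinct entities, threshold 5
options(sdc.n_ids = 3)
check_distinct_ids(ids)           # TRUE: 3 distinct entities, threshold 3
options(old)                      # restore the previous setting
```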
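The option sdc.n_ids_dominance sets how many of the largest entities are taken into account when checking for dominance; its default value is 2. In our example, this leads to a warning, because the two largest values of v2 (4 and 2) make up 6/7 ≈ 0.857 of the total, which exceeds the default sdc.share_dominance of 0.85: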
sdc_descriptives(data = df, id_var = "id", val_var = "v2")
If your policy only requires that the largest single entity account for a share of less than 0.85, set
options(sdc.n_ids_dominance = 1)
Then, there is no problem in the example:
sdc_descriptives(data = df, id_var = "id", val_var = "v2")
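The dominance rule can be sketched in a few lines of base R. The helper violates_dominance() below is hypothetical, not sdcLog's internal code, and it assumes that a result is flagged when the sdc.n_ids_dominance largest values account for more than sdc.share_dominance of the total. The examples pass both arguments explicitly so that the result does not depend on options set earlier in the session.

```r
# Hypothetical helper, not sdcLog's internal code. Assumption: a result is
# flagged when the n largest values exceed the allowed share of the total.
violates_dominance <- function(values,
                               n_largest = getOption("sdc.n_ids_dominance", 2L),
                               max_share = getOption("sdc.share_dominance", 0.85)) {
  # The n largest contributions (or all of them if there are fewer)
  top <- sort(values, decreasing = TRUE)[seq_len(min(n_largest, length(values)))]
  sum(top) / sum(values) > max_share
}

v2 <- c(1L, 2L, 4L)
violates_dominance(v2, n_largest = 2, max_share = 0.85)  # TRUE:  6 / 7 is about 0.857
violates_dominance(v2, n_largest = 1, max_share = 0.85)  # FALSE: 4 / 7 is about 0.571
```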
The last option of sdcLog that affects internal calculations is sdc.share_dominance. To demonstrate it, we first reset sdc.n_ids_dominance to its default value of 2.
options(sdc.n_ids_dominance = 2L)
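Resetting an option by retyping its default value works, but options() also invisibly returns the previous values of the options it changes. Keeping that return value lets you undo a temporary change exactly; this is base R behavior, not something specific to sdcLog.

```r
# options() invisibly returns the previous values of the options it sets
old <- options(sdc.share_dominance = 0.8)
getOption("sdc.share_dominance")  # 0.8 while the temporary policy is active
options(old)                      # restores the value that was set before
```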
Let's consider a policy that requires the two largest entities to account for a share of less than 0.8. To reflect this, set
options(sdc.share_dominance = 0.8)
Now, the first example, which summarizes v1, throws a warning, because the two largest values of v1 now exceed the allowed share:
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
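The arithmetic behind this warning can be reproduced directly, assuming the dominance check compares the combined share of the two largest values of v1 with the threshold. This is a worked computation, not a call into sdcLog.

```r
# Worked check of the dominance arithmetic for v1 (no sdcLog call involved)
v1 <- c(1L, 2L, 3L)
top_two <- sum(sort(v1, decreasing = TRUE)[1:2])  # 3 + 2 = 5
share <- top_two / sum(v1)                        # 5 / 6, about 0.833
share > 0.85  # FALSE: acceptable under the default threshold
share > 0.8   # TRUE: too dominant under the stricter policy
```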
The option sdc.info_level differs from the previous ones in that it does not affect the actual calculations. Instead, it determines the verbosity of the output of sdcLog functions. Possible values are 0, 1 (default), and 2. Before demonstrating the effects of sdc.info_level, we reset sdc.share_dominance to its default value of 0.85.
options(sdc.share_dominance = 0.85)
The example below shows the information printed to the console at each value of sdc.info_level:
for (i in 0:2) {
  options(sdc.info_level = i)
  cat("\nsdc.info_level: ", getOption("sdc.info_level"), "\n")
  print(sdc_descriptives(data = df, id_var = "id", val_var = "v1"))
}
At level 0, only options and settings are printed. Level 1 also prints a short message about the overall outcome of the checks. Level 2 additionally prints the results of the separate checks on distinct entities and dominance.
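One way to implement this kind of tiered verbosity is to compare the option value against each level before adding output. The function report_checks() below is hypothetical and its messages are placeholders; it does not reproduce sdcLog's actual output.

```r
# Hypothetical helper, not sdcLog's code: returns more output lines the
# higher the option "sdc.info_level" is (falls back to 1 when unset).
report_checks <- function(distinct_ok, dominance_ok) {
  level <- getOption("sdc.info_level", default = 1L)
  # Level 0 and above: options and settings only
  out <- "settings: <placeholder>"
  # Level 1 and above: a short overall verdict
  if (level >= 1) {
    verdict <- if (distinct_ok && dominance_ok) "no problem" else "potential disclosure problem"
    out <- c(out, paste("overall:", verdict))
  }
  # Level 2: the result of each separate check
  if (level >= 2) {
    out <- c(out,
             paste("distinct entities check passed:", distinct_ok),
             paste("dominance check passed:", dominance_ok))
  }
  out
}

for (i in 0:2) {
  options(sdc.info_level = i)
  cat("\nsdc.info_level:", i, "\n")
  writeLines(report_checks(distinct_ok = TRUE, dominance_ok = FALSE))
}
options(sdc.info_level = 1L)  # back to the documented default
```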
Usually, the ID variable does not change during the course of your analysis. Therefore, it is convenient to set the option sdc.id_var once:
options(sdc.id_var = "id")
Then you do not have to specify id_var every time you use one of the sdc_* functions:
sdc_descriptives(data = df, val_var = "v1")
# restore the options that were set before running this vignette
options(user_options)
Please note that these options affect all functions of sdcLog, not just sdc_descriptives().
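A common R idiom for this kind of convenience is a default argument that reads the option. The function count_ids() below is hypothetical and not part of sdcLog; it only illustrates how an omitted id_var can fall back to getOption("sdc.id_var"), while an explicitly supplied id_var takes precedence.

```r
# Hypothetical function, not part of sdcLog: id_var defaults to the value
# of the option "sdc.id_var" and can still be overridden per call.
count_ids <- function(data, id_var = getOption("sdc.id_var")) {
  if (is.null(id_var)) {
    stop("Supply id_var or set options(sdc.id_var = ...)", call. = FALSE)
  }
  # Number of distinct entities in the ID column
  length(unique(data[[id_var]]))
}

df_ids <- data.frame(id = c("A", "A", "B"), v1 = 1L:3L)
options(sdc.id_var = "id")
count_ids(df_ids)                 # 2: the ID column comes from the option
count_ids(df_ids, id_var = "v1")  # 3: an explicit argument takes precedence
```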