View source: R/anonymization.R
Create random names and/or replace real names with random names to anonymize data.
anonymize(real_names, lookup_file)
real_names
A vector with unique names. If there are no known real names, provide a vector of unique integers instead.
lookup_file
A character vector of length one, giving the file to write the lookup table to or read the lookup table from. See details for more, notes for caveats. The file should not be published, to protect the identity of participants.
It is sometimes helpful in Q analyses to be able to refer to people-variables by a unique name, though real names often cannot be used in publications, in order to protect participants' data.
This function looks up real_names in the lookup_file and returns the respective fake names.
The lookup_file must always be a *.csv file with two character columns, named real_names and fake_names.
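As a rough illustration of that layout, the sketch below writes a small lookup table to a temporary *.csv file and reads it back using base R only. The names are made up, and details such as omitting row names are assumptions rather than documented requirements of anonymize.

```r
# Illustrative data only; these fake names are hypothetical.
lookup <- data.frame(
  real_names = c("Hillary", "Barack", "George"),
  fake_names = c("Anna", "Ben", "Carl"),
  stringsAsFactors = FALSE
)

# Write to a temporary file; a real lookup file belongs in a safe place.
lookup_path <- tempfile(fileext = ".csv")
write.csv(lookup, file = lookup_path, row.names = FALSE)

# Read it back, forcing both columns to character.
read_back <- read.csv(lookup_path, colClasses = "character")
names(read_back)
```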
If the specified lookup_file does not exist, new random fake_names are sampled from randomNames and written to disk as the specified file.
Generated fake_names are unique and can be used as R variable names.
If the lookup_file does not include all real_names, new fake_names are likewise sampled for the missing entries and appended to it.
All entries must be unique and valid R variable names.
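One way to check this requirement before writing a custom lookup file is sketched below. The helper is_valid_unique is hypothetical and not part of pensieve; it relies on base R's make.names, which leaves syntactically valid names unchanged, and anyDuplicated.

```r
# Hypothetical helper: TRUE if all entries are syntactically valid
# R names and no entry occurs more than once.
is_valid_unique <- function(x) {
  all(make.names(x) == x) && anyDuplicated(x) == 0
}

ok <- is_valid_unique(c("Anna", "Ben"))          # valid and unique
spaces <- is_valid_unique(c("Anna", "m us 31"))  # spaces are not allowed
dupes <- is_valid_unique(c("Anna", "Anna"))      # duplicated entry
```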
The rows in the lookup_file can be in an arbitrary order, and can also include entries that are never used.
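The lookup behavior described above can be sketched in base R as follows. This is not pensieve's implementation: the helper lookup_fake_names and its pool argument are hypothetical, and a fixed pool of candidate names stands in for sampling from randomNames. Reading and writing the file is left out for brevity.

```r
# Hypothetical helper: returns fake names for real_names, appending
# newly sampled fake names for any real names missing from the lookup.
lookup_fake_names <- function(real_names, lookup, pool) {
  missing <- setdiff(real_names, lookup$real_names)
  if (length(missing) > 0) {
    # Only offer candidates that are not taken yet, so names stay unique.
    available <- setdiff(pool, lookup$fake_names)
    stopifnot(length(available) >= length(missing))
    new_rows <- data.frame(
      real_names = missing,
      fake_names = sample(available, size = length(missing)),
      stringsAsFactors = FALSE
    )
    lookup <- rbind(lookup, new_rows)
  }
  list(
    # match() makes the result independent of the row order in lookup.
    fake_names = lookup$fake_names[match(real_names, lookup$real_names)],
    lookup = lookup
  )
}

# Existing lookup in arbitrary order, lacking an entry for "Barack".
lookup <- data.frame(
  real_names = c("George", "Hillary"),
  fake_names = c("Carl", "Anna"),
  stringsAsFactors = FALSE
)
res <- lookup_fake_names(
  real_names = c("Hillary", "Barack", "George"),
  lookup = lookup,
  pool = c("Anna", "Ben", "Carl", "Dora")
)
res$fake_names
```

In this sketch, the first and third results come from the existing rows, while the second is drawn at random from the unused candidates.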
By providing a lookup_file, users (or participants) can choose their own fake_names, though this may not protect personal data well.
In particular, storing socio-demographic data of participants as custom fake_names (for example, "m_us_31") is not advised, because such data may be easily breached and downstream functions expect socio-demographic data in a different format.
A character vector of fake names, same length as real_names.
Also writes a lookup table to disk at location lookup_file, if it does not exist already.
Despite its name, this function does not magically anonymize data, but merely replaces names with randomly drawn fake names. It is your responsibility to protect your participants' data. If you are unsure, or do not understand the caveats below, do not rely on this function.
The lookup table with real and fake names must be stored in a safe place, ideally encrypted and not together with the raw data or results.
Your data may still be deanonymized if it includes other personal information and/or covers only a few participants.
Make sure no real names are included in your command history, caches or other R objects and scripts.
anonymize(real_names = c("Hillary", "Barack", "George"),
          lookup_file = system.file("extdata",
                                    "example_name_lookup.csv",
                                    package = "pensieve"))
# system.file call only necessary for example, shipped with pensieve
# just as an example, never store lookup file with raw data
# see `notes` for details