rpin is a generic function to generate non-personal pins for testing and educational purposes.
pin_anonymise is a wrapper to anonymise/de-personalise existing pins.
x |
is either an integer (numeric vector of length one) specifying the length of the generated pin vector, or a pin vector itself, used to generate similar but anonymised pins (see section "Anonymise"). |
... |
additional arguments to be passed to or from methods. |
l_birth,u_birth |
are dates (or objects that can be coerced to such) constituting a possible time interval,
limiting the period from which birth dates are drawn.
If |
unique |
Should all generated pins be unique, i.e. should the sampling be done without replacement ( |
male_prob |
probability that a generated pin refers to a man ( |
keep_rel |
Should a possible relationship between pins in |
A pin whose birth number (digits 9-11 of a 12-digit pin) falls in the interval [880, 999] is a valid personal identification number but is never assigned to an actual person. Numbers of this form can therefore be used for testing and educational purposes without the risk of interfering with personal (and possibly sensitive) data.
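As a small illustration of the interval rule, the sketch below checks whether digits 9-11 of a 12-digit pin string fall in [880, 999]. The helper name is_non_personal is hypothetical and not part of sweidnumbr; it looks only at the birth number and does not validate the date or the control digit.

```r
# Hypothetical helper (not from sweidnumbr): TRUE when the birth number,
# i.e. digits 9-11 of a 12-digit pin string YYYYMMDDNNNC, lies in [880, 999].
is_non_personal <- function(pin) {
  stopifnot(is.character(pin), all(nchar(pin) == 12))
  birth_number <- as.integer(substr(pin, 9, 11))
  birth_number >= 880 & birth_number <= 999
}

# The strings below are made up for the example; their control digits are arbitrary.
is_non_personal(c("199001019802", "199001011234"))
# TRUE FALSE
```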
rpin returns a vector of class pin with length x if x is an integer, or with length length(x) if x is itself a pin object. The object will also have an extra attribute "non_personal" set to TRUE to indicate that the generated pins are non-personal ("fake").
The simulation is done by the following steps:
1. A birthdate is simulated as described in section Anonymise or, if x is an integer, drawn from a uniform distribution on [l_birth, u_birth].
2. The first two digits of the birth number are given by a discrete random sample from [88, 99]. Note that these numbers do not specify birthplace in this case (even if year of birth < 1990).
3. The last digit of the birth number is sampled from [0, 9] with probabilities according to male_prob (which is either specified explicitly or estimated as described in section Anonymise).
4. The control number is calculated from digits 1-11 by the Luhn algorithm (luhn_algo).
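The four steps above can be sketched in base R as follows. This is not the package's implementation: sim_pin and luhn_check_digit are hypothetical helpers, and the sketch returns plain character strings rather than pin objects. It assumes the usual Swedish conventions that an odd last birth-number digit denotes a man, and that the two century digits carry no weight in the Luhn sum, so the checksum effectively runs over the nine digits YYMMDDNNN. The unique and keep_rel arguments are not modelled.

```r
# Hypothetical helpers for illustration; neither is part of sweidnumbr.

# Luhn check digit for the nine digits YYMMDDNNN (weights 2, 1, 2, 1, ...).
luhn_check_digit <- function(digits9) {
  stopifnot(length(digits9) == 9)
  prod <- digits9 * c(2, 1, 2, 1, 2, 1, 2, 1, 2)
  digit_sum <- sum(prod %/% 10 + prod %% 10)  # add the digits of each product
  (10 - digit_sum %% 10) %% 10
}

sim_pin <- function(n, l_birth, u_birth, male_prob = 0.5) {
  lo <- as.integer(as.Date(l_birth))
  hi <- as.integer(as.Date(u_birth))
  stopifnot(lo <= hi)

  # Step 1: birth dates drawn uniformly (in whole days) from [l_birth, u_birth]
  days <- lo + sample.int(hi - lo + 1L, n, replace = TRUE) - 1L
  ymd <- format(as.Date(days, origin = "1970-01-01"), "%Y%m%d")

  # Step 2: first two digits of the birth number from 88, ..., 99
  first_two <- sample(88:99, n, replace = TRUE)

  # Step 3: last digit of the birth number; odd with probability male_prob
  male <- runif(n) < male_prob
  last <- ifelse(male,
                 sample(c(1, 3, 5, 7, 9), n, replace = TRUE),
                 sample(c(0, 2, 4, 6, 8), n, replace = TRUE))
  pin11 <- paste0(ymd, first_two, last)

  # Step 4: control digit from digits 3-11 (century digits carry no weight)
  control <- vapply(strsplit(substr(pin11, 3, 11), ""),
                    function(d) luhn_check_digit(as.integer(d)),
                    numeric(1))
  paste0(pin11, control)
}

luhn_check_digit(c(8, 1, 1, 2, 1, 8, 9, 8, 7))  # 6

set.seed(1)
sim_pin(5, "1974-01-01", "1994-01-01", male_prob = 0.25)
```

Every string produced this way has a birth number between 880 and 999, so it falls in the range described above as never assigned to an actual person.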
Given that x is an object of class pin, the output of rpin is a pin vector that tries to mimic x in all aspects except identifying real persons. The empirical age (birthday) distribution from x will be estimated by logspline. A random sample of length(x) is drawn from that distribution. The last four digits are generated as in section Simulation but with the sex distribution estimated from x. The internal relationships between elements in x are maintained as described for argument keep_rel.
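The idea of estimating the birth-date distribution and the share of men from x, and then drawing new values, can be illustrated with a rough base R sketch. Note the substitution: instead of logspline, it uses a Gaussian kernel smoother (a smoothed bootstrap with the bandwidth from stats::bw.nrd0), chosen only to keep the example free of extra packages. The helper anonymise_inputs is hypothetical, works on plain 12-character strings, limits draws to the exact observed date range (a simplification), assumes an odd digit 11 denotes a man, and ignores keep_rel.

```r
# Hypothetical helper; a Gaussian kernel smoother stands in for logspline here.
anonymise_inputs <- function(pin) {
  stopifnot(is.character(pin), length(pin) >= 2, all(nchar(pin) == 12))

  # Birth dates as day counts, read from digits 1-8 (YYYYMMDD)
  days <- as.numeric(as.Date(substr(pin, 1, 8), format = "%Y%m%d"))

  # Share of men, read from the parity of digit 11 (assumed: odd = man)
  male_prob <- mean(as.integer(substr(pin, 11, 11)) %% 2 == 1)

  # Smoothed bootstrap: resample observed days and add kernel noise
  bw <- stats::bw.nrd0(days)
  draw <- function(k) {
    round(days[sample.int(length(days), k, replace = TRUE)] + rnorm(k, sd = bw))
  }

  # Redraw any value that falls outside the observed date range
  out <- draw(length(days))
  outside <- out < min(days) | out > max(days)
  while (any(outside)) {
    out[outside] <- draw(sum(outside))
    outside <- out < min(days) | out > max(days)
  }

  list(birth = as.Date(out, origin = "1970-01-01"), male_prob = male_prob)
}

# Made-up input strings; their control digits are arbitrary.
set.seed(1)
anonymise_inputs(c("197401019812", "198005059823", "199001019834", "198512319845"))
```

The returned dates and male_prob would then feed the remaining steps described in section Simulation.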
library(sweidnumbr)
set.seed(12345)
## Generate some fake pins
p <- rpin(100)
## Most pin-functions can be applied to p
is.pin(p) # TRUE
pin_sex(p) # With mean(pin_sex(p) == "Male") -> male_prob when x -> Inf
table(pin_birthplace(p)) # non-informative
pin_age(p)
pin_to_date(p)
## If we want to simulate university students in a med course in Sweden,
## we might try
p_ms <- rpin(100, l_birth = "1974-01-01", u_birth = "1994-01-01", male_prob = .25)
table(pin_sex(p_ms))
summary(pin_age(p_ms))
## Now, assume for a moment that p_ms is actually real data that we want to anonymise.
## The easy way:
p_ms2 <- rpin(p_ms)
## We then have new (fake) numbers but with the same age and sex distribution.
table(pin_sex(p_ms2))
summary(pin_age(p_ms2))
## The empirical age distribution from p_ms itself could of course also generate
## birth dates outside of the empirical birthdate interval from p_ms. The default limit
## is to not generate pins with birth year before the birth year of the oldest pin in the input
## (and vice versa for the upper limit). But we could also choose to not tolerate any
## pins "older" than the "oldest" pin from the input
## Compute the dates first, so the call does not rely on argument evaluation order
y <- pin_to_date(p_ms)
p_ms3 <- rpin(p_ms, l_birth = min(y), u_birth = max(y))
min(pin_to_date(p_ms3)) >= min(pin_to_date(p_ms))
max(pin_to_date(p_ms3)) <= max(pin_to_date(p_ms))
## We can modify the sex distribution even though we keep the age-distribution
x <- rpin(p_ms, male_prob = .01)
x <- pin_sex(x)
table(x)