rpin is a generic function to generate non-personal pins for testing and educational purposes.
pin_anonymise is a wrapper to anonymise/de-personalise existing pins.
x is either an integer (numeric vector of length one) specifying the length of the generated pin vector, or a pin vector itself to be used for generating similar but anonymised pins (see section "Anonymise").
additional arguments to be passed to or from methods.
l_birth and u_birth are dates (or objects that can be coerced to such) constituting a possible time interval,
limiting the period from which birth dates are drawn.
Should all generated pins be unique, i.e. should the sampling be done without replacement (
male_prob is the probability that a generated pin refers to a man (
Should a possible relationship between pins in
A pin where the birth number (digits 9-11 in a 12-digit pin) falls in the interval [880, 999] is a valid personal identification number but is never assigned to an actual person. Numbers of this form can instead be used for testing and educational purposes without the risk of interfering with personal (and possibly sensitive) data.
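As a rough illustration of the rule above, the following base R sketch checks whether the birth number of a 12-digit pin string falls in the stated interval. The helper name `is_non_personal_birth_number` is hypothetical and not part of the package, and the two input strings are only used for their digit positions.

```r
# Hypothetical helper (not part of sweidnumbr): TRUE when digits 9-11 of a
# 12-digit pin string fall in the interval [880, 999] stated above.
is_non_personal_birth_number <- function(pin12) {
  pin12 <- as.character(pin12)
  stopifnot(all(nchar(pin12) == 12))
  # Digits 9-11 of YYYYMMDDNNNC form the birth number.
  birth_number <- as.integer(substr(pin12, 9, 11))
  birth_number >= 880 & birth_number <= 999
}

is_non_personal_birth_number(c("196408233234", "199001019802"))
```

The check only looks at the birth number; it does not validate the date or the control digit.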
rpin returns a vector of class pin with length x if x is an integer, or with length length(x) if x is itself a pin object. The object will also have an extra attribute "non_personal" set to TRUE to indicate that the generated pins are non-personal ("fake").
The simulation is done by the following steps:
1. A birth date is simulated as described in section Anonymise or, if x is an integer, drawn from a uniform distribution from l_birth to u_birth.
2. The first two digits of the birth number are given by a discrete random sample from [88, 99]. Note that these numbers do not specify birthplace in this case (even if year of birth < 1990).
3. The last digit of the birth number is sampled from [0, 9] with probabilities according to male_prob (which is either specified explicitly or estimated as described in section Anonymise).
4. The control number is calculated from digits 1-11 by the Luhn algorithm.
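The steps above can be sketched in base R roughly as follows. This is a hypothetical re-implementation for illustration, not the package's own code: the function names `luhn_check_digit` and `simulate_pins` are made up, the sketch assumes that an odd last birth-number digit denotes a man and an even one a woman, and it assumes that the two century digits carry weight zero in the Luhn sum (so that effectively digits 3-11 are used).

```r
# Luhn control digit for an 11-digit string YYYYMMDDNNN.
# Assumption: the century digits (1-2) have weight zero, so digits 3-11 are used.
luhn_check_digit <- function(d11) {
  digits <- as.integer(strsplit(d11, "")[[1]])[3:11]
  prod <- digits * c(2, 1, 2, 1, 2, 1, 2, 1, 2)
  s <- sum(prod %/% 10 + prod %% 10)  # sum of the digits of each product
  (10 - s %% 10) %% 10
}

# Hypothetical sketch of the four simulation steps.
simulate_pins <- function(n, l_birth, u_birth, male_prob = 0.5) {
  # Step 1: birth dates drawn uniformly between the two limits.
  days <- seq(as.Date(l_birth), as.Date(u_birth), by = "day")
  birth <- format(days[sample.int(length(days), n, replace = TRUE)], "%Y%m%d")
  # Step 2: first two digits of the birth number sampled from 88..99.
  first_two <- sample(88:99, n, replace = TRUE)
  # Step 3: last digit of the birth number; assuming odd = man, even = woman.
  is_male <- runif(n) < male_prob
  last <- ifelse(is_male,
                 sample(c(1, 3, 5, 7, 9), n, replace = TRUE),
                 sample(c(0, 2, 4, 6, 8), n, replace = TRUE))
  d11 <- paste0(birth, first_two, last)
  # Step 4: append the control digit.
  paste0(d11, vapply(d11, luhn_check_digit, numeric(1)))
}

set.seed(1)
simulate_pins(3, "1974-01-01", "1994-01-01", male_prob = 0.25)
```

The sketch returns plain character strings rather than objects of class pin, and it samples with replacement, so duplicates are possible.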
If x is an object of class pin, the output of rpin is a pin vector that tries to mimic x in all aspects except identifying real persons. The empirical age (birthday) distribution of x is estimated, and a random sample of length(x) is drawn from that distribution. The last four digits are generated as in section Simulation but with the sex distribution estimated from x. The internal relationships between elements in x are maintained as described for the corresponding argument.
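One way to picture the resampling of birth dates is a smoothed bootstrap: draw observed dates with replacement and add kernel noise. The sketch below is only an illustration under that assumption. The text does not state which estimator the package uses, the helper name `resample_birth_dates` is hypothetical, and clamping to the observed date range is a choice made here for simplicity.

```r
# Hypothetical sketch: draw n new birth dates from a Gaussian kernel density
# estimate of the observed dates (smoothed bootstrap). The estimator and the
# bandwidth rule are assumptions, not taken from the package.
resample_birth_dates <- function(dates, n = length(dates)) {
  x <- as.numeric(as.Date(dates))          # days since 1970-01-01
  bw <- stats::bw.nrd0(x)                  # default bandwidth rule of density()
  draws <- x[sample.int(length(x), n, replace = TRUE)] + rnorm(n, sd = bw)
  # Clamp to the observed range so no date is older or younger than the input.
  draws <- pmin(pmax(draws, min(x)), max(x))
  as.Date(round(draws), origin = "1970-01-01")
}

set.seed(1)
observed <- as.Date("1980-01-01") + sample(0:3650, 100, replace = TRUE)
summary(resample_birth_dates(observed))
```

The sex distribution could be carried over in the same spirit by using the observed share of men as the male probability when generating the last four digits.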
library(sweidnumbr)
set.seed(12345)

## Generate some fake pins
p <- rpin(100)

## Most pin-functions can be applied to p
is.pin(p) # TRUE
pin_sex(p) # With mean(pin_sex(p) == "Male") -> male_prob when x -> Inf
table(pin_birthplace(p)) # non-informative
pin_age(p)
pin_to_date(p)

## If we want to simulate university students in a med course in Sweden,
## we might try
p_ms <- rpin(100, l_birth = "1974-01-01", u_birth = "1994-01-01", male_prob = .25)
table(pin_sex(p_ms))
summary(pin_age(p_ms))

## Now, assume for a moment that p_ms is actually real data that we want to anonymise.
## The easy way:
p_ms2 <- rpin(p_ms)

## We then have new (fake) numbers but with the same age and sex distribution.
table(pin_sex(p_ms2))
summary(pin_age(p_ms2))

## The empirical age distribution from p_ms itself could of course also generate
## birth dates outside of the empirical birth date interval from p_ms. The default limit
## is to not generate pins with birth year before the birth year of the oldest pin in the input
## (and vice versa for the upper limit). But we could also choose to not tolerate any
## pins "older" than the "oldest" pin from the input
p_ms3 <- rpin(p_ms, l_birth = min(y <- pin_to_date(p_ms)), u_birth = max(y))
min(pin_to_date(p_ms3)) >= min(pin_to_date(p_ms))
max(pin_to_date(p_ms3)) <= max(pin_to_date(p_ms))

## We can modify the sex distribution even though we keep the age distribution
x <- rpin(p_ms, male_prob = .01)
x <- pin_sex(x)
table(x)