Background

-- Who are you? asked Mr Doe.

-- I'm a Hindu! Namrata from India replied.

-- I'm a statistician! said Günther from Germany.

People of different nationalities tend to identify themselves using different characteristics. In India, your identity might rely on your religion, while in other countries your profession might take its place. In Sweden, you might identify yourself with your almost-world-known (!?) personal identification number ("pin"). This 10-digit number is given to you almost immediately after birth and it often stays with you until your very last breath. The number is similar to a "social security number" but it has a much broader use and it is considered public. It is used in public registers (for education, work, tax payment, healthcare, car ownership etc.) and it often serves as a membership number or customer id within companies and member unions. It is also essential, for example, in the public health and quality registers that are maintained in Sweden (and other Scandinavian countries) and used for research.

Motivation

Naturally, the "pin" is used extensively to distinguish individuals in data sets analysed in R. The number also helps to match data from different sources and it can bring some demographic background data into the bargain, such as birth date (age), sex and geographic origin (depending on your birth year).

Up until now, however, for lack of a consistent R convention to handle "pins", the number might be treated as a 10 or 12 digit numeric (without or with century prefix), as a character (with a hyphen or a '+'-sign to separate the birth date from the suffix digits) or as a factor variable. But the pin is not a number (adding, subtracting or taking the logarithm of pins is just nonsense) and it contains more information than is captured by the individual characters in a string. Luckily, the new R package sweidnumbr (released on CRAN) is here to the rescue!

Example

Let's look at some data (all pins are fake; they have a valid syntax but do not identify any real individuals):

library(sweidnumbr)
knitr::kable(tail(fake_pins, 10))

So far, pin is just a standard character vector but let's change that to benefit from all of sweidnumbr's features:

pin <- as.pin(fake_pins$pin)
str(pin)

We can now also investigate some demographic characteristics almost on the fly (note that only pins of people born before 1990 contain geographical information):

par(mfrow = c(1,2))
hist(pin_age(pin), 20, col = "lightgreen", main = "Age distribution")
pie(table(pin_sex(pin)), main = "Sex distribution")
pin_birthplace(pin[1:8])
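For readers curious about where this information sits inside the number, here is a minimal base-R sketch. The helper names (pin12_birthdate, pin12_sex, pin12_age_at) are hypothetical and not part of sweidnumbr. The sketch assumes ordinary 12-digit pins written as plain character strings (YYYYMMDDNNNC) and does not handle coordination numbers or invalid input; the package functions above remain the sensible choice for real data.

```r
# Hypothetical helpers for illustration only; they are NOT sweidnumbr functions.
# Assumed input: 12-digit pins as character strings, e.g. "191212121212".

# The first eight digits are the birth date (YYYYMMDD)
pin12_birthdate <- function(pin12) {
  as.Date(substr(pin12, 1, 8), format = "%Y%m%d")
}

# The second-to-last digit is odd for males and even for females
pin12_sex <- function(pin12) {
  sex_digit <- as.integer(substr(pin12, 11, 11))
  ifelse(sex_digit %% 2L == 1L, "Male", "Female")
}

# Age in completed years on a given reference date
pin12_age_at <- function(pin12, on) {
  birth <- as.POSIXlt(pin12_birthdate(pin12))
  ref <- as.POSIXlt(on)
  age <- ref$year - birth$year
  # Subtract one year if the birthday has not yet occurred in the reference year
  not_yet <- (ref$mon < birth$mon) |
    (ref$mon == birth$mon & ref$mday < birth$mday)
  age - as.integer(not_yet)
}

fake <- c("191212121212", "191212121220")  # fake pins with valid syntax
pin12_birthdate(fake)
pin12_sex(fake)
pin12_age_at(fake, as.Date("2015-01-01"))
```

Passing the reference date explicitly keeps the age calculation reproducible, which is handy when an analysis should describe ages at a fixed time point.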

Formats

as.pin can recognize pins in several different formats such as:

as.pin(c("191212121212", "1212121212", "121212-1212", "121212+1212"))
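To make the idea of format recognition concrete, the following base-R sketch converts a single pin in any of the four formats above to the 12-digit form. The function to_pin12 is a hypothetical helper, not the package's implementation, and its rules are assumptions: a '+' marks a person aged 100 or more, while a hyphen or no separator is taken to mean the most recent possible birth year. The rules that as.pin applies may differ in detail.

```r
# Hypothetical helper for illustration only; NOT how sweidnumbr is implemented.
# Converts one pin (12 digits, 10 digits, or YYMMDD-NNNC / YYMMDD+NNNC)
# to the 12-digit form, relative to a reference date.
to_pin12 <- function(x, ref_date = Sys.Date()) {
  stopifnot(is.character(x), length(x) == 1L)
  if (grepl("^[0-9]{12}$", x)) return(x)  # century already given
  if (!grepl("^[0-9]{6}[-+]?[0-9]{4}$", x)) {
    stop("unrecognised pin format: ", x)
  }
  hundred_plus <- grepl("+", x, fixed = TRUE)  # '+' means aged 100 or more
  digits <- gsub("[-+]", "", x)

  ref_year <- as.integer(format(ref_date, "%Y"))
  yy <- as.integer(substr(digits, 1, 2))
  # Most recent year ending in yy that is not after the reference year
  year <- ref_year - (ref_year - yy) %% 100L
  # A birth date later in the reference year than ref_date cannot be right
  if (year == ref_year && substr(digits, 3, 6) > format(ref_date, "%m%d")) {
    year <- year - 100L
  }
  if (hundred_plus) year <- year - 100L
  paste0(sprintf("%04d", year), substr(digits, 3, 10))
}

ref <- as.Date("2015-01-01")  # fixed reference date for reproducibility
sapply(c("191212121212", "1212121212", "121212-1212", "121212+1212"),
       to_pin12, ref_date = ref)
```

The reference date matters because the same 10-digit pin can belong to different centuries depending on when it is read.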

as.pin also checks that the numbers follow the correct pin syntax:

as.pin("181212121212") # Pins were introduced in 1947 and only for people not deceased before that
pin_ctrl("191212121211") # The last digit is a control number that is checked against preceding digits
luhn_algo("191212121211") # The correct control number can be calculated by the Luhn algorithm
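The control number calculation can also be written out by hand. The sketch below is a base-R illustration of the Luhn algorithm as it applies to the 10-digit part of a pin. The helpers luhn_check_digit and pin_has_valid_check_digit are hypothetical names and are not part of sweidnumbr, whose own luhn_algo and pin_ctrl should be preferred in practice.

```r
# Hypothetical helpers for illustration only; NOT sweidnumbr functions.

# Compute the Luhn check digit from the first nine digits (YYMMDDNNN)
luhn_check_digit <- function(first_nine) {
  stopifnot(is.character(first_nine), length(first_nine) == 1L,
            grepl("^[0-9]{9}$", first_nine))
  digits <- as.integer(strsplit(first_nine, "")[[1]])
  # Multiply the digits alternately by 2 and 1, starting with 2
  products <- digits * c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L)
  # Two-digit products contribute the sum of their digits (e.g. 14 -> 1 + 4)
  digit_sum <- sum(products %/% 10L + products %% 10L)
  # The check digit brings the total up to the next multiple of ten
  (10L - digit_sum %% 10L) %% 10L
}

# TRUE if the last digit of a 12-digit pin equals the computed check digit.
# The century digits (positions 1-2) do not take part in the calculation.
pin_has_valid_check_digit <- function(pin12) {
  stopifnot(is.character(pin12), length(pin12) == 1L,
            grepl("^[0-9]{12}$", pin12))
  luhn_check_digit(substr(pin12, 3, 11)) == as.integer(substr(pin12, 12, 12))
}

luhn_check_digit("121212121")
pin_has_valid_check_digit("191212121212")
pin_has_valid_check_digit("191212121211")
```

A check like this only catches typing errors such as a single wrong digit; it says nothing about whether the pin has actually been issued to anyone.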

Organisational numbers

Individuals are not the only ones with identification numbers; companies and NGOs have them too. These organisational numbers are covered by the oin group of functions in the package. Feel free to try them out ...

Keep in touch!

... and feel free to suggest enhancements and report bugs to https://github.com/rOpenGov/sweidnumbr/issues



rOpenGov/sweidnumbr documentation built on Jan. 19, 2024, noon