This note explains how to work with Danish CPR numbers in R. Most of the information here is based on the wonderful Wikipedia article on CPR numbers.
I've made an R package to handle validation of Danish CPR numbers. It can be installed as follows:
# Requires the devtools package to be installed in order to work
devtools::install_github("ekstroem/DKcpr")
The first 6 digits of the CPR number represent the date of birth in the
format DDMMYY. Since some people live longer than 100 years, the
date does not uniquely specify the year in which a person was born. For
example, the string 101010 could represent either October 10th, 2010 or
October 10th, 1910.
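To see why the six digits alone are not enough, here is a small base R illustration (it does not use the DKcpr package). When parsing a two-digit year with `%y`, R prefixes 00 to 68 with 20 and 69 to 99 with 19, so it silently picks one century:

```r
# Parse DDMMYY strings with base R only; %y guesses the century.
guessed <- as.Date(c("101010", "101097"), format = "%d%m%y")
print(guessed)
## [1] "2010-10-10" "1997-10-10"
```

A person born on October 10th, 1910 or 1897 would be assigned the wrong year by this approach, which is why the remaining digits of the CPR number are needed.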
The 7th digit of the CPR number determines the century, but the cut is
not trivial. The date_of_birth() function therefore takes care of this and
returns the date of birth as an R Date object in the format YYYY-MM-DD. As
input it accepts a vector of strings. For example:
library("DKcpr")
cpr <- c("1010104321", "1010978726", "2310450637", "1010978726")
date_of_birth(cpr)
## [1] "2010-10-10" "1897-10-10" "1945-10-23" "1897-10-10"
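The century cut can be sketched in base R as follows. This is only an illustration of the rule as described in the Wikipedia article, not the code used inside DKcpr. The helper name `birth_year_sketch()` is hypothetical, and it assumes its input is already a well-formed 10-digit string:

```r
# Hypothetical helper, not part of DKcpr: derive the four-digit birth
# year from the two-digit year (positions 5-6) and the 7th digit.
birth_year_sketch <- function(cpr) {
  yy <- as.integer(substr(cpr, 5, 6))
  d7 <- as.integer(substr(cpr, 7, 7))
  century <- ifelse(
    d7 <= 3, 1900L,                               # 0-3: always 1900s
    ifelse(
      d7 %in% c(4, 9),
      ifelse(yy <= 36, 2000L, 1900L),             # 4 or 9: 2000-2036 or 1937-1999
      ifelse(yy <= 57, 2000L, 1800L)              # 5-8: 2000-2057 or 1858-1899
    )
  )
  century + yy
}

birth_year_sketch(c("1010104321", "1010978726", "2310450637"))
## [1] 2010 1897 1945
```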
We get NA if we enter CPR numbers that refer to illegal dates, do not match the format, or contain text.
date_of_birth(c("3510104321", "2902191234", "1111111", "Curious George"))
## Warning: 4 failed to parse.
## [1] NA NA NA NA
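The reasons behind these four NA values can be illustrated with base R alone. This is a sketch for intuition, not a description of how DKcpr checks its input. The first two strings have the right shape but encode impossible dates (October 35th, and February 29th in 1919, which was not a leap year), while the last two are not 10-digit strings at all:

```r
x <- c("3510104321", "2902191234", "1111111", "Curious George")

# Format check: exactly ten digits and nothing else.
well_formed <- grepl("^[0-9]{10}$", x)
print(well_formed)
## [1]  TRUE  TRUE FALSE FALSE

# Date check: as.Date() returns NA for dates that do not exist.
# The four-digit years are written out by hand here (DDMMYYYY).
impossible <- is.na(as.Date(c("35102010", "29021919"), format = "%d%m%Y"))
print(impossible)
## [1] TRUE TRUE
```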
Working with the exact dates is easily done with the lubridate
package. For example, to extract the year we can use the year()
function:
library("lubridate")
dob <- date_of_birth(cpr)
year(dob)
## [1] 2010 1897 1945 1897
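If lubridate is not available, the same kind of work can be done in base R. The sketch below hard-codes the dates of birth shown above so that it runs on its own. The helper `age_at()` and the reference date are made up for illustration:

```r
dob <- as.Date(c("2010-10-10", "1897-10-10", "1945-10-23", "1897-10-10"))

# Base R equivalent of extracting the year.
as.integer(format(dob, "%Y"))
## [1] 2010 1897 1945 1897

# Hypothetical helper: age in whole years on a reference date.
age_at <- function(dob, ref) {
  years <- as.integer(format(ref, "%Y")) - as.integer(format(dob, "%Y"))
  # Subtract one year if the birthday has not yet occurred in the reference year.
  had_birthday <- format(ref, "%m%d") >= format(dob, "%m%d")
  years - ifelse(had_birthday, 0L, 1L)
}

age_at(dob, as.Date("2020-01-01"))
## [1]   9 122  74 122
```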
The is_cpr() function can determine whether a CPR number is a valid
CPR number. It returns TRUE if it is a valid CPR number, FALSE if
it is not (i.e., it is a date but does not fulfill the modulo 11 check), and NA
if it is not a legal 10-digit number or date.
is_cpr(cpr)
## [1] TRUE FALSE TRUE FALSE
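For readers unfamiliar with the modulo 11 check, here is a minimal base R sketch of the classic rule: multiply the ten digits by the weights 4, 3, 2, 7, 6, 5, 4, 3, 2, 1 and require the sum to be divisible by 11. The helper name `mod11_ok()` is hypothetical, and it assumes well-formed 10-digit input. Numbers issued since 2007 do not always satisfy this check, so the sketch should not be expected to reproduce is_cpr() for every input:

```r
# Hypothetical helper, not part of DKcpr: the classic modulo 11 rule only.
mod11_ok <- function(cpr) {
  weights <- c(4, 3, 2, 7, 6, 5, 4, 3, 2, 1)
  vapply(cpr, function(x) {
    digits <- as.integer(strsplit(x, "")[[1]])
    sum(digits * weights) %% 11 == 0
  }, logical(1), USE.NAMES = FALSE)
}

mod11_ok(c("2310450637", "1010978726"))
## [1]  TRUE FALSE
```

The weighted sums are 99 for the first number (divisible by 11) and 158 for the second (remainder 4).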
The gender() function returns the gender of the individuals: 0 = female, 1 = male.
gender(cpr)
## [1] 1 0 1 0
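The coding follows from the last digit of the CPR number, which is odd for males and even for females. A base R sketch of that rule, assuming well-formed 10-digit strings and not taken from the package source:

```r
cpr <- c("1010104321", "1010978726", "2310450637", "1010978726")

# The parity of the 10th digit gives the gender code: 1 = male, 0 = female.
last_digit <- as.integer(substr(cpr, 10, 10))
gender_code <- last_digit %% 2
print(gender_code)
## [1] 1 0 1 0
```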