RLdata: Test data for Record Linkage


Description

These tables contain artificial personal data for evaluating record linkage procedures. Some records have been duplicated with randomly generated errors. RLdata500 contains 50 duplicates and RLdata10000 contains 1000 duplicates.

Usage

RLdata500

identity.RLdata500

RLdata10000

identity.RLdata10000

Format

RLdata500 and RLdata10000 are data frames with 500 and 10000 records respectively, and 7 variables:

fname_c1

First name, first component

fname_c2

First name, second component

lname_c1

Last name, first component

lname_c2

Last name, second component

by

Year of birth

bm

Month of birth

bd

Day of birth

identity.RLdata500 and identity.RLdata10000 are vectors holding the true record ids of the two data sets. Two records are duplicates if and only if their corresponding values in the identity vector agree.
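The duplicate rule above can be illustrated with a small sketch in base R. It does not load the package data: identity_toy is a made-up stand-in for an identity vector, and the names pairs, is_match, duplicate_pairs and n_duplicates are chosen here for illustration only.

```r
# Hypothetical stand-in for an identity vector such as identity.RLdata500:
# records 1 and 4 share id 1, records 2 and 5 share id 2.
identity_toy <- c(1, 2, 3, 1, 2, 4)

# All unordered record pairs (i < j), one pair per column.
pairs <- combn(length(identity_toy), 2)

# A pair is a duplicate pair iff its two identity values agree.
is_match <- identity_toy[pairs[1, ]] == identity_toy[pairs[2, ]]
duplicate_pairs <- t(pairs[, is_match, drop = FALSE])

# Number of records that repeat an identity seen earlier in the vector.
n_duplicates <- sum(duplicated(identity_toy))
```

With the package data loaded, substituting identity.RLdata500 for identity_toy would be expected to give counts consistent with the 50 duplicates stated in the Description. For RLdata10000, enumerating every pair with combn means roughly 50 million pairs, so counting with duplicated() alone is likely the more practical check.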

An object of class data.frame with 500 rows and 7 columns.

An object of class numeric of length 500.

An object of class data.frame with 10000 rows and 7 columns.

An object of class numeric of length 10000.

Author(s)

Andreas Borg

Source

Generated with the data generation component of Febrl (Freely Extensible Biomedical Record Linkage), version 0.3 (https://sourceforge.net/projects/febrl/).

The following data sources were used (all relate to Germany):

Web links as of October 2009.


cleanzr/dblinkR documentation built on June 13, 2021, 4:17 a.m.