no_dup_data_small: Small No Duplicate Dataset

In multilink: Multifile Record Linkage and Duplicate Detection

no_dup_data_small

R Documentation

Small No Duplicate Dataset

Description

A dataset containing 71 simulated records from 3 files, with no duplicate records within any file. It is a subset of no_dup_data.

Usage

no_dup_data_small

Format

A list with three elements:

records: A data.frame with the records, containing 7 fields, from all three files, in the format used for input to create_comparison_data.
file_sizes: The size of each file.
IDs: The true partition of the records, represented as an integer vector of arbitrary labels of length sum(file_sizes).
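The relationship between the three elements can be sketched with a toy object. This is a minimal illustration in base R, not the packaged data: the list `toy`, its two field names, and its values are hypothetical, and the sketch assumes records are stacked file by file (file 1 first, then file 2, and so on), which is what the input format for create_comparison_data appears to expect.

```r
# Hypothetical toy object mimicking the documented layout (NOT the real data;
# the real records element has 7 fields, the two used here are made up).
toy <- list(
  records = data.frame(
    given_name = c("ann", "bob", "anne", "carl", "bob"),
    age        = c(30, 41, 30, 25, 41)
  ),
  file_sizes = c(2, 2, 1),              # 3 files holding 2, 2 and 1 records
  IDs        = c(1L, 2L, 1L, 3L, 2L)    # arbitrary entity labels, one per record
)

# Assumption: rows are stacked file by file, so file membership of each
# record can be recovered from file_sizes alone.
file_index <- rep(seq_along(toy$file_sizes), times = toy$file_sizes)

# Consistency checks implied by the format description.
stopifnot(nrow(toy$records) == sum(toy$file_sizes))
stopifnot(length(toy$IDs) == sum(toy$file_sizes))

# "No duplicates within a file" means no label repeats inside any one file.
dup_within_file <- tapply(toy$IDs, file_index, anyDuplicated) > 0

# Number of distinct entities represented across all files.
n_entities <- length(unique(toy$IDs))
```

After `data(no_dup_data_small)`, the same checks should carry over by substituting `no_dup_data_small` for `toy`, provided the stacking assumption holds.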

Source

Extracted from the datasets used in the simulation study of the paper cited under References. The datasets were generated using code from Peter Christen's group: https://dmm.anu.edu.au/geco/index.php.

References

Serge Aleshin-Guendel & Mauricio Sadinle (2022). Multifile Partitioning for Record Linkage and Duplicate Detection. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2021.2013242 [arXiv]

Examples

data(no_dup_data_small)

# There are 71 entities represented in the records
length(unique(no_dup_data_small$IDs))

multilink documentation built on July 9, 2023, 6:42 p.m.
