Linkage of administrative data sources is an efficient approach for conducting research on large populations, avoiding the time and cost of traditional data collection methods. Careful development of methods for linking data where unique identifiers are not available is key to avoiding bias resulting from linkage errors. However, the development and evaluation of new methods are limited by restricted access to identifier data for these purposes. Generating synthetic datasets of personal identifiers, which replicate the frequencies and errors of identifiers observed in administrative data, could facilitate the development of new methods.
We aimed to develop the sdglinkage package for generating synthetic dataset for linkage method development, with i) gold standard file with complete and accurate information and ii) linkage files that are corrupted as we often see in raw dataset.
The package has several main types of functions:
These functions can be organised as:
We also provide three vignettes to show how we can use the package:
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.