#' Record linkage via scaling algorithm
#' 
#' \pkg{Scalelink} is an R package for performing 'probabilistic' linkage of two data files using a scaling procedure.
#'
#' With the increasing availability of large data sets derived from administrative and other sources, there is growing
#' demand for linking them to provide rich sources of data for further analysis. Because the quality of the identifiers
#' used for linkage varies, existing approaches are often based on 'probabilistic' models, which rest on a number of
#' assumptions and can make heavy computational demands. This package implements the method proposed by Goldstein, Harron
#' and Cortina-Borja (2017), which classifies record pairs in linkage using weights (scores) derived from a scaling
#' algorithm. The method does not rely on training data, is computationally fast, requires only moderate amounts of
#' storage and has intuitive appeal.
#'
#'
#' @examples
#' library(Scalelink)
#' 
#' ## Set the number of CPU cores to use (omit to use all available)
#' RcppParallel::setThreadOptions(numThreads = 2)
#' 
#' data(FOI, package = "Scalelink")
#' data(LDFCOMP, package = "Scalelink")
#'
#' idcols <- c("Day", "Month", "Year", "Sex")
#' result <- calcScores(FOI[, idcols], LDFCOMP[, idcols])
#'
#' print(result$scores)
#'
#' ## The Scalelink package also provides two demos using synthetic data:
#' ## one with complete data and one containing missing values
#'
#' \dontrun{
#' ## For a list of demo titles
#' demo(package = 'Scalelink')
#'
#' ## To run a demo
#' demo("Example1", package = "Scalelink")
#'
#' ## Using your own data
#' ## Suppose the following files are in your working directory:
#' ## FOI:             a space-delimited file of interest (NFOI x PFOI), where NFOI
#' ##                  is the number of records and PFOI the number of columns
#' ## IDENTIFIERS_FOI: a space-delimited file containing a row vector of length PFOI,
#' ##                  with a 1 in each position that corresponds to an identifier
#' ## LDF:             a space-delimited linking data file (NLDF x PLDF), where NLDF
#' ##                  is the number of records and PLDF the number of columns
#' ## IDENTIFIERS_LDF: a space-delimited file containing a row vector of length PLDF,
#' ##                  with a 1 in each position that corresponds to an identifier
#' 
#' ## Scores can then be calculated as follows (missing.value gives the
#' ## code used for missing values in the files):
#' FOI <- read.table("FOI")
#' LDF <- read.table("LDF")
#' IDENTIFIERS_FOI <- read.table("IDENTIFIERS_FOI")
#' IDENTIFIERS_LDF <- read.table("IDENTIFIERS_LDF")
#' result <- calcScores(FOI[, which(IDENTIFIERS_FOI == 1), drop = FALSE],
#'                      LDF[, which(IDENTIFIERS_LDF == 1), drop = FALSE],
#'                      missing.value = -9.999e+029)
#'
#' ## To view the scores:
#' print(round(result$scores, 2))
#'
#' ## To view the A* matrix:
#' print(result$astar)
#' }
#'
#' @section References:
#' 
#' \subsection{Scalelink}{
#' Goldstein, H. and Charlton, C.M.J. (2017). Scalelink: A package to link data via scaling.
#' }
#' 
#' \subsection{Paper}{
#' Goldstein, H., Harron, K. and Cortina-Borja, M. (2017). A scaling approach to record linkage. Statistics in Medicine.
#' DOI: 10.1002/sim.7287
#' }
#'
#' @section Maintainer:
#' Chris Charlton \email{c.charlton@@bristol.ac.uk}
#'
#' @author Charlton, C.M.J. and Goldstein, H. (2017). Centre for Multilevel Modelling, University of Bristol.
#' 
#' @docType package
#' @name Scalelink
#' @importFrom Rcpp evalCpp
#' @importFrom RcppParallel RcppParallelLibs
#' @importFrom stats complete.cases
#' @importFrom utils citation
#' @useDynLib Scalelink
NULL

Scalelink documentation built on May 2, 2019, 8:29 a.m.