Create comparison vectors for all pairs of records coming from two datafiles to be linked.
df1, df2 
two datasets to be linked, of class 
flds 
a vector indicating the fields to be used in the linkage. Either a 
flds1, flds2 
vectors indicating the fields of 
types 
a vector of characters indicating the comparison type per comparison field. The options
are: 
breaks 
break points for the comparisons to obtain levels of disagreement.
It can be a list of length equal to the number of comparison fields, containing one numeric vector with the break
points for each comparison field, where entries corresponding to comparison type 
a list containing:
comparisons
matrix with n1*n2 rows, where the comparison pattern for record pair (i,j) appears in row (j-1)*n1+i, for i in {1,…,n1} and j in {1,…,n2}.
A comparison field with L+1 levels of disagreement is represented by L+1 columns of TRUE/FALSE indicators. Missing comparisons are coded as FALSE,
which is justified under an assumption of ignorability of the missing comparisons; see Sadinle (2017).
n1,n2
the datafile sizes, n1 = nrow(df1) and n2 = nrow(df2).
nDisagLevs
a vector containing the number of levels of disagreement per comparison field.
compFields
a data frame containing the names of the fields in the datafiles used in the comparisons and the types of comparison.
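As an illustration of the row layout of comparisons, the following base R sketch enumerates all record pairs for two small hypothetical files and checks that pair (i,j) lands in row (j-1)*n1+i. The sizes n1 = 3 and n2 = 2 and the helper pairFromRow are made up for this example and are not part of the package.

```r
# Hypothetical file sizes, chosen only for illustration.
n1 <- 3
n2 <- 2

# expand.grid varies its first argument fastest, so i cycles within each j.
pairs <- expand.grid(i = seq_len(n1), j = seq_len(n2))
pairs$row <- (pairs$j - 1) * n1 + pairs$i

# Hypothetical helper (not a package function): recover (i, j) from a row number.
pairFromRow <- function(r, n1) {
  c(i = (r - 1) %% n1 + 1, j = (r - 1) %/% n1 + 1)
}

print(pairs)
pairFromRow(5, n1)  # i = 2, j = 2
```

Under this layout, all pairs involving the first record of df2 come first, followed by all pairs involving its second record, and so on.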
Mauricio Sadinle (2017). Bayesian Estimation of Bipartite Matchings for Record Linkage. Journal of the American Statistical Association 112(518), 600-612.
data(twoFiles)

myCompData <- compareRecords(df1, df2,
                             flds=c("gname", "fname", "age", "occup"),
                             types=c("lv","lv","bi","bi"),
                             breaks=c(0,.25,.5))

## same as
myCompData <- compareRecords(df1, df2, types=c("lv","lv","bi","bi"))

## let's transform 'occup' to numeric to illustrate how to obtain numeric comparisons
df1$occup <- as.numeric(df1$occup)
df2$occup <- as.numeric(df2$occup)

## using different break points for 'lv' and 'nu' comparisons
myCompData1 <- compareRecords(df1, df2,
                              flds=c("gname", "fname", "age", "occup"),
                              types=c("lv","lv","bi","nu"),
                              breaks=list(lv=c(0,.25,.5), nu=0:3))

## using different break points for each comparison field
myCompData2 <- compareRecords(df1, df2,
                              flds=c("gname", "fname", "age", "occup"),
                              types=c("lv","lv","bi","nu"),
                              breaks=list(c(0,.25,.5), c(0,.2,.4,.6), NULL, 0:3))
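
To make the indicator representation described under the returned comparisons matrix more concrete, here is a minimal base R sketch of how break points could turn a numeric dissimilarity into levels of disagreement and then into TRUE/FALSE columns, with missing comparisons coded as FALSE. The distances in d are invented, and the interval convention (first level for values at or below the first break point, right-closed intervals afterwards) is an assumption made for illustration; it may differ from what compareRecords does internally.

```r
# Hypothetical dissimilarities for five record pairs; NA marks a missing comparison.
d <- c(0, 0.1, 0.3, 0.9, NA)
breaks <- c(0, 0.25, 0.5)

# Assumed convention: level 1 is d <= 0, then (0, .25], (.25, .5], (.5, Inf).
lev <- cut(d, breaks = c(-Inf, breaks, Inf), labels = FALSE)

# One indicator column per level of disagreement.
nLev <- length(breaks) + 1
ind <- outer(lev, seq_len(nLev), "==")

# Missing comparisons yield NA indicators; code them as FALSE.
ind[is.na(ind)] <- FALSE

print(ind)
```

Each observed comparison has exactly one TRUE in its row, while the row for the missing comparison is FALSE in every column.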
