Probabilistic Patient Record Linkage
data1 
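either a binary (0/1) matrix or data frame with one row per record of the first dataset (n1 rows) and one column per code used for matching.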
data2 
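either a binary (0/1) matrix or data frame with one row per record of the second dataset (n2 rows) and one column per code used for matching.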
dates1 
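optional matrix or data frame of dimension n1 x (number of codes) giving the date intervals at which each code of data1 was recorded (see Dates below).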
dates2 
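optional matrix or data frame of dimension n2 x (number of codes) giving the date intervals at which each code of data2 was recorded (see Dates below).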
eps_plus 
discrepancy rate between data1 and data2.
eps_minus 
discrepancy rate between data2 and data1.
aggreg_2ways 
a character string indicating how to merge the two posterior probability matrices, one obtained for each of the two databases. Four possibilities are currently implemented.
min_prev 
minimum prevalence for the variables used in matching. Default is 1%. 
data1_cont2diff 
either a matrix or data frame of continuous features, such as age, for which the similarity measure uses the difference with the corresponding features in data2_cont2diff.
data2_cont2diff 
either a matrix or data frame of continuous features, such as age, for which the similarity measure uses the difference with the corresponding features in data1_cont2diff.
d_max 
a numeric vector of length 
use_diff 
logical flag indicating whether the continuous features compared by difference (data1_cont2diff and data2_cont2diff) should be used in the matching.
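The min_prev argument can be pictured as a column filter applied before matching: codes that are too rare are dropped. The sketch below is only an illustration under an assumption not stated in this page, namely that prevalence is computed on the rows of both datasets stacked together; the helper filter_by_prevalence() is hypothetical and is not part of the package, whose actual filtering may differ.

```r
# Hypothetical helper, NOT part of the package.
# ASSUMPTION: prevalence = share of records carrying a code, computed on the
# rows of data1 and data2 stacked together.
filter_by_prevalence <- function(data1, data2, min_prev = 0.01) {
  stopifnot(ncol(data1) == ncol(data2))
  prev <- colMeans(rbind(data1, data2))  # prevalence of each code (column)
  keep <- prev >= min_prev               # codes frequent enough to be kept
  list(data1 = data1[, keep, drop = FALSE],
       data2 = data2[, keep, drop = FALSE],
       kept  = colnames(data1)[keep])
}

# Toy example: code "c" is never recorded, so it is dropped
d1 <- matrix(c(1, 0, 0,
               0, 1, 0), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, c("a", "b", "c")))
d2 <- matrix(c(1, 1, 0), nrow = 1,
             dimnames = list(NULL, c("a", "b", "c")))
filtered <- filter_by_prevalence(d1, d2)
filtered$kept  # "a" "b"
```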
Dates:
The use of dates1 and dates2 requires that at least one date interval matches across dates1 and dates2 for claiming an agreement on a diagnosis code between data1 and data2, in addition to having that very same code recorded in both.
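The date rule above can be sketched for a single code and a single pair of records. Representing each record's dates for that code as a character vector of interval labels, and the helper code_agrees(), are illustrative assumptions only; they are not the format expected by dates1 and dates2.

```r
# Hypothetical helper, NOT the package's internal code.
# ASSUMPTION: for one diagnosis code, each record's dates are a character
# vector of interval labels (e.g. "2015-Q1"); character(0) = code absent.
code_agrees <- function(dates_rec1, dates_rec2) {
  # the code must be recorded in both records ...
  both_recorded <- length(dates_rec1) > 0 && length(dates_rec2) > 0
  # ... and at least one date interval must be shared by the two records
  both_recorded && length(intersect(dates_rec1, dates_rec2)) > 0
}

code_agrees(c("2015-Q1", "2015-Q3"), "2015-Q3")  # TRUE: shared interval
code_agrees("2015-Q1", "2016-Q2")                # FALSE: no shared interval
code_agrees(character(0), "2015-Q1")             # FALSE: absent from record 1
```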
Value:
a matrix of size n1 x n2 giving the posterior probability of matching for each of the n1 x n2 pairs of records.
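One possible way to read the returned matrix is to look, for each record of data1 (a row), at the record of data2 (a column) with the largest posterior probability. The helper best_matches() and its 0.5 cut-off are hypothetical choices made for illustration, not something the package prescribes.

```r
# Hypothetical post-processing, NOT part of the package.
# post: posterior matrix, rows = records of data1, columns = records of data2.
best_matches <- function(post, threshold = 0.5) {
  best_col <- apply(post, 1, which.max)                  # best column per row
  best_p <- post[cbind(seq_len(nrow(post)), best_col)]   # its posterior
  data.frame(id1 = rownames(post),
             # keep the candidate only if its posterior exceeds the cut-off
             id2 = ifelse(best_p > threshold, colnames(post)[best_col], NA),
             posterior = best_p,
             stringsAsFactors = FALSE)
}

# Toy 2 x 2 posterior matrix
post <- matrix(c(0.9, 0.1,
                 0.2, 0.3), nrow = 2, byrow = TRUE,
               dimnames = list(c("A_data1", "B_data1"),
                               c("A_data2", "C_data2")))
matches <- best_matches(post)
matches  # B_data1 gets NA: its best posterior (0.3) is below 0.5
```

This greedy row-by-row choice can pair one data2 record with several data1 records; a one-to-one assignment would need an extra step.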
Reference:
Hejblum BP, Weber G, Liao KP, Palmer N, Churchill S, Szolovits P, Murphy S, Kohane I, Cai T. Probabilistic Record Linkage of De-Identified Research Datasets Using Diagnosis Codes. Submitted, 2017.
Example:
set.seed(123)
ncodes <- 500
npat <- 200
# simulate one prevalence per code, then a binary patient x code matrix
incid <- abs(rnorm(n=ncodes, 0.15, 0.07))
bin_codes <- rbinom(n=npat*ncodes, size=1, prob=rep(incid, npat))
bin_codes_mat <- matrix(bin_codes, ncol=ncodes, byrow = TRUE)
# data1_ex holds patients 1 to 120, data2_ex patients 1 to 20 and 121 to 200:
# patients 1 to 20 are the only ones present in both datasets
data1_ex <- bin_codes_mat[1:(npat/2+npat/10),]
data2_ex <- bin_codes_mat[c(1:(npat/10), (npat/2+npat/10 + 1):npat), ]
rownames(data1_ex) <- paste0("ID", 1:(npat/2+npat/10), "_data1")
rownames(data2_ex) <- paste0("ID", c(1:(npat/10), (npat/2+npat/10 + 1):npat), "_data2")
## Not run:
library(ludic)  # package providing recordLink()
res <- recordLink(data1 = data1_ex, data2 = data2_ex,
                  use_diff = FALSE, eps_minus = 0.01, eps_plus = 0.01)
# rows/columns 1, 2, 3, 19 and 20 are true matches; 21 to 23 are not
round(res[c(1:3, 19:23), c(1:3, 19:23)], 3)
## End(Not run)
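Because the simulated example draws data1_ex and data2_ex from the same patients 1 to 20, the true links are known and can be recovered from the rownames. The helper true_links() below is a hypothetical convenience for checking results; it is not part of the package and relies only on the "_data1"/"_data2" naming used in the example.

```r
# Hypothetical helper, NOT part of the package: strips the "_data1"/"_data2"
# suffix from the example's rownames and flags pairs with the same patient ID.
true_links <- function(ids1, ids2) {
  core1 <- sub("_data[12]$", "", ids1)
  core2 <- sub("_data[12]$", "", ids2)
  m <- outer(core1, core2, "==")  # TRUE where both rows are the same patient
  dimnames(m) <- list(ids1, ids2)
  m
}

# Rebuild the example's rownames (same construction as in the example)
npat <- 200
ids1 <- paste0("ID", 1:(npat/2 + npat/10), "_data1")
ids2 <- paste0("ID", c(1:(npat/10), (npat/2 + npat/10 + 1):npat), "_data2")
truth <- true_links(ids1, ids2)
dim(truth)  # 120 100
sum(truth)  # 20
```

Assuming res is the n1 x n2 matrix described under Value, res[truth] extracts the 20 posterior probabilities of the true pairs and res[!truth] those of the non-matching pairs.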

