getPairs-methods: Extract Record Pairs


Description

Extracts record pairs from data and result objects.

Usage

## S4 method for signature 'RecLinkData'
getPairs(object, max.weight = Inf, min.weight = -Inf,
         single.rows = FALSE, show = "all", sort = !is.null(object$Wdata))

## S4 method for signature 'RLBigData'
getPairs(object, max.weight = Inf, min.weight = -Inf,
    filter.match = c("match", "unknown", "nonmatch"),
    withWeight = hasWeights(object), withMatch = TRUE, single.rows = FALSE,
    sort = withWeight)

## S4 method for signature 'RLResult'
getPairs(object, filter.match = c("match", "unknown", "nonmatch"),
    filter.link = c("nonlink", "possible", "link"), max.weight = Inf, 
    min.weight = -Inf, withMatch = TRUE, withClass = TRUE, 
    withWeight = hasWeights(object@data), single.rows = FALSE, sort = withWeight)

getFalsePos(object, single.rows = FALSE)
getFalseNeg(object, single.rows = FALSE)
getFalse(object, single.rows = FALSE)

Arguments

object

The data or result object from which to extract record pairs.

max.weight, min.weight

Real numbers. Upper and lower weight thresholds.

filter.match

Character vector, a nonempty subset of c("match", "nonmatch", "unknown") denoting which pairs, by matching status, to allow in the output.

filter.link

Character vector, a nonempty subset of c("link", "nonlink", "possible") denoting which pairs, by classification result, to allow in the output.

withWeight

Logical. Whether to include linkage weights in the output.

withMatch

Logical. Whether to include matching status in the output.

withClass

Logical. Whether to include classification result in the output.

single.rows

Logical. Whether to output each record pair in a single row instead of two consecutive rows.

show

Character. Selects which record pairs to show; one of "links", "nonlinks", "possible", "all".

sort

Logical. Whether to sort the output by weight in descending order.

Details

These methods extract record pairs from objects of class "RecLinkData", "RecLinkResult", "RLBigData" and "RLResult". Typical applications are retrieving a linkage result for further processing, conducting a manual review to determine classification thresholds, and inspecting misclassified pairs.

The various arguments can be grouped by the following purposes:

  1. Controlling which record pairs are included in the output: min.weight and max.weight, filter.match, filter.link, show.

  2. Controlling which information is shown: withWeight, withMatch, withClass.

  3. Controlling the overall structure of the result: sort, single.rows.
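The three groups can be pictured as successive steps applied to a table of pairs. The following base R sketch is not how getPairs is implemented; it only illustrates the idea. The table, its column names (id1, id2, is_match, Weight) and the helper extract_pairs are hypothetical, and treating an NA matching status as "unknown" is an assumption.

```r
# Hypothetical single-row pair table; column names are illustrative only.
pairs <- data.frame(
  id1      = c(388, 353, 290, 7),
  id2      = c(408, 355, 466, 9),
  is_match = c(FALSE, TRUE, TRUE, NA),  # NA is taken to mean "unknown" (assumption)
  Weight   = c(0.5067013, 0.4948059, 0.7786012, 0.3)
)

# Hypothetical helper mimicking the three argument groups
extract_pairs <- function(pairs,
                          filter.match = c("match", "unknown", "nonmatch"),
                          withWeight = TRUE, withMatch = TRUE,
                          sort = withWeight) {
  # Which pairs: map the logical status to the filter.match labels
  status <- ifelse(is.na(pairs$is_match), "unknown",
                   ifelse(pairs$is_match, "match", "nonmatch"))
  out <- pairs[status %in% filter.match, , drop = FALSE]
  # Overall structure: sort by weight, highest first (done before Weight may be dropped)
  if (sort) out <- out[order(out$Weight, decreasing = TRUE), , drop = FALSE]
  # Which information: drop the columns the caller did not ask for
  if (!withWeight) out$Weight <- NULL
  if (!withMatch) out$is_match <- NULL
  out
}

extract_pairs(pairs, filter.match = "match")  # pair 290/466 first, then 353/355
extract_pairs(pairs, filter.match = "unknown", withWeight = FALSE)
```

Because R evaluates default arguments lazily, `sort = withWeight` picks up whatever value was passed for withWeight. This mirrors the defaults in the usage section.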

The weight limits are inclusive, i.e. a record pair with weight w is included only if
w >= min.weight && w <= max.weight.
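The inclusive rule can be checked directly in base R. In this sketch the weights are copied from the example output below, and the helper in_range is hypothetical. The rule above is written with `&&`, which applies to a single weight. For a vector of weights the element-wise operator `&` is needed.

```r
# Weights of four pairs, copied from the example output
w <- c(0.5924569, 0.4948059, 0.5067013, 0.7786012)

# Hypothetical helper: TRUE where min.weight <= w <= max.weight (both bounds inclusive)
in_range <- function(w, min.weight = -Inf, max.weight = Inf) {
  w >= min.weight & w <= max.weight
}

in_range(w, min.weight = 0.5, max.weight = 0.6)    # TRUE FALSE TRUE FALSE
in_range(0.5, min.weight = 0.5, max.weight = 0.6)  # TRUE: a weight on a bound is kept
```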

If single.rows is not TRUE, each pair is output on two consecutive lines, which is easier to read. In this format all data are converted to character, which can lose precision for numeric values. It should therefore be used for printing only.
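The following base R snippet illustrates how converting a number to character can lose precision. It does not use getPairs, and it makes no claim about the exact formatting routine getPairs applies. It only shows why a character column should not be converted back to numbers for further computation.

```r
x <- 0.1 + 0.2          # stored as 0.30000000000000004...
s <- as.character(x)    # as.character keeps 15 significant digits: "0.3"
as.numeric(s) == x      # FALSE: the round trip through character lost precision
```

For further processing, single.rows = TRUE keeps the original column types.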

getFalsePos, getFalseNeg and getFalse are shortcuts (currently for objects of class "RLResult" only) to retrieve false positives (links that are non-matches in fact), false negatives (non-links that are matches in fact) or all falsely classified pairs, respectively.
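The definitions of false positives and false negatives can be expressed as simple row filters. This base R sketch is an illustration, not the package's code. The table and the helpers false_pos, false_neg and false_all are hypothetical. The class codes "L" (link) and "N" (non-link) follow the example output below. Possible links and pairs with unknown matching status are ignored here, which is an assumption.

```r
# Hypothetical classified pairs (IDs and statuses taken from the example output)
classified <- data.frame(
  id1      = c(388, 353, 290, 285),
  id2      = c(408, 355, 466, 379),
  is_match = c(FALSE, TRUE, TRUE, TRUE),
  Class    = c("L", "N", "L", "N"),
  stringsAsFactors = FALSE
)

# Links that are actually non-matches; %in% treats an NA status as "no"
false_pos <- function(x) x[x$Class == "L" & x$is_match %in% FALSE, , drop = FALSE]
# Non-links that are actually matches
false_neg <- function(x) x[x$Class == "N" & x$is_match %in% TRUE, , drop = FALSE]
# All misclassified pairs
false_all <- function(x) rbind(false_pos(x), false_neg(x))

false_pos(classified)  # pair 388/408
false_neg(classified)  # pairs 353/355 and 285/379
```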

Value

A data frame. If single.rows is TRUE, each row holds (in this order) id and data fields of the first record, id and data fields of the second record and possibly matching status, classification result and/or weight.

If single.rows is not TRUE, the result holds for each resulting record pair consecutive rows of the following format:

  1. ID and data fields of the first record, followed by as many empty fields as needed to match the length of the following line.

  2. ID and data fields of the second record, possibly followed by matching status, classification result and/or weight.

  3. A blank line to separate record pairs.
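The relationship between the two layouts can be sketched in base R by turning a single-row table into the three-rows-per-pair character layout described above. The input table, its ".1"/".2" column suffixes and the helper to_two_rows are hypothetical. The actual column naming used by getPairs may differ.

```r
# Hypothetical single-row result: fields of record 1 and record 2 side by side
single <- data.frame(
  id.1 = c(388, 353), fname.1 = c("ANDREA", "INGE"),
  id.2 = c(408, 355), fname.2 = c("ANDREA", "INGEU"),
  is_match = c(FALSE, TRUE), Weight = c(0.5067013, 0.4948059),
  stringsAsFactors = FALSE
)

# Hypothetical helper: three character rows per pair
to_two_rows <- function(single, fields, extra) {
  n <- nrow(single)
  out <- matrix("", nrow = 3 * n, ncol = length(fields) + length(extra),
                dimnames = list(NULL, c(fields, extra)))
  for (i in seq_len(n)) {
    r <- 3 * (i - 1)
    # Row 1: first record; the extra columns stay empty
    out[r + 1, fields] <- as.character(unlist(single[i, paste0(fields, ".1")]))
    # Row 2: second record followed by matching status and weight
    out[r + 2, ] <- as.character(unlist(single[i, c(paste0(fields, ".2"), extra)]))
    # Row 3 is left blank as a separator
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

to_two_rows(single, fields = c("id", "fname"), extra = c("is_match", "Weight"))
```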

Note

When non-matches are included in the output and blocking is permissive, the result object can be very large, possibly leading to memory problems.

Author(s)

Andreas Borg, Murat Sariyar

Examples

library(RecordLinkage)
data(RLdata500)

# create record pairs and calculate EpiLink weights
rpairs <- RLBigDataDedup(RLdata500, identity = identity.RLdata500,
  blockfld=list(1,3,5,6,7))
rpairs <- epiWeights(rpairs)

# show all record pairs with weights between 0.5 and 0.6
getPairs(rpairs, min.weight=0.5, max.weight=0.6)

# show only matches with weight <= 0.5
getPairs(rpairs, max.weight=0.5, filter.match="match")

# classify with one threshold
result <- epiClassify(rpairs, 0.5)

# show all links, do not show classification in the output
getPairs(result, filter.link="link", withClass = FALSE)

# see wrongly classified pairs
getFalsePos(result)
getFalseNeg(result)

Example output

================================================================================
    id  fname_c1 fname_c2  lname_c1 lname_c2   by bm bd is_match    Weight
1   48    WERNER     <NA>   KOERTIG     <NA> 1965 11 28                   
2  238   WERNIER     <NA>   KOERTIG     <NA> 1965 11 28     TRUE 0.5924569
3                                                                         
4   68    PETEVR     <NA>     FUCHS     <NA> 1972  9 12                   
5  190     PETER     <NA>     FUCHS     <NA> 1972  9 12     TRUE 0.5924569
6                                                                         
7   85 THORSKTEN     <NA>    MARTIN     <NA> 1995 11 15                   
8  187  THORSTEN     <NA>    MARTIN     <NA> 1995 11 15     TRUE 0.5924569
9                                                                         
10 158     PETER     <NA>    BECKER     <NA> 1960  9  5                   
11 229    PETERS     <NA>    BECKER     <NA> 1960  9  5     TRUE 0.5924569
12                                                                        
13 177 JOHANNNES     <NA>    SCHULZ     <NA> 1974  1 17                   
14 207  JOHANNES     <NA>    SCHULZ     <NA> 1974  1 17     TRUE 0.5924569
15                                                                        
16 209     ROLBF     <NA>   NEUMANN     <NA> 1967  3 29                   
17 227      ROLF     <NA>   NEUMANN     <NA> 1967  3 29     TRUE 0.5924569
18                                                                        
19 265 MARIANNFE     <NA>   MOELLER     <NA> 1961  9 17                   
20 456  MARIANNE     <NA>   MOELLER     <NA> 1961  9 17     TRUE 0.5924569
21                                                                        
22 266     KARIN     <NA>      HORN     <NA> 2002  6  4                   
23 437    KARINW     <NA>      HORN     <NA> 2002  6  4     TRUE 0.5924569
24                                                                        
25 298     SONJA     <NA>   FISCHER     <NA> 1989  7 17                   
26 464    SONJAD     <NA>   FISCHER     <NA> 1989  7 17     TRUE 0.5924569
27                                                                        
28 310    MONIKA     <NA> SCHNEIDER     <NA> 1937  6  2                   
29 432   MONIYKA     <NA> SCHNEIDER     <NA> 1937  6  2     TRUE 0.5924569
30                                                                        
31 377   SABAINE     <NA>      OTTO     <NA> 1940  7 23                   
32 448    SABINE     <NA>      OTTO     <NA> 1940  7 23     TRUE 0.5924569
33                                                                        
34 391  GABRIELE     <NA>    BECKER     <NA> 1990  3 27                   
35 496 GABRIHELE     <NA>    BECKER     <NA> 1990  3 27     TRUE 0.5924569
36                                                                        
37 395   GISOELA     <NA>      BECK     <NA> 2003  4 16                   
38 404    GISELA     <NA>      BECK     <NA> 2003  4 16     TRUE 0.5924569
39                                                                        
40 402   CHRISTA     <NA>   SCHWARZ     <NA> 1965  7 13                   
41 462  CHRISTAH     <NA>   SCHWARZ     <NA> 1965  7 13     TRUE 0.5924569
42                                                                        
43 388    ANDREA     <NA>     WEBER     <NA> 1945  5 20                   
44 408    ANDREA     <NA>   SCHMIDT     <NA> 1945  2 20    FALSE 0.5067013
45                                                                        
================================================================================
    id fname_c1 fname_c2 lname_c1 lname_c2   by bm bd is_match    Weight
1  353     INGE     <NA>   SEIDEL     <NA> 1949  9  4                   
2  355    INGEU     <NA>   SEIDEL     <NA> 1949  8  4     TRUE 0.4948059
3                                                                       
4  285    ERIKA     <NA>    WEBER     <NA> 1995  2  1                   
5  379    ERIKA     <NA>    WEBER     <NA> 1992  2 29     TRUE 0.4782410
6                                                                       
7  127     KARL     <NA>    KLEIN     <NA> 2002  6 20                   
8  142     KARL     <NA>   KLEIBN     <NA> 2002  6 29     TRUE 0.4692532
9                                                                       
10  37 HARTMHUT     <NA> HOFFMSNN     <NA> 1929 12 29                   
11  72  HARTMUT     <NA> HOFFMANN     <NA> 1929 12 29     TRUE 0.4081096
12                                                                      
================================================================================
     id  fname_c1 fname_c2  lname_c1 lname_c2   by bm bd is_match    Weight
1   290     HELGA ELFRIEDE    BERGER     <NA> 1989  1 18                   
2   466     HELGA ELFRIEDE    BERGER     <NA> 1989  1 28     TRUE 0.7786012
3                                                                          
4   313    URSULA   BIRGIT   MUELLRR     <NA> 1940  6 15                   
5   457    URSULA   BIRGIT   MUELLER     <NA> 1940  6 15     TRUE 0.7293529
6                                                                          
7   467    ULRIKE   NICOLE    BECKRR     <NA> 1982  8  4                   
8   472    ULRIKE   NICOLE    BECKER     <NA> 1982  8  4     TRUE 0.7293529
9                                                                          
10   25  MATTHIAS     <NA>      HAAS     <NA> 1955  7  8                   
11  107  MATTHIAS     <NA>      HAAS     <NA> 1955  8  8     TRUE 0.6910486
12                                                                         
13  106     ANDRE     <NA>   MUELLER     <NA> 1976  2 25                   
14  175     ANDRE     <NA>   MUELLER     <NA> 1976  1 25     TRUE 0.6910486
15                                                                         
16  370    MONIKA     <NA>   MUELLER     <NA> 2000  8 26                   
17  478    MONIKA     <NA>   MUELLER     <NA> 2000  5 26     TRUE 0.6910486
18                                                                         
19   50    STEFAN     <NA>   MUELLER     <NA> 1957  6  7                   
20  234    STEFAN     <NA>   MUELLER     <NA> 1957  6  1     TRUE 0.6536006
21                                                                         
22   87      HANS     <NA>   SCHULZE     <NA> 1972 11 27                   
23  117      HANS     <NA>   SCHULZE     <NA> 1972 11 28     TRUE 0.6536006
24                                                                         
25  145    HARALD     <NA>     WEBER     <NA> 1977  6  1                   
26  240    HARALD     <NA>     WEBER     <NA> 1977  6  2     TRUE 0.6536006
27                                                                         
28  286     MARIA     <NA> SCHROEDER     <NA> 1955  1 11                   
29  383     MARIA     <NA> SCHROEDER     <NA> 1955  1 12     TRUE 0.6536006
30                                                                         
31  289 CHRISTINE     <NA>    PETERS     <NA> 1993  2  5                   
32  399 CHRISTINE     <NA>    PETERS     <NA> 1993  2  6     TRUE 0.6536006
33                                                                         
34  297    ANDREA     <NA>     WEBER     <NA> 1945  5 29                   
35  388    ANDREA     <NA>     WEBER     <NA> 1945  5 20     TRUE 0.6536006
36                                                                         
37  357   GERTRUD     <NA>   BAUMANN     <NA> 1926  5 26                   
38  414   GERTRUD     <NA>   BAUMANN     <NA> 1926  5 20     TRUE 0.6536006
39                                                                         
40   71 CHRISTIAN     <NA>     GROSS     <NA> 1959  4  7                   
41  205 CHRISTIAN     <NA>     GROSS     <NA> 2008  4  7     TRUE 0.6133400
42                                                                         
43   78    STEFAN     <NA>     BRAUN     <NA> 1997 12 30                   
44  133    STEFAN     <NA>     BRAUN     <NA> 1947 12 30     TRUE 0.6133400
45                                                                         
46  108   GERHARD     <NA> FRIEDRICH     <NA> 1987  2 10                   
47  203   GERHARD     <NA> FRIEDRICH     <NA> 1957  2 10     TRUE 0.6133400
48                                                                         
49  217     HORST     <NA>     MEIER     <NA> 1977  6  6                   
50  248     HORST     <NA>     MEIER     <NA> 1972  6  6     TRUE 0.6133400
51                                                                         
52  258    SABINE     <NA>  HARTMANN     <NA> 1939  6 16                   
53  450    SABINE     <NA>  HARTMANN     <NA> 1943  6 16     TRUE 0.6133400
54                                                                         
55  267    ANGELA     <NA>     STEIN     <NA> 2002  8 30                   
56  385    ANGELA     <NA>     STEIN     <NA> 2062  8 30     TRUE 0.6133400
57                                                                         
58  481   SUSANNE     <NA>     KLEIN     <NA> 1969  3 15                   
59  490   SUSANNE     <NA>     KLEIN     <NA> 1960  3 15     TRUE 0.6133400
60                                                                         
61    2      GERD     <NA>     BAUER     <NA> 1968  7 27                   
62   43      GERD     <NA>    BAUERH     <NA> 1968  7 27     TRUE 0.6043523
63                                                                         
64   34     HEINZ     <NA>     BOEHM     <NA> 1938 12 20                   
65  111     HEINZ     <NA>    BOEHMR     <NA> 1938 12 20     TRUE 0.6043523
66                                                                         
67   58     FRANK     <NA>   MUELLDR     <NA> 1978  5 20                   
68  148     FRANK     <NA>   MUELLER     <NA> 1978  5 20     TRUE 0.6043523
69                                                                         
70  112   GERHARD     <NA>     ERNSR     <NA> 1980 12 16                   
71  116   GERHARD     <NA>     ERNST     <NA> 1980 12 16     TRUE 0.6043523
72                                                                         
73  120     FRANK     <NA>  BERGMANN     <NA> 1998 11  8                   
74  165     FRANK     <NA>  BERGKANN     <NA> 1998 11  8     TRUE 0.6043523
75                                                                         
76  125 CHRISTIAN     <NA>  MUELLEPR     <NA> 1974  8  9                   
77  193 CHRISTIAN     <NA>   MUELLER     <NA> 1974  8  9     TRUE 0.6043523
78                                                                         
79  130   MICHAEL     <NA>     MEYER     <NA> 1988  1 31                   
80  147   MICHAEL     <NA>      MYER     <NA> 1988  1 31     TRUE 0.6043523
81                                                                         
82  192    DIETER     <NA> SCHNEIDER     <NA> 1968  8 21                   
83  216    DIETER     <NA> SCHNNIDER     <NA> 1968  8 21     TRUE 0.6043523
84                                                                         
85  283      ANNA     <NA>     LANGE     <NA> 1998  3 29                   
86  322      ANNA     <NA>     LANGK     <NA> 1998  3 29     TRUE 0.6043523
87                                                                         
88  314    RENATE     <NA>    SCHUTE     <NA> 1940 12 29                   
89  407    RENATE     <NA>   SCHULTE     <NA> 1940 12 29     TRUE 0.6043523
90                                                                         
91  331    INGRID     <NA>    KRAJSE     <NA> 1985  9  3                   
92  491    INGRID     <NA>    KRAUSE     <NA> 1985  9  3     TRUE 0.6043523
93                                                                         
94  415     MARIA     <NA>  DIETRICH     <NA> 1979  2  6                   
95  420     MARIA     <NA> DIETTRICH     <NA> 1979  2  6     TRUE 0.6043523
96                                                                         
97   48    WERNER     <NA>   KOERTIG     <NA> 1965 11 28                   
98  238   WERNIER     <NA>   KOERTIG     <NA> 1965 11 28     TRUE 0.5924569
99                                                                         
100  68    PETEVR     <NA>     FUCHS     <NA> 1972  9 12                   
101 190     PETER     <NA>     FUCHS     <NA> 1972  9 12     TRUE 0.5924569
102                                                                        
103  85 THORSKTEN     <NA>    MARTIN     <NA> 1995 11 15                   
104 187  THORSTEN     <NA>    MARTIN     <NA> 1995 11 15     TRUE 0.5924569
105                                                                        
106 158     PETER     <NA>    BECKER     <NA> 1960  9  5                   
107 229    PETERS     <NA>    BECKER     <NA> 1960  9  5     TRUE 0.5924569
108                                                                        
109 177 JOHANNNES     <NA>    SCHULZ     <NA> 1974  1 17                   
110 207  JOHANNES     <NA>    SCHULZ     <NA> 1974  1 17     TRUE 0.5924569
111                                                                        
112 209     ROLBF     <NA>   NEUMANN     <NA> 1967  3 29                   
113 227      ROLF     <NA>   NEUMANN     <NA> 1967  3 29     TRUE 0.5924569
114                                                                        
115 265 MARIANNFE     <NA>   MOELLER     <NA> 1961  9 17                   
116 456  MARIANNE     <NA>   MOELLER     <NA> 1961  9 17     TRUE 0.5924569
117                                                                        
118 266     KARIN     <NA>      HORN     <NA> 2002  6  4                   
119 437    KARINW     <NA>      HORN     <NA> 2002  6  4     TRUE 0.5924569
120                                                                        
121 298     SONJA     <NA>   FISCHER     <NA> 1989  7 17                   
122 464    SONJAD     <NA>   FISCHER     <NA> 1989  7 17     TRUE 0.5924569
123                                                                        
124 310    MONIKA     <NA> SCHNEIDER     <NA> 1937  6  2                   
125 432   MONIYKA     <NA> SCHNEIDER     <NA> 1937  6  2     TRUE 0.5924569
126                                                                        
127 377   SABAINE     <NA>      OTTO     <NA> 1940  7 23                   
128 448    SABINE     <NA>      OTTO     <NA> 1940  7 23     TRUE 0.5924569
129                                                                        
130 391  GABRIELE     <NA>    BECKER     <NA> 1990  3 27                   
131 496 GABRIHELE     <NA>    BECKER     <NA> 1990  3 27     TRUE 0.5924569
132                                                                        
133 395   GISOELA     <NA>      BECK     <NA> 2003  4 16                   
134 404    GISELA     <NA>      BECK     <NA> 2003  4 16     TRUE 0.5924569
135                                                                        
136 402   CHRISTA     <NA>   SCHWARZ     <NA> 1965  7 13                   
137 462  CHRISTAH     <NA>   SCHWARZ     <NA> 1965  7 13     TRUE 0.5924569
138                                                                        
139 388    ANDREA     <NA>     WEBER     <NA> 1945  5 20                   
140 408    ANDREA     <NA>   SCHMIDT     <NA> 1945  2 20    FALSE 0.5067013
141                                                                        
================================================================================
   id fname_c1 fname_c2 lname_c1 lname_c2   by bm bd is_match Class    Weight
1 388   ANDREA     <NA>    WEBER     <NA> 1945  5 20                         
2 408   ANDREA     <NA>  SCHMIDT     <NA> 1945  2 20    FALSE     L 0.5067013
3                                                                            
================================================================================
    id fname_c1 fname_c2 lname_c1 lname_c2   by bm bd is_match Class    Weight
1  353     INGE     <NA>   SEIDEL     <NA> 1949  9  4                         
2  355    INGEU     <NA>   SEIDEL     <NA> 1949  8  4     TRUE     N 0.4948059
3                                                                             
4  285    ERIKA     <NA>    WEBER     <NA> 1995  2  1                         
5  379    ERIKA     <NA>    WEBER     <NA> 1992  2 29     TRUE     N 0.4782410
6                                                                             
7  127     KARL     <NA>    KLEIN     <NA> 2002  6 20                         
8  142     KARL     <NA>   KLEIBN     <NA> 2002  6 29     TRUE     N 0.4692532
9                                                                             
10  37 HARTMHUT     <NA> HOFFMSNN     <NA> 1929 12 29                         
11  72  HARTMUT     <NA> HOFFMANN     <NA> 1929 12 29     TRUE     N 0.4081096
12                                                                            

RecordLinkage documentation built on Aug. 25, 2020, 5:07 p.m.