Classify record pairs with unsupervised clustering methods.
classifyUnsup(rpairs, method, ...)
rpairs: Object of type
method: The classification method to use. One of "kmeans" or "bclust".
...: Further arguments for the classification method.
A clustering algorithm is applied to find clusters in the comparison patterns. In the case of two clusters (the default), the cluster further from the origin (i.e. representing higher similarity values) is interpreted as the set of links, the other as the set of non-links.
Supported methods are:
K-means clustering, see kmeans.
Bagged clustering, see bclust.
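The two-cluster rule described above can be illustrated with base R alone. The following is a minimal sketch, not the package's internal implementation: the similarity values in `patterns` are invented for illustration, and `stats::kmeans` stands in for whatever the chosen method does internally.

```r
# Toy comparison patterns: each row is a record pair, each column a
# per-field similarity in [0, 1]. The values are made up for illustration.
set.seed(1)
patterns <- rbind(
  matrix(runif(40, 0.0, 0.3), ncol = 4),  # 10 dissimilar pairs
  matrix(runif(20, 0.8, 1.0), ncol = 4)   # 5 similar pairs
)

# Two-cluster k-means on the patterns (stats::kmeans).
fit <- kmeans(patterns, centers = 2, nstart = 10)

# The cluster whose center lies farther from the origin holds the higher
# similarity values and is read as the set of links.
dist_from_origin <- sqrt(rowSums(fit$centers^2))
link_cluster <- which.max(dist_from_origin)

# "L" = link, "N" = non-link, mirroring the labels in the summary output.
prediction <- factor(ifelse(fit$cluster == link_cluster, "L", "N"),
                     levels = c("N", "L"))
table(prediction)
```

Cluster numbers returned by a clustering algorithm are arbitrary, which is why the sketch identifies the link cluster by its distance from the origin and not by its index.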
An object of class "RecLinkResult" that represents a copy of rpairs with an added element prediction, which stores the classification result.
Andreas Borg, Murat Sariyar
trainSupv and classifySupv for supervised classification.
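# Classification with bclust
library(RecordLinkage)
data(RLdata500)
# Build comparison patterns for deduplication, blocking on fields 1, 3, 5, 6 and 7
rpairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                        blockfld = list(1, 3, 5, 6, 7))
# Cluster the comparison patterns into links and non-links
result <- classifyUnsup(rpairs, method = "bclust")
summary(result)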
Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering
Deduplication Data Set
500 records
18643 record pairs
50 matches
18593 non-matches
0 pairs with unknown status
82 links detected
0 possible links detected
18561 non-links detected
alpha error: 0.340000
beta error: 0.002635
accuracy: 0.996460
Classification table:

           classification
true status     N     P     L
      FALSE 18544     0    49
      TRUE     17     0    33
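The error measures in the summary can be reproduced from the classification table. The sketch below assumes that the alpha error is the share of true matches not classified as links, the beta error is the share of true non-matches classified as links, and accuracy is the share of correctly classified pairs. These definitions are inferred from the numbers shown here, not taken from the package source.

```r
# Classification table from the summary output (rows: true status,
# columns: N = non-link, P = possible link, L = link).
tab <- matrix(c(18544, 0, 49,
                   17, 0, 33),
              nrow = 2, byrow = TRUE,
              dimnames = list(true_status = c("FALSE", "TRUE"),
                              classification = c("N", "P", "L")))

# Share of true matches that were not classified as links.
alpha <- tab["TRUE", "N"] / sum(tab["TRUE", ])
# Share of true non-matches that were classified as links.
beta <- tab["FALSE", "L"] / sum(tab["FALSE", ])
# Share of all pairs classified correctly.
accuracy <- (tab["FALSE", "N"] + tab["TRUE", "L"]) / sum(tab)

round(c(alpha = alpha, beta = beta, accuracy = accuracy), 6)
```

The column sums also match the summary: 49 + 33 = 82 links and 18544 + 17 = 18561 non-links detected.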