classifyUnsup: Unsupervised Classification

View source: R/classify.r

Description

Classify record pairs with unsupervised clustering methods.

Usage

classifyUnsup(rpairs, method, ...)

Arguments

rpairs

Object of type RecLinkData. The data to classify.

method

The classification method to use. One of "kmeans", "bclust".

...

Further arguments passed on to the classification method.

Details

A clustering algorithm is applied to find clusters in the comparison patterns. In the case of two clusters (the default), the cluster further from the origin (i.e. representing higher similarity values) is interpreted as the set of links, the other as the set of non-links.

Supported methods are:

kmeans

K-means clustering, see kmeans.

bclust

Bagged clustering, see bclust.
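The interpretation rule described above can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: the comparison patterns are simulated, and the names patterns, link_cluster and prediction are hypothetical. It assumes Euclidean distance of the cluster centers from the origin as the criterion for picking the link cluster.

```r
# Illustrative sketch only; this is not the RecordLinkage implementation.
set.seed(42)

# Simulated comparison patterns (an assumption for this example):
# rows are record pairs, columns are per-field similarity values in [0, 1].
non_links <- matrix(runif(200 * 3, min = 0.0, max = 0.3), ncol = 3)
links     <- matrix(runif(20 * 3,  min = 0.7, max = 1.0), ncol = 3)
patterns  <- rbind(non_links, links)

# Two-cluster k-means on the comparison patterns.
fit <- kmeans(patterns, centers = 2, nstart = 10)

# The cluster whose center lies further from the origin
# (higher similarity values) is interpreted as the set of links.
dist_from_origin <- sqrt(rowSums(fit$centers^2))
link_cluster <- which.max(dist_from_origin)

# Label each pair: "L" = link, "N" = non-link.
prediction <- factor(ifelse(fit$cluster == link_cluster, "L", "N"),
                     levels = c("N", "L"))
table(prediction)
```

Because the cluster numbering returned by kmeans is arbitrary, the link cluster has to be identified from the cluster centers rather than from the cluster index.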

Value

An object of class "RecLinkResult" that represents a copy of rpairs with the additional element prediction, which stores the classification result.

Author(s)

Andreas Borg, Murat Sariyar

See Also

trainSupv and classifySupv for supervised classification.

Examples

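# Classification with bclust
library(RecordLinkage)
data(RLdata500)
# Build comparison patterns for deduplication, blocking on fields 1, 3, 5, 6 and 7
rpairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                        blockfld = list(1, 3, 5, 6, 7))
# Cluster the comparison patterns and label the clusters as links / non-links
result <- classifyUnsup(rpairs, method = "bclust")
summary(result)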

Example output

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

Deduplication Data Set

500 records 
18643 record pairs 

50 matches
18593 non-matches
0 pairs with unknown status


82 links detected 
0 possible links detected 
18561 non-links detected 

alpha error: 0.340000
beta error: 0.002635
accuracy: 0.996460


Classification table:

           classification
true status     N     P     L
      FALSE 18544     0    49
      TRUE     17     0    33

RecordLinkage documentation built on Aug. 25, 2020, 5:07 p.m.