Clustering

Description

Groups fingerprints into clusters using exact matching, BABLBS matching, or genetic distance (GD1 or GD2).

Usage

clusters(input, bablbs, type = "exact", work_dir = "")

Arguments

input

The result of the call_acm function, with five columns in the order ULI, ULI, numbands, numbands, num_matches.

bablbs

The result of the call_bablbs, call_gd1, or call_gd2 function, with two columns in the order ULI, ULI. The examples omit this argument when type = "exact".

work_dir

The directory where the datasets should be stored.

type

One of "exact", "bablbs", "gd1", or "gd2", selecting exact matching, BABLBS matching, or genetic distance matching (GD1 or GD2). The default is "exact".
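To make the two input layouts concrete, the sketch below builds small stand-in data frames with the column order described above. The column names, IDs, and values are hypothetical, since this page gives only the column order; real inputs come from call_acm and from call_bablbs, call_gd1, or call_gd2.

```r
# Toy stand-in for the five-column layout: ULI, ULI, numbands, numbands, num_matches.
# Column names and values are hypothetical; only the column order comes from this page.
acm_pairs <- data.frame(
  uli_1       = c("A", "A", "B", "D"),
  uli_2       = c("B", "C", "C", "E"),
  numbands_1  = c(10L, 10L, 10L, 8L),
  numbands_2  = c(10L, 10L, 10L, 9L),
  num_matches = c(10L, 10L, 10L, 7L),
  stringsAsFactors = FALSE
)

# Toy stand-in for the two-column layout: ULI, ULI.
bab_pairs <- data.frame(
  uli_1 = c("A", "D"),
  uli_2 = c("B", "E"),
  stringsAsFactors = FALSE
)

# Basic shape checks one might run before passing data on.
stopifnot(ncol(acm_pairs) == 5, ncol(bab_pairs) == 2)
stopifnot(is.character(acm_pairs$uli_1), is.character(bab_pairs$uli_2))
```

Checking the shape up front is cheap and makes a mismatch between the two inputs easier to spot than an error raised deep inside a clustering call.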

Value

A list with 7 components. The first (SINGLE) is a 3-column matrix of all fingerprint IDs that do not belong to a cluster; its columns are Cluster_Number, Cluster_size, and the fingerprint ID. The second (CLUSTERED) is a matrix with the same columns listing all fingerprint IDs that do belong to a cluster. The third (BOTH) combines SINGLE and CLUSTERED into one matrix. The fourth (RTIN) and fifth (RTIN1) hold the RTIn and RTIn-1 values. The last two are used for the histograms that a call to this function produces.
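The sketch below illustrates how tables shaped like SINGLE, CLUSTERED, and BOTH, and the two indices, could be derived from a list of matching pairs. It is not the package's implementation. It assumes that clusters are the connected components of the matching pairs, and that RTIn and RTIn-1 follow the commonly used "n" and "n-1" definitions (clustered cases over all cases, and clustered cases minus the number of clusters over all cases); this page does not spell out either point. All object names are hypothetical, and data frames are used instead of matrices to keep IDs as text.

```r
# Hypothetical inputs: every fingerprint ID, plus the pairs judged to match.
ids   <- c("A", "B", "C", "D", "E", "F")
pairs <- data.frame(
  uli_1 = c("A", "B", "D"),
  uli_2 = c("B", "C", "E"),
  stringsAsFactors = FALSE
)

# Start with one label per fingerprint, then merge the labels of each matched pair.
# Relabelling the whole group on every merge keeps the labels consistent.
label <- setNames(seq_along(ids), ids)
for (i in seq_len(nrow(pairs))) {
  a <- label[[pairs$uli_1[i]]]
  b <- label[[pairs$uli_2[i]]]
  label[label == max(a, b)] <- min(a, b)
}

# Renumber clusters 1, 2, 3, ... and attach each cluster's size.
cluster_number <- match(label, unique(label))
cluster_size   <- tabulate(cluster_number)[cluster_number]

both <- data.frame(
  Cluster_Number = cluster_number,
  Cluster_size   = cluster_size,
  Fingerprint_ID = ids,
  stringsAsFactors = FALSE
)
single    <- both[both$Cluster_size == 1, ]  # fingerprints outside any cluster
clustered <- both[both$Cluster_size > 1, ]   # fingerprints inside a cluster

# Indices under the assumed "n" and "n-1" definitions.
n_total     <- nrow(both)
n_clustered <- nrow(clustered)
n_clusters  <- length(unique(clustered$Cluster_Number))
rti_n  <- n_clustered / n_total
rti_n1 <- (n_clustered - n_clusters) / n_total
```

In this toy data, A, B, and C form one cluster, D and E form another, and F is left unclustered.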

Author(s)

Andrea Benedetti andrea.benedetti@mcgill.ca

Sahir Rai Bhatnagar

XiaoFei Zhao

References

Salamon et al. (1998). Accommodating Error Analysis in Comparison and Clustering of Molecular Fingerprints. Emerging Infectious Diseases, Vol. 4, No. 2, April-June 1998.

Abasci LLC. JAMES v1.0 User Documentation. 2002.

Examples

# Synthesize the results.
# res1, res_bab, and res_gd1 are not defined here; they are assumed to be
# the outputs of earlier call_acm, call_bablbs, and call_gd1 calls.

# Exact matching clusters
exact <- clusters(input = res1, type = "exact")
names(exact)
exact$RTIN
exact$RTIN1

# Clustering based on BABLBS
bablbs <- clusters(input = res1, bablbs = res_bab, type = "bablbs")
names(bablbs)
bablbs$RTIN
bablbs$RTIN1

# Clustering based on GD1
gd1 <- clusters(input = res1, bablbs = res_gd1, type = "gd1")
names(gd1)
gd1$RTIN
gd1$RTIN1