Function to perform Jarvis-Patrick clustering. The algorithm requires a
nearest neighbor table, which consists of neighbors for each item
in the dataset. This information is then used to join items into clusters
with the following requirements:
(a) they are contained in each other's neighbor list
(b) they share at least 'k' nearest neighbors
The nearest neighbor table can be computed with
For standard Jarvis-Patrick clustering, this function takes the number of neighbors to keep for each item.
It also has the option of passing a cutoff similarity value instead of the number of neighbors. In this mode, all
neighbors which meet the cutoff criteria will be included in the table.
This is a setting that is not part of the original Jarvis-Patrick algorithm. It
allows to generate tighter clusters and to minimize some limitations of this method, such as joining completely
unrelated items when clustering small data sets. Other extensions, such as the
linkage parameter, can also
help improve the clustering quality.
A nearest neighbor table, as produced by
Minimum number of nearest neighbors two rows (items) in the nearest neighbor table need to have in common to join them into the same cluster.
Can be one of "single", "average", or "complete", for single linkage, average linkage and complete linkage merge requirements, respectively. In the context of Jarvis-Patrick, average linkage means that at least half of the pairs between the clusters under consideration must pass the merge requirement. Similarly, for complete linkage, all pairs must pass the merge requirement. Single linkage is the normal case for Jarvis-Patrick and just means that at least one pair must meet the requirement.
Depending on the setting under the
type argument, the function returns the clustering result in a
named vector or a nearest neighbor table as
Jarvis RA, Patrick EA (1973) Clustering Using a Similarity Measure Based on Shared Near Neighbors. IEEE Transactions on Computers, C22, 1025-1034. URLs: http://davide.eynard.it/teaching/2012_PAMI/JP.pdf, http://www.btluke.com/jpclust.html, http://www.daylight.com/dayhtml/doc/cluster/index.pdf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Load/create sample APset and FPset data(apset) fpset <- desc2fp(apset) ## Standard Jarvis-Patrick clustering on APset/FPset objects jarvisPatrick(nearestNeighbors(apset,numNbrs=6), k=5, mode="a1a2b") jarvisPatrick(nearestNeighbors(fpset,numNbrs=6), k=5, mode="a1a2b") ## Jarvis-Patrick clustering only with requirement (b) jarvisPatrick(nearestNeighbors(fpset,numNbrs=6), k=5, mode="b") ## Modified Jarvis-Patrick clustering with minimum similarity 'cutoff' ## value (here Tanimoto coefficient) jarvisPatrick(nearestNeighbors(fpset,cutoff=0.6, method="Tanimoto"), k=2 ) ## Output nearest neighbor table (matrix) nnm <- nearestNeighbors(fpset,numNbrs=6) ## Perform clustering on precomputed nearest neighbor table jarvisPatrick(nnm, k=5)