jarvisPatrick: Jarvis-Patrick Clustering



Description

Performs Jarvis-Patrick clustering. The algorithm requires a nearest neighbor table, which lists the nearest neighbors of each item in the data set. Two items are joined into the same cluster if they meet the following requirements:

(a) they are contained in each other's neighbor list
(b) they share at least 'k' nearest neighbors

The nearest neighbor table can be computed with nearestNeighbors. For standard Jarvis-Patrick clustering, that function takes the number of neighbors to keep for each item. Alternatively, it accepts a similarity cutoff value instead of the number of neighbors. In this mode, all neighbors that meet the cutoff are included in the table. This setting is not part of the original Jarvis-Patrick algorithm. It can produce tighter clusters and reduces some limitations of the method, such as joining completely unrelated items when clustering small data sets. Other extensions, such as the linkage parameter, can also help improve the clustering quality.
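The two join requirements can be illustrated with a small base R sketch. It is not the package's implementation: the neighbor table here is a plain integer matrix (an assumed layout, not necessarily the object that nearestNeighbors returns), and jp_join and jp_cluster are hypothetical helper names.

```r
## Toy nearest neighbor table: row i lists the 3 nearest neighbors of item i
## (plain integer matrix; an assumption, not the object nearestNeighbors returns)
nn <- rbind(c(2, 3, 4),
            c(1, 3, 4),
            c(1, 2, 4),
            c(1, 2, 3),
            c(6, 4, 3),
            c(5, 4, 3))

## TRUE if items i and j satisfy requirements (a) and (b)
jp_join <- function(nn, i, j, k) {
  mutual <- (j %in% nn[i, ]) && (i %in% nn[j, ])   # requirement (a)
  shared <- length(intersect(nn[i, ], nn[j, ]))    # count for requirement (b)
  mutual && shared >= k
}

## Single-linkage clustering: merge the clusters of every pair that passes
jp_cluster <- function(nn, k) {
  n <- nrow(nn)
  cl <- seq_len(n)                     # every item starts in its own cluster
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (cl[i] != cl[j] && jp_join(nn, i, j, k)) {
        cl[cl == cl[j]] <- cl[i]       # relabel j's cluster to i's cluster
      }
    }
  }
  match(cl, unique(cl))                # renumber clusters 1, 2, ...
}

clusters <- jp_cluster(nn, k = 2)      # items 1-4 in one cluster, 5-6 in another
```

Raising k to 3 in this toy table leaves every item in its own cluster, because no pair shares three neighbors.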

Usage

jarvisPatrick(nnm,  k, mode="a1a2b", linkage="single") 

Arguments

nnm

A nearest neighbor table, as produced by nearestNeighbors.

k

Minimum number of nearest neighbors two rows (items) in the nearest neighbor table need to have in common to join them into the same cluster.

mode

If mode = "a1a2b" (default), the clustering is run with both requirements (a) and (b); if mode = "a1b", then (a) is relaxed to a unidirectional requirement; and if mode = "b", then only requirement (b) is used. The size of the clusters generated by the different modes increases in this order: "a1a2b" < "a1b" < "b". The run time of mode "a1a2b" scales close to linearly with the number of items, while it is nearly quadratic for the much more exhaustive mode "b". Only modes "a1a2b" and "a1b" are suitable for clustering very large data sets (e.g. >50,000 items) in a reasonable amount of time.
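As a rough illustration of how the three modes differ, the following base R sketch tests a single pair of items under each mode. It reads "unidirectional" as "at least one of the two items lists the other", which is an assumption; jp_pass and the integer-matrix neighbor table are hypothetical and not part of the package.

```r
## Merge test for one pair (i, j) under each mode; nn is a toy integer matrix
## whose row i holds the neighbors of item i (hypothetical layout)
jp_pass <- function(nn, i, j, k, mode = c("a1a2b", "a1b", "b")) {
  mode <- match.arg(mode)
  i_lists_j <- j %in% nn[i, ]
  j_lists_i <- i %in% nn[j, ]
  shares_k  <- length(intersect(nn[i, ], nn[j, ])) >= k    # requirement (b)
  switch(mode,
         a1a2b = i_lists_j && j_lists_i && shares_k,    # mutual (a) plus (b)
         a1b   = (i_lists_j || j_lists_i) && shares_k,  # one-directional (a) plus (b)
         b     = shares_k)                              # (b) only
}

nn <- rbind(c(2, 3, 4),   # item 1
            c(3, 4, 5),   # item 2: does not list item 1
            c(1, 2, 4),   # item 3
            c(1, 2, 3),   # item 4
            c(3, 4, 2))   # item 5: neither lists nor is listed by item 1

modes <- c("a1a2b", "a1b", "b")
pair_1_2 <- sapply(modes, function(m) jp_pass(nn, 1, 2, k = 2, mode = m))
pair_1_5 <- sapply(modes, function(m) jp_pass(nn, 1, 5, k = 2, mode = m))
```

Here the pair (1, 2) passes under "a1b" and "b" but not under "a1a2b", and the pair (1, 5) passes only under "b". In this sketch, mode "b" has to consider pairs that appear in neither neighbor list, which is consistent with its higher run time.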

linkage

Can be one of "single", "average", or "complete", for single linkage, average linkage and complete linkage merge requirements, respectively. In the context of Jarvis-Patrick, average linkage means that at least half of the pairs between the clusters under consideration must pass the merge requirement. Similarly, for complete linkage, all pairs must pass the merge requirement. Single linkage is the normal case for Jarvis-Patrick and just means that at least one pair must meet the requirement.
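The three linkage rules can be sketched in base R as a decision over the pairs between two candidate clusters. This is only an illustration of the description above; jp_merge and the pass matrix are hypothetical names, not part of the package.

```r
## Decide whether two clusters merge, given which cross-cluster pairs pass.
## pass[a, b] is TRUE if member a of the first cluster and member b of the
## second cluster meet the merge requirement.
jp_merge <- function(pass, linkage = c("single", "average", "complete")) {
  linkage <- match.arg(linkage)
  switch(linkage,
         single   = any(pass),           # at least one pair passes
         average  = mean(pass) >= 0.5,   # at least half of the pairs pass
         complete = all(pass))           # every pair passes
}

linkages <- c("single", "average", "complete")

## Clusters of size 3 and 2: two of the six pairs pass
pass <- matrix(c(TRUE,  FALSE,
                 TRUE,  FALSE,
                 FALSE, FALSE), nrow = 3, byrow = TRUE)
two_of_six <- sapply(linkages, function(l) jp_merge(pass, l))

## Same clusters with a third passing pair: exactly half of the pairs pass
pass[3, 1] <- TRUE
three_of_six <- sapply(linkages, function(l) jp_merge(pass, l))
```

With two of six pairs passing, only single linkage merges the clusters; with three of six, average linkage merges them as well, while complete linkage still does not.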

Details

...

Value

The function returns the clustering result as a named vector, in which the names are the item identifiers and the values are the cluster numbers. The nearest neighbor table itself is returned as a matrix by nearestNeighbors.
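A named vector of cluster assignments can be summarized with base R alone. The sketch below uses a made-up toy vector (the names cmp1 to cmp5 and the cluster numbers are invented for illustration, not real output).

```r
## Toy result: names are item IDs, values are cluster numbers
## (made-up values, not real jarvisPatrick output)
cl <- c(cmp1 = 1, cmp2 = 2, cmp3 = 1, cmp4 = 3, cmp5 = 1)

sizes   <- table(cl)               # number of items per cluster
members <- split(names(cl), cl)    # item IDs grouped by cluster number

## Items that ended up alone in their cluster
singletons <- names(cl)[cl %in% names(sizes)[sizes == 1]]
```

Here members[["1"]] holds the three items of cluster 1, and singletons lists the items in clusters of size one.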

Note

...

Author(s)

Thomas Girke

References

Jarvis RA, Patrick EA (1973) Clustering Using a Similarity Measure Based on Shared Near Neighbors. IEEE Transactions on Computers, C22, 1025-1034. URLs: http://davide.eynard.it/teaching/2012_PAMI/JP.pdf, http://www.btluke.com/jpclust.html, http://www.daylight.com/dayhtml/doc/cluster/index.pdf

See Also

Functions: cmp.cluster, trimNeighbors, nearestNeighbors

Examples

## Load the package and create sample APset and FPset objects
library(ChemmineR)
data(apset)
fpset <- desc2fp(apset)

## Standard Jarvis-Patrick clustering on APset/FPset objects
jarvisPatrick(nearestNeighbors(apset,numNbrs=6), k=5, mode="a1a2b")
jarvisPatrick(nearestNeighbors(fpset,numNbrs=6), k=5, mode="a1a2b")

## Jarvis-Patrick clustering only with requirement (b) 
jarvisPatrick(nearestNeighbors(fpset,numNbrs=6), k=5, mode="b")

## Modified Jarvis-Patrick clustering with minimum similarity 'cutoff' 
## value (here Tanimoto coefficient)
jarvisPatrick(nearestNeighbors(fpset, cutoff=0.6, method="Tanimoto"), k=2)

## Output nearest neighbor table (matrix)
nnm <- nearestNeighbors(fpset,numNbrs=6)

## Perform clustering on precomputed nearest neighbor table
jarvisPatrick(nnm, k=5)

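Example output (one named vector per jarvisPatrick call above, in the same order; names are item identifiers, values are cluster numbers)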

650001 650002 650003 650004 650005 650006 650007 650008 650009 650010 650011 
     1      2      3      4      5      6      7      8      9     10     11 
650012 650013 650014 650015 650016 650017 650019 650020 650021 650022 650023 
    12     13     14     11     15     16     17     18     19     20     21 
650024 650025 650026 650027 650028 650029 650030 650031 650032 650033 650034 
    22     23     24     25     26     27     28     29     30     31     32 
650035 650036 650037 650038 650039 650040 650041 650042 650043 650044 650045 
    33     34     35     36     37     38     39     40     41     42     43 
650046 650047 650048 650049 650050 650052 650054 650056 650058 650059 650060 
    44     45     46     47     48     49     50     51     52     53     54 
650061 650062 650063 650064 650065 650066 650067 650068 650069 650070 650071 
    55     56     57     58     59     60     61     62     63     64     65 
650072 650073 650074 650075 650076 650077 650078 650079 650080 650081 650082 
    66     67     68     69     70     71     72     73     74     75     76 
650083 650085 650086 650087 650088 650089 650090 650091 650092 650093 650094 
    77     78     79     80     81     82     83     84     85     86     87 
650095 650096 650097 650098 650099 650100 650101 650102 650103 650104 650105 
    88     89     90     91     92     93     94     95     96     97     98 
650106 
    99 
650001 650002 650003 650004 650005 650006 650007 650008 650009 650010 650011 
     1      2      3      4      5      6      7      8      9     10     11 
650012 650013 650014 650015 650016 650017 650019 650020 650021 650022 650023 
    12     13     14     11     15     16     17     18     19     20     21 
650024 650025 650026 650027 650028 650029 650030 650031 650032 650033 650034 
    22     23     24     25     26     27     28     29     30     31     32 
650035 650036 650037 650038 650039 650040 650041 650042 650043 650044 650045 
    33     34     35     36     37     38     39     40     41     42     43 
650046 650047 650048 650049 650050 650052 650054 650056 650058 650059 650060 
    44     45     46     47     48     49     50     51     52     53     54 
650061 650062 650063 650064 650065 650066 650067 650068 650069 650070 650071 
    55     56     57     58     59     60     61     62     63     64     65 
650072 650073 650074 650075 650076 650077 650078 650079 650080 650081 650082 
    66     67     68     69     70     71     72     73     74     75     76 
650083 650085 650086 650087 650088 650089 650090 650091 650092 650093 650094 
    77     78     79     80     81     82     83     84     85     86     87 
650095 650096 650097 650098 650099 650100 650101 650102 650103 650104 650105 
    88     89     90     91     92     93     94      1     95     96     97 
650106 
    98 
650001 650002 650003 650004 650005 650006 650007 650008 650009 650010 650011 
     1      2      3      4      5      6      7      8      9     10     11 
650012 650013 650014 650015 650016 650017 650019 650020 650021 650022 650023 
    12     13     14     11     15     16     17     18     19     20     21 
650024 650025 650026 650027 650028 650029 650030 650031 650032 650033 650034 
    22     23     24     25     26     27     28     29     24     30     31 
650035 650036 650037 650038 650039 650040 650041 650042 650043 650044 650045 
    32     33     34     35     36     37      9     38     39     40     41 
650046 650047 650048 650049 650050 650052 650054 650056 650058 650059 650060 
    11     42     11     43     43     44     45     46     47     48     48 
650061 650062 650063 650064 650065 650066 650067 650068 650069 650070 650071 
    49     49     50     50     51     51     52     53     11     47     54 
650072 650073 650074 650075 650076 650077 650078 650079 650080 650081 650082 
    55     56     57     58     59     60     61     62     36     63     64 
650083 650085 650086 650087 650088 650089 650090 650091 650092 650093 650094 
    65     57     66     67     68     69     70      5     11     71     72 
650095 650096 650097 650098 650099 650100 650101 650102 650103 650104 650105 
    73     74     75     76     77     78     79      1     80     81     82 
650106 
    13 
650001 650002 650003 650004 650005 650006 650007 650008 650009 650010 650011 
     1      2      3      4      5      6      7      8      9     10     11 
650012 650013 650014 650015 650016 650017 650019 650020 650021 650022 650023 
    12     13     14     15     16     17     18     19     20     21     22 
650024 650025 650026 650027 650028 650029 650030 650031 650032 650033 650034 
    23     24     25     26     27     28     29     30     31     32     33 
650035 650036 650037 650038 650039 650040 650041 650042 650043 650044 650045 
    34     35     36     37     38     39     40     41     42     43     44 
650046 650047 650048 650049 650050 650052 650054 650056 650058 650059 650060 
    45     46     47     48     49     50     51     52     53     54     55 
650061 650062 650063 650064 650065 650066 650067 650068 650069 650070 650071 
    56     57     58     59     60     61     62     63     64     65     66 
650072 650073 650074 650075 650076 650077 650078 650079 650080 650081 650082 
    67     68     69     70     71     72     73     74     75     76     77 
650083 650085 650086 650087 650088 650089 650090 650091 650092 650093 650094 
    78     79     80     81     82     83     84     85     86     87     88 
650095 650096 650097 650098 650099 650100 650101 650102 650103 650104 650105 
    89     90     91     92     93     94     95     96     97     98     99 
650106 
   100 
650001 650002 650003 650004 650005 650006 650007 650008 650009 650010 650011 
     1      2      3      4      5      6      7      8      9     10     11 
650012 650013 650014 650015 650016 650017 650019 650020 650021 650022 650023 
    12     13     14     11     15     16     17     18     19     20     21 
650024 650025 650026 650027 650028 650029 650030 650031 650032 650033 650034 
    22     23     24     25     26     27     28     29     30     31     32 
650035 650036 650037 650038 650039 650040 650041 650042 650043 650044 650045 
    33     34     35     36     37     38     39     40     41     42     43 
650046 650047 650048 650049 650050 650052 650054 650056 650058 650059 650060 
    44     45     46     47     48     49     50     51     52     53     54 
650061 650062 650063 650064 650065 650066 650067 650068 650069 650070 650071 
    55     56     57     58     59     60     61     62     63     64     65 
650072 650073 650074 650075 650076 650077 650078 650079 650080 650081 650082 
    66     67     68     69     70     71     72     73     74     75     76 
650083 650085 650086 650087 650088 650089 650090 650091 650092 650093 650094 
    77     78     79     80     81     82     83     84     85     86     87 
650095 650096 650097 650098 650099 650100 650101 650102 650103 650104 650105 
    88     89     90     91     92     93     94      1     95     96     97 
650106 
    98 

ChemmineR documentation built on Feb. 28, 2021, 2:02 a.m.