Search a descriptor database for compounds similar to query...


View source: R/sim.R


Given the descriptor of a query compound and a database of compound descriptors, search for compounds that are similar to the query compound. The user can limit the output by supplying either a cutoff similarity score or a cutoff that limits the number of returned compounds. The function can also return the scores together with the compounds.


cmp.search(db, query, type=1, cutoff = 0.5, return.score = FALSE, quiet = FALSE,
		    mode = 1, visualize = FALSE, visualize.browse = TRUE, visualize.query = NULL)



db: The compound descriptor database returned by 'cmp.parse'.


query: The query descriptor, which is usually returned by 'cmp.parse1'.


type: Returns results in the form of position indices (type=1), a named vector with compound IDs (type=2), or a data frame (type=3).


cutoff: The cutoff similarity (when cutoff <= 1) or the maximum number of compounds to be returned (when cutoff > 1).


return.score: Whether to return similarity scores. If set to TRUE, a data frame will be returned; otherwise, only the compounds' indices in the database will be returned, in order of decreasing score.


quiet: Whether to disable progress information.


mode: Mode used when computing similarity scores. This value is passed to 'cmp.similarity'.



'cmp.search' will go through all the compound descriptors in the database and calculate the similarity between the query compound and each compound in the database. When a cutoff similarity score is set, compounds with a similarity score higher than the cutoff are returned. When the maximum number of compounds to return is set to N via 'cutoff', the compounds with the N highest similarity scores are returned.
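
The scan-and-filter behaviour described above can be sketched in base R. This is an illustration only, not ChemmineR's implementation: the helper names `tanimoto` and `search_sketch` are hypothetical, descriptors are modelled as plain vectors of made-up feature codes, and the Tanimoto coefficient is assumed as the similarity measure.

```r
# Hypothetical helper: Tanimoto coefficient between two feature sets
# (assumed similarity measure; not taken from the package source).
tanimoto <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  total <- length(union(a, b))
  if (total == 0) return(0)
  length(intersect(a, b)) / total
}

# Hypothetical sketch of the search: score every entry, rank, then filter.
search_sketch <- function(db, query, cutoff = 0.5) {
  scores <- vapply(db, tanimoto, numeric(1), b = query)
  ranked <- order(scores, decreasing = TRUE)
  if (cutoff <= 1) {
    # cutoff is read as a similarity threshold
    ranked[scores[ranked] > cutoff]
  } else {
    # cutoff is read as the maximum number of hits
    head(ranked, cutoff)
  }
}

# Toy database of three "compounds"; the feature codes are invented.
toy_db <- list(c(1, 2, 3, 4), c(1, 2, 5), c(7, 8, 9))
toy_query <- c(1, 2, 3, 4)

hits_by_score <- search_sketch(toy_db, toy_query, cutoff = 0.5)
hits_top_n <- search_sketch(toy_db, toy_query, cutoff = 2)
```

With the toy data, the threshold form keeps only the entry identical to the query, while the top-N form keeps the two best-scoring entries regardless of how low the second score is.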


When 'return.score' is set to FALSE, a vector of matching compounds' indices in the database will be returned. Otherwise, a data frame will be returned:


The indices of matching compounds in the database.


The similarity scores between the matching compounds and the query compound.
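
As a rough illustration of the three result shapes selected by 'type', the following base R sketch formats the same hits in each form. The function `format_hits` is hypothetical and is not part of ChemmineR; the three hits are copied from the example output on this page, and the column names follow that output.

```r
# Hypothetical formatter: turns ranked hits into the three documented shapes.
format_hits <- function(index, cid, scores, type = 1) {
  switch(as.character(type),
    "1" = index,                                   # position indices
    "2" = setNames(scores, cid),                   # scores named by compound ID
    "3" = data.frame(index = index, cid = cid, scores = scores),
    stop("type must be 1, 2 or 3"))
}

# First three hits from the example output on this page.
index <- c(1, 96, 67)
cid <- c("650001", "650102", "650072")
scores <- c(1.0000000, 0.3516643, 0.3117569)

as_index <- format_hits(index, cid, scores, type = 1)
as_named <- format_hits(index, cid, scores, type = 2)
as_frame <- format_hits(index, cid, scores, type = 3)
```

The index form is the most compact and is convenient for subsetting the database directly, while the data frame form keeps index, compound ID and score together.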


Y. Eddie Cao, Li-Chang Cheng


Chen X and Reynolds CH (2002). "Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients". J Chem Inf Comput Sci.

See Also

cmp.parse1, cmp.parse, cmp.cluster, cmp.similarity, sdf.visualize


## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom pair sample data set provided by the library
data(apset)
db <- apset
query <- db[1]

## Optionally, save the db for future use
save(db, file="db.rda", compress=TRUE)

## Search for similar compounds using similarity cutoff
cmp.search(db, query, cutoff=0.2, type=1) # returns index
cmp.search(db, query, cutoff=0.2, type=2) # returns named vector
cmp.search(db, query, cutoff=0.2, type=3) # returns data frame

## In the next session, you may load a saved db and do the search:
load("db.rda")
cmp.search(db, query, cutoff=3)
## You may also use the loaded db to do clustering:
cmp.cluster(db, cutoff=0.35)

Example output

 [1]  1 96 67 88 15 77 31 98 86 83 64 85 72  4  2 51 23 74 11 38 79 70 75 25 93
[26] 32 69 52 43 63 47 66 91 78 94  3 16 18 99 39 68 45 71 20 22  9 12 92 61 60
[51] 19 40

   650001    650102    650072    650094    650015    650082    650032    650104 
1.0000000 0.3516643 0.3117569 0.3094629 0.3010753 0.2960969 0.2848181 0.2777778 
   650092    650089    650069    650091    650077    650004    650002    650054 
0.2739274 0.2738462 0.2736842 0.2724796 0.2674591 0.2641975 0.2637037 0.2633411 
   650024    650079    650011    650039    650085    650075    650080    650026 
0.2581121 0.2575107 0.2559653 0.2539062 0.2518337 0.2506297 0.2496552 0.2485795 
   650099    650033    650074    650056    650044    650068    650048    650071 
0.2438163 0.2410959 0.2408840 0.2330346 0.2322503 0.2321900 0.2320099 0.2301459 
   650097    650083    650100    650003    650016    650019    650105    650040 
0.2251908 0.2225313 0.2208333 0.2185714 0.2176471 0.2163389 0.2159091 0.2127329 
   650073    650046    650076    650021    650023    650009    650012    650098 
0.2124601 0.2112971 0.2107438 0.2099291 0.2098361 0.2098361 0.2082153 0.2071097 
   650066    650065    650020    650041 
0.2065064 0.2065064 0.2034884 0.2019544 

   index    cid    scores
1      1 650001 1.0000000
2     96 650102 0.3516643
3     67 650072 0.3117569
4     88 650094 0.3094629
5     15 650015 0.3010753
6     77 650082 0.2960969
7     31 650032 0.2848181
8     98 650104 0.2777778
9     86 650092 0.2739274
10    83 650089 0.2738462
11    64 650069 0.2736842
12    85 650091 0.2724796
13    72 650077 0.2674591
14     4 650004 0.2641975
15     2 650002 0.2637037
16    51 650054 0.2633411
17    23 650024 0.2581121
18    74 650079 0.2575107
19    11 650011 0.2559653
20    38 650039 0.2539062
21    79 650085 0.2518337
22    70 650075 0.2506297
23    75 650080 0.2496552
24    25 650026 0.2485795
25    93 650099 0.2438163
26    32 650033 0.2410959
27    69 650074 0.2408840
28    52 650056 0.2330346
29    43 650044 0.2322503
30    63 650068 0.2321900
31    47 650048 0.2320099
32    66 650071 0.2301459
33    91 650097 0.2251908
34    78 650083 0.2225313
35    94 650100 0.2208333
36     3 650003 0.2185714
37    16 650016 0.2176471
38    18 650019 0.2163389
39    99 650105 0.2159091
40    39 650040 0.2127329
41    68 650073 0.2124601
42    45 650046 0.2112971
43    71 650076 0.2107438
44    20 650021 0.2099291
45    22 650023 0.2098361
46     9 650009 0.2098361
47    12 650012 0.2082153
48    92 650098 0.2071097
49    61 650066 0.2065064
50    60 650065 0.2065064
51    19 650020 0.2034884
52    40 650041 0.2019544

[1]  1 96 67

sorting result...
       ids CLSZ_0.35 CLID_0.35
2   650002        28         2
8   650008        28         2
11  650011        28         2
15  650015        28         2
31  650032        28         2
38  650039        28         2
45  650046        28         2
47  650048        28         2
51  650054        28         2
52  650056        28         2
53  650058        28         2
63  650068        28         2
64  650069        28         2
65  650070        28         2
67  650072        28         2
69  650074        28         2
71  650076        28         2
75  650080        28         2
78  650083        28         2
79  650085        28         2
85  650091        28         2
86  650092        28         2
88  650094        28         2
91  650097        28         2
93  650099        28         2
94  650100        28         2
99  650105        28         2
100 650106        28         2
4   650004         8         4
12  650012         8         4
18  650019         8         4
32  650033         8         4
40  650041         8         4
77  650082         8         4
84  650090         8         4
98  650104         8         4
1   650001         2         1
96  650102         2         1
3   650003         2         3
7   650007         2         3
16  650016         2        16
72  650077         2        16
20  650021         2        20
28  650029         2        20
48  650049         2        48
49  650050         2        48
54  650059         2        54
55  650060         2        54
56  650061         2        56
57  650062         2        56
58  650063         2        58
59  650064         2        58
60  650065         2        60
61  650066         2        60
5   650005         1         5
6   650006         1         6
9   650009         1         9
10  650010         1        10
13  650013         1        13
14  650014         1        14
17  650017         1        17
19  650020         1        19
21  650022         1        21
22  650023         1        22
23  650024         1        23
24  650025         1        24
25  650026         1        25
26  650027         1        26
27  650028         1        27
29  650030         1        29
30  650031         1        30
33  650034         1        33
34  650035         1        34
35  650036         1        35
36  650037         1        36
37  650038         1        37
39  650040         1        39
41  650042         1        41
42  650043         1        42
43  650044         1        43
44  650045         1        44
46  650047         1        46
50  650052         1        50
62  650067         1        62
66  650071         1        66
68  650073         1        68
70  650075         1        70
73  650078         1        73
74  650079         1        74
76  650081         1        76
80  650086         1        80
81  650087         1        81
82  650088         1        82
83  650089         1        83
87  650093         1        87
89  650095         1        89
90  650096         1        90
92  650098         1        92
95  650101         1        95
97  650103         1        97

ChemmineR documentation built on Feb. 28, 2021, 2:02 a.m.