cmp.similarity: Compute similarity between two compounds using their...


View source: R/sim.R

Description

Given descriptors for two compounds, 'cmp.similarity' returns the similarity measure between the two compounds.

Usage

cmp.similarity(a, b, mode = 1, worst = 0)

Arguments

a

Descriptor of the first compound.

b

Descriptor of the second compound.

mode

Mode used when computing the similarity. See Details below.

worst

The lowest similarity value that is still of interest to the caller. If 'cmp.similarity' finds that the upper bound of the similarity is below this value, it returns 0 without doing the full computation, which can save time.

Details

'cmp.similarity' uses descriptor information generated by 'cmp.parse' and 'cmp.parse1'. A descriptor is a vector of numbers that represents the set of structural fragment descriptors of a compound. Similarity is measured with the Tanimoto coefficient.

'cmp.similarity' supports three modes. Mode 1 uses the standard Tanimoto coefficient. Mode 2 uses the size of the descriptor intersection divided by the size of the smaller descriptor, mainly to handle compounds that differ greatly in size. Mode 3 is like mode 2, except that it raises the similarity to the power of 3 to penalize small values. When mode is 0, 'cmp.similarity' selects mode 1 or mode 3 based on the size difference between the two descriptors.
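The three explicit modes can be illustrated with a minimal sketch in plain R. This is not the ChemmineR implementation: 'sim_sketch' is a hypothetical helper, and it assumes each descriptor can be treated as a simple set of integer fragment codes. Mode 0 is left out because the rule for choosing between mode 1 and mode 3 is not specified here.

```r
# Hypothetical helper (not part of ChemmineR): descriptors are assumed to be
# plain vectors of fragment codes and are treated as sets.
sim_sketch <- function(a, b, mode = 1) {
  a <- unique(a)
  b <- unique(b)
  n_common <- length(intersect(a, b))      # size of the intersection
  n_union  <- length(union(a, b))          # size of the union
  n_small  <- min(length(a), length(b))    # size of the smaller descriptor
  if (n_small == 0) return(0)              # an empty descriptor shares nothing
  switch(as.character(mode),
    "1" = n_common / n_union,              # standard Tanimoto coefficient
    "2" = n_common / n_small,              # intersection over smaller descriptor
    "3" = (n_common / n_small)^3,          # mode 2 raised to the power of 3
    stop("mode must be 1, 2, or 3"))
}

a <- c(1, 2, 3, 4)
b <- c(3, 4, 5, 6, 7, 8)
sim_sketch(a, b, mode = 1)  # 2 shared / 8 in union   = 0.25
sim_sketch(a, b, mode = 2)  # 2 shared / 4 in smaller = 0.5
sim_sketch(a, b, mode = 3)  # 0.5^3                   = 0.125
```

With descriptors of very different sizes, mode 1 is capped by the size ratio, while modes 2 and 3 are normalized by the smaller descriptor only.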

When 'cmp.similarity' is used to search compounds with a threshold similarity value, or in clustering with a cutoff distance, the threshold similarity or cutoff distance can be used to choose a 'worst' value. 'cmp.similarity' can compute an upper bound of the similarity cheaply. By comparing this upper bound to the 'worst' value, it can skip the full computation when the similarity is certain to fall below 'worst' and would therefore be useless to the caller.
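The early-exit idea can be sketched as follows. For the Tanimoto coefficient, the intersection can be no larger than the smaller set and the union no smaller than the larger set, so the ratio of the two set sizes is an upper bound that needs no element comparison. Whether 'cmp.similarity' uses exactly this bound is an assumption; 'tanimoto_with_cutoff' is a hypothetical helper that treats descriptors as simple sets.

```r
# Hypothetical helper (not part of ChemmineR) illustrating the 'worst' shortcut
# for mode 1 only. Descriptors are assumed to be plain vectors treated as sets.
tanimoto_with_cutoff <- function(a, b, worst = 0) {
  a <- unique(a)
  b <- unique(b)
  n_small <- min(length(a), length(b))
  n_large <- max(length(a), length(b))
  if (n_large == 0) return(0)              # both descriptors are empty
  upper <- n_small / n_large               # cheap upper bound from sizes alone
  if (upper < worst) return(0)             # cannot reach 'worst': skip real work
  length(intersect(a, b)) / length(union(a, b))
}

tanimoto_with_cutoff(1:2, 1:10, worst = 0.5)  # bound 0.2 < 0.5, returns 0
tanimoto_with_cutoff(1:2, 1:10, worst = 0)    # full computation, returns 0.2
```

In a search with a similarity threshold, the threshold itself is a natural choice for 'worst'. In clustering, if distance is defined as 1 minus similarity (an assumption here), a cutoff distance d corresponds to a 'worst' value of 1 - d.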

Value

Returns a numeric value between 0 and 1 that gives the similarity between the two compounds.

Author(s)

Y. Eddie Cao, Li-Chang Cheng

References

Chen X and Reynolds CH (2002). "Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients", in J Chem Inf Comput Sci.

Willett P (1998). "Chemical Similarity Searching", in J Chem Inf Comput Sci.

See Also

cmp.parse1, cmp.parse, cmp.search, cmp.cluster

Examples

## Attach the package that provides cmp.similarity and the sample data
library(ChemmineR)

## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom pair sample data set, as provided by the library
data(apset) 

## Compute the similarity between two compounds
cmp.similarity(apset[1], apset[2])

## Search apset database with a query compound
cmp.search(apset, apset[1], type=3, cutoff = 0.3)

Example output

[1] 0.2637037

                                           
  index    cid    scores
1     1 650001 1.0000000
2    96 650102 0.3516643
3    67 650072 0.3117569
4    88 650094 0.3094629
5    15 650015 0.3010753

ChemmineR documentation built on Feb. 28, 2021, 2:02 a.m.