sd2gram: sd2gram - Similarity of molecules by the marginalized kernel...
In Rchemcpp: Similarity measures for chemical compounds


This tool computes the marginalized kernel (Kashima, 2004) and its proposed extensions (Mahe, 2005).

  sd2gram(sdf, sdf2, stopP = 0.1, filterTottering = FALSE,
    converg = as.integer(1000), atomKernelMatrix = "",
    flagRemoveH = FALSE, morganOrder = as.integer(0),
    silentMode = FALSE, returnNormalized = TRUE,
    detectArom = FALSE)

`sdf`	File containing the molecules. Must be in MDL file format (MOL and SDF files). For more information on the file format see http://en.wikipedia.org/wiki/Chemical_table_file.
`sdf2`	A second file containing molecules. Must also be in SDF format. If specified the molecules of the first file will be compared with the molecules of this second file. Default = "missing".
`stopP`	The probability that a random walk stops. The higher the value, the more weight is put on shorter walks. Default = 0.1.
`filterTottering`	A logical specifying whether tottering paths should be removed. Default = FALSE.
`converg`	A numeric value specifying when convergence is reached. The algorithm stops when the kernel value does not change by more than 1/c, where c is the value specified by the converg option. Default = 1000.
`atomKernelMatrix`	A string that sets the similarity measure between atoms that should be used. Default = "missing".
`flagRemoveH`	A logical that indicates whether H-atoms should be removed or not. Default = FALSE.
`morganOrder`	The order of the Morgan indices to be used. If set to zero, no Morgan indices are used. The higher the order, the more atom types exist and consequently the more dissimilar the molecules will be. Default = 0.
`silentMode`	A logical specifying whether progress reports should be suppressed instead of being printed to the standard output. Default = FALSE.
`returnNormalized`	A logical specifying whether a normalized kernel matrix should be returned. Default = TRUE.
`detectArom`	A logical specifying whether aromatic rings should be detected and aromatic bonds should be assigned a special bond type. If the data set contains large molecules, the detection of aromatic rings can be very time-consuming. Default = FALSE.
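
To make the effect of `stopP` and `converg` more concrete, here is a small base-R sketch. It assumes the simplest random-walk model, in which the walk stops after each step with constant probability `stopP`, so that the walk length follows a geometric distribution; the weighting actually used inside the C++ code may differ in detail. The helper `walk_length_weights` is hypothetical and not part of Rchemcpp.

```r
# Hypothetical helper (not part of Rchemcpp): probability that a random walk
# has exactly n steps when it stops after each step with probability stop_p.
walk_length_weights <- function(stop_p, max_len = 10) {
  n <- seq_len(max_len)
  stop_p * (1 - stop_p)^(n - 1)  # geometric distribution over walk lengths
}

w_low  <- walk_length_weights(0.1)  # the default stopP
w_high <- walk_length_weights(0.5)  # a larger stopP

# Share of the total weight carried by walks of at most 3 steps
short_low  <- sum(w_low[1:3])
short_high <- sum(w_high[1:3])

# Tolerance implied by the converg argument: the kernel value may change
# by at most 1 / c between iterations
converg <- 1000
tol <- 1 / converg
```

Under this assumption, raising `stopP` shifts the weight towards short walks (`short_high` is larger than `short_low`), and a larger `converg` corresponds to a stricter tolerance.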

A numeric matrix containing the similarity values between the molecules.
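
When `returnNormalized = TRUE`, a kernel matrix is typically scaled so that each molecule has similarity 1 with itself. The sketch below shows the standard normalization K[i, j] / sqrt(K[i, i] * K[j, j]) in base R on a made-up matrix. That `sd2gram` uses exactly this formula is an assumption, and `normalize_kernel` is a hypothetical helper, not an Rchemcpp function.

```r
# Hypothetical helper (not part of Rchemcpp): cosine-style normalization
# of a symmetric kernel matrix with a positive diagonal.
normalize_kernel <- function(K) {
  d <- sqrt(diag(K))
  K / outer(d, d)  # divides K[i, j] by sqrt(K[i, i] * K[j, j])
}

# Made-up unnormalized kernel values for three molecules
K_raw <- matrix(c(4, 2,  1,
                  2, 9,  3,
                  1, 3, 16), nrow = 3, byrow = TRUE)

K_norm <- normalize_kernel(K_raw)
```

After this scaling the diagonal is 1 and the off-diagonal entries can be compared across molecule pairs regardless of molecule size.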

Michael Mahr <rchemcpp@bioinf.jku.at>; C++ function written by Jean-Luc Perret and Pierre Mahe.

(Kashima, 2004) – H. Kashima, K. Tsuda, and A. Inokuchi. Kernels for graphs. In B. Schoelkopf, K. Tsuda, and J.P. Vert, editors, Kernel Methods in Computational Biology, pages 155-170. MIT Press, 2004.

(Mahe, 2005) – P. Mahe, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Graph kernels for molecular structure-activity relationship analysis with support vector machines. J Chem Inf Model, 45(4):939-51, 2005.

sdfolder <- system.file("extdata",package="Rchemcpp")
sdf <- list.files(sdfolder,full.names=TRUE,pattern="small")
K <- sd2gram(sdf)

[1] "reading file"
[1] "reading file done"
[1] "setting morgan labels"
[1] "using moleculeKernel Kashima"
calculating the Kashima gram matrix for 30 x 30 molecules
Using a single thread
calculating 30 x 30
0 / 30
1 / 30
2 / 30
3 / 30
4 / 30
5 / 30
6 / 30
7 / 30
8 / 30
9 / 30
10 / 30
11 / 30
12 / 30
13 / 30
14 / 30
15 / 30
16 / 30
17 / 30
18 / 30
19 / 30
20 / 30
21 / 30
22 / 30
23 / 30
24 / 30
25 / 30
26 / 30
27 / 30
28 / 30
29 / 30

Report:
0 molecules pairs could not be distinguished using the graph kernel
0 of them had a different biological activity
0 of them had unknown biological activity

0 molecules pairs were orthogonal
0 of them had a different biological activity
0 of them had unknown biological activity
[1] "end"
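
The progress report shown above can be switched off with the `silentMode` argument. The following variant of the example is a sketch using only arguments from the Usage section; it runs the kernel computation only if Rchemcpp is installed, and the value `stopP = 0.5` is chosen purely for illustration.

```r
K <- NULL  # stays NULL if Rchemcpp is not installed

if (requireNamespace("Rchemcpp", quietly = TRUE)) {
  sdfolder <- system.file("extdata", package = "Rchemcpp")
  sdf <- list.files(sdfolder, full.names = TRUE, pattern = "small")

  # Same input as in the example, but without progress output and with
  # more weight on short walks
  K <- Rchemcpp::sd2gram(sdf, stopP = 0.5, silentMode = TRUE)
  print(dim(K))
}
```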