fpSim: R Documentation
Search function for fingerprints, such as PubChem or atom pair fingerprints. Enables structure similarity comparisons, searching and clustering.
Usage
fpSim(x, y, sorted=TRUE, method="Tanimoto",
      addone=1, cutoff=0, top="all", alpha=1, beta=1,
      parameters=NULL, scoreType="similarity")
Arguments
x: Query molecule of class …
y: Subject molecule(s) of class …
sorted: Whether to return the results sorted by similarity (TRUE, default) or unsorted (FALSE).
method: Similarity coefficient to return. Choose one of the predefined similarity measures "Tanimoto" (default), "Euclidean", "Tversky" or "Dice", or pass any custom similarity function that takes the arguments a, b, c and d. For instance, one can define myfct <- function(a, b, c, d) c/(alpha*a + beta*b + c) and then pass it on with method=myfct; since alpha and beta are not arguments of myfct, they must already exist as variables in the environment where myfct is defined. The predefined methods run a C++ implementation that is about twice as fast as the R version; when a custom similarity function is given, fpSim falls back to the R version.
addone: Value added to the numerator and denominator of the similarity coefficient to avoid division by zero when the fingerprint(s) contain only "off-bits" (zeros). Note: if …
cutoff: Restricts the results to hits above a similarity cutoff value; default cutoff=0.
top: Restricts the number of subject molecules returned; default top="all".
alpha: Only used when method="Tversky". Specifies the weighting variable 'alpha' of the Tversky index c/(alpha*a + beta*b + c).
beta: Only used when method="Tversky". Specifies the weighting variable 'beta' of the Tversky index.
parameters: Parameters for computing Z-scores, E-values, and p-values. Pass this data if you want these scores returned. This data can be generated with the …
scoreType: If using the …
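All of the coefficients above are functions of four bit counts. The base-R sketch below (no ChemmineR required) computes them for two 0/1 vectors, assuming a counts on-bits set only in the query, b on-bits set only in the subject, c on-bits shared by both and d off-bits shared by both. This convention matches the Tversky formula quoted above but is our reading, not taken from the package source. The helpers bitCounts, simFromCounts, tversky and tanimotoAddone are hypothetical, the Dice form 2c/(a + b + 2c) is the textbook definition rather than a copy of fpSim's C++ code, and the addone variant simply follows the "add to numerator and denominator" description.

```r
# Hypothetical helper (not part of ChemmineR). Assumed convention:
# a = on only in query, b = on only in subject, c = on in both, d = off in both.
bitCounts <- function(query, subject) {
  stopifnot(length(query) == length(subject))
  q <- as.logical(query)
  s <- as.logical(subject)
  c(a = sum(q & !s), b = sum(!q & s), c = sum(q & s), d = sum(!q & !s))
}

# Apply any coefficient written as a function of (a, b, c, d), the same
# signature that a custom 'method' function is described as taking.
simFromCounts <- function(query, subject, fct) {
  n <- bitCounts(query, subject)
  fct(n[["a"]], n[["b"]], n[["c"]], n[["d"]])
}

tanimoto <- function(a, b, c, d) c / (a + b + c)
dice     <- function(a, b, c, d) 2 * c / (a + b + 2 * c)
myfct    <- function(a, b, c, d) c / (a + b + c + d)  # custom function from the Examples

# Tversky as a closure: alpha and beta are captured from the call to tversky(),
# so the returned function needs no global variables.
# alpha = beta = 1 gives Tanimoto; alpha = beta = 0.5 gives Dice.
tversky <- function(alpha, beta) function(a, b, c, d) c / (alpha * a + beta * b + c)

# Assumed reading of 'addone': add it to numerator and denominator of Tanimoto.
tanimotoAddone <- function(addone) function(a, b, c, d) (c + addone) / (a + b + c + addone)

query   <- c(1, 1, 0, 1, 0, 0, 1, 0)
subject <- c(1, 0, 0, 1, 1, 0, 1, 0)   # a = 1, b = 1, c = 3, d = 3
empty   <- rep(0, 8)                   # fingerprint with only off-bits

simFromCounts(query, subject, tanimoto)         # 3 / 5 = 0.6
simFromCounts(query, subject, dice)             # 6 / 8 = 0.75
simFromCounts(query, subject, myfct)            # 3 / 8 = 0.375
simFromCounts(query, subject, tversky(0.5, 1))  # 3 / 4.5, about 0.667
simFromCounts(empty, empty, tanimotoAddone(0))  # NaN, because 0 / 0
simFromCounts(empty, empty, tanimotoAddone(1))  # 1
```

The last two lines show why addone exists: without it two empty fingerprints give 0/0. A nonzero addone also nudges ordinary scores upward (here from 0.6 to 4/6 for Tanimoto), so it is worth keeping in mind when comparing scores across settings.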
Value
Returns a numeric vector with similarity coefficients as values and compound identifiers as names.
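Because the result is a named numeric vector, the effect of sorted, cutoff and top can be pictured as ordinary vector operations. The sketch below is not fpSim's internal code: filterHits is a hypothetical helper, the compound IDs and scores are made-up toy values, and it assumes that cutoff keeps scores greater than or equal to the cutoff (so the default cutoff=0 keeps every hit) and that top is applied after sorting.

```r
# Hypothetical helper mirroring the 'sorted', 'cutoff' and 'top' arguments
# on a named similarity vector (names = compound identifiers).
# Assumption: hits with a score >= cutoff are kept.
filterHits <- function(sims, sorted = TRUE, cutoff = 0, top = "all") {
  hits <- sims[sims >= cutoff]
  if (sorted) hits <- sort(hits, decreasing = TRUE)
  if (!identical(top, "all")) hits <- head(hits, top)
  hits
}

# Toy scores for five made-up compound IDs
sims <- c(CMP1 = 1.00, CMP2 = 0.35, CMP3 = 0.62, CMP4 = 0.48, CMP5 = 0.71)

filterHits(sims)                         # all five, best hit first
filterHits(sims, cutoff = 0.4, top = 2)  # CMP1 (1.00) and CMP5 (0.71)
names(filterHits(sims))[1]               # identifier of the best hit: "CMP1"
```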
Author(s)
Thomas Girke, Kevin Horan
References
Tanimoto similarity coefficient: Tanimoto TT (1957) IBM Internal Report, 17th Nov. See also Jaccard P (1901) Bulletin de la Société Vaudoise des Sciences Naturelles 37, 241-272.
PubChem fingerprint specification: ftp://ftp.ncbi.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
See Also
Functions: fp2bit
Examples
library(ChemmineR)
## Load PubChem SDFset sample
data(sdfsample); sdfset <- sdfsample
cid(sdfset) <- sdfid(sdfset)
## Convert base 64 encoded PubChem fingerprints to a binary fingerprint set (FPset)
fpset <- fp2bit(sdfset)
## Alternatively, one can use atom pair fingerprints
## Not run:
fpset <- desc2fp(sdf2ap(sdfset))
## End(Not run)
## Pairwise compound structure comparisons
fpSim(x=fpset[1], y=fpset[2], method="Tanimoto")
## Structure similarity searching: x is query and y is fingerprint database
fpSim(x=fpset[1], y=fpset)
## Controlling the output
fpSim(x=fpset[1], y=fpset, method="Tversky", cutoff=0.4, top=4, alpha=0.5, beta=1)
## Use a custom similarity function
myfct <- function(a, b, c, d) c/(a+b+c+d)
fpSim(x=fpset[1], y=fpset, method=myfct)
## Compute fingerprint-based Tanimoto similarity matrix
simMA <- sapply(cid(fpset), function(x) fpSim(x=fpset[x], fpset, sorted=FALSE))
## Hierarchical clustering with simMA as input
hc <- hclust(as.dist(1-simMA), method="single")
## Plot hierarchical clustering tree
plot(as.dendrogram(hc), edgePar=list(col=4, lwd=2), horiz=TRUE)
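The similarity matrix and clustering steps above can be reproduced in base R for a small binary fingerprint matrix, which is handy for checking results without ChemmineR. This sketch assumes fingerprints are stored as a 0/1 matrix with one row per compound; the compound names, bit values and the helper tanimotoMatrix are illustrative, not part of the package. It relies on the identity that the Tanimoto denominator a + b + c equals the sum of both compounds' on-bit counts minus the shared count c.

```r
# Toy fingerprint matrix (made-up data): one row per compound, one column per bit.
fp <- rbind(
  CMP1 = c(1, 1, 0, 1, 0, 0, 1, 0),
  CMP2 = c(1, 0, 0, 1, 1, 0, 1, 0),
  CMP3 = c(0, 0, 1, 0, 1, 1, 0, 1),
  CMP4 = c(0, 1, 1, 0, 1, 1, 0, 1)
)

# Hypothetical helper: all-against-all Tanimoto matrix c / (a + b + c).
tanimotoMatrix <- function(fp) {
  shared <- fp %*% t(fp)                         # c for every pair of rows
  onBits <- rowSums(fp)                          # on-bits per compound
  union  <- outer(onBits, onBits, "+") - shared  # a + b + c for every pair
  shared / union
}

simMA <- tanimotoMatrix(fp)
round(simMA, 3)

# Same clustering call as in the Examples: distance = 1 - similarity
hc <- hclust(as.dist(1 - simMA), method = "single")
cutree(hc, k = 2)  # CMP1/CMP2 and CMP3/CMP4 fall into separate clusters
```

With these toy fingerprints CMP1 and CMP2 have a Tanimoto similarity of 0.6 and CMP3 and CMP4 of 0.8, while all cross pairs score 0.15 or less, so cutting the single-linkage tree into two groups recovers the two pairs.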