eiQuery | R Documentation |
Finds similar compounds for each query.
eiQuery(runId,queries,format="sdf",
dir=".",distance=getDefaultDist(descriptorType),conn=defaultConn(dir),
asSimilarity=FALSE, K=200, searchK=-1,lshData=NULL,
mainIds = readIddb(conn,file.path(dir, Main)))
runId |
The id number identifying a particular set of settings for a database. This is generally the number returned by eiMakeDb. |
queries |
This can be either an SDFset or a file containing one or more query compounds. |
format |
The format in which the queries are given, for example "sdf" (the default) when queries is an SDFset or an SD file. |
dir |
The directory where the "data" directory lives. Defaults to the current directory. |
distance |
The function used to compute the distance between two descriptors. Default functions are provided for "ap" (atom pair) and "fp" (fingerprint) descriptors; by default a Tanimoto-based distance is used. |
conn |
Database connection to use. |
asSimilarity |
If TRUE, return similarity values instead of distance values. This only works if the given distance function returns values between 0 and 1, which is true for the default atom pair and fingerprint distance functions. |
K |
The number of results to return. |
searchK |
Tunable Annoy LSH parameter. Larger values give more accurate results but take longer to return. The default value of -1 lets the value be chosen automatically, as numTrees * (approximate number of nearest neighbors). See the Annoy project page for details: https://github.com/spotify/annoy |
lshData |
DEPRECATED. This is no longer used. |
mainIds |
A vector of all id numbers in the current database. It is provided mainly so that the id list does not have to be re-read when running several queries. If not supplied, the function reads it itself. |
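The automatic searchK rule described above can be written out as a small helper. This is a hypothetical sketch, not a function in eiR: numTrees stands for the number of trees in the Annoy index, and the neighbor count is assumed to be K, which may differ from what eiR requests internally.

```r
# Hypothetical helper, not part of eiR: applies the documented rule that
# searchK = -1 means "numTrees * number of nearest neighbors requested".
# Assumption: the neighbor count is K.
effectiveSearchK <- function(searchK, numTrees, K) {
  if (searchK == -1) {
    numTrees * K   # automatic choice
  } else {
    searchK        # a user-supplied value is used as-is
  }
}

effectiveSearchK(-1, numTrees = 10, K = 200)    # 2000
effectiveSearchK(5000, numTrees = 10, K = 200)  # 5000
```

Raising searchK above the automatic value trades query time for accuracy, as noted for the searchK parameter.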
This function identifies the database by the runId parameter, which selects the settings (such as the r and d values given to eiMakeDb) used to build it. The queries can be given in a few different formats; see the queries and format parameters for details.
Locality-sensitive hashing (LSH), through the Annoy library, is used to quickly identify compounds similar to the queries.
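To illustrate the candidate-then-refine idea behind LSH search, the sketch below uses random-hyperplane hashing, a simpler LSH family than the random projection trees that Annoy builds. The coordinates are random toy data and nothing here calls eiR; treat it as an illustration of the principle, not of eiR's internals.

```r
set.seed(42)

# Toy data: 100 points in a 5-dimensional space (random, for illustration only).
points <- matrix(rnorm(500), ncol = 5)
query  <- points[1, ]  # a copy of point 1, so it must share point 1's bucket

# Four random hyperplanes through the origin.
nPlanes <- 4
planes  <- matrix(rnorm(5 * nPlanes), ncol = nPlanes)

# Hash = which side of each hyperplane a point falls on, as a bit string.
hashKey <- function(x) paste(as.integer(x %*% planes > 0), collapse = "")

keys       <- apply(points, 1, hashKey)
candidates <- which(keys == hashKey(query))  # only points in the query's bucket

# Refine: exact Euclidean distances are computed for the candidates only.
dists   <- sqrt(colSums((t(points[candidates, , drop = FALSE]) - query)^2))
nearest <- candidates[order(dists)][seq_len(min(3, length(candidates)))]
```

Only the points that hash to the query's bucket are compared exactly, which is where the speed-up comes from; close points that land in a different bucket are missed, which is why LSH results are approximate.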
This function must use a distance function rather than a similarity function. However, if the given distance function returns values between 0 and 1, the asSimilarity parameter can be used to return similarity values instead of distance values.
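As an example of a distance bounded by 0 and 1, the sketch below computes a Tanimoto distance on two binary fingerprints and converts it to a similarity. It is plain base R, not eiR's default distance function, and the conversion similarity = 1 - distance is an assumption about what asSimilarity does.

```r
# Tanimoto distance between two binary fingerprints (0/1 vectors):
# 1 - |A and B| / |A or B|, which always lies in [0, 1].
tanimotoDistance <- function(a, b) {
  both   <- sum(a & b)   # bits set in both fingerprints
  either <- sum(a | b)   # bits set in at least one fingerprint
  if (either == 0) return(0)  # convention chosen here: two empty fingerprints are identical
  1 - both / either
}

fpA <- c(1, 1, 0, 1, 0, 0)
fpB <- c(1, 0, 0, 0, 1, 0)

d <- tanimotoDistance(fpA, fpB)  # 1 shared bit, 4 bits in the union: 1 - 1/4 = 0.75
s <- 1 - d                       # assumed asSimilarity conversion: 0.25
```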
Returns a data frame with columns 'query', 'target', 'target_ids', and 'distance'. 'query' and 'target' are the compound names, 'target_ids' is the compound id of the target, and 'distance' is the distance between query and target as computed by the given distance function. Query names are repeated for each matching target found.
If asSimilarity is TRUE, the "distance" column is replaced by a "similarity" column.
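A data frame in the layout described above can be post-processed with base R, for example to keep the closest target for each query. The rows below are made-up toy values in that layout, not real eiQuery output.

```r
# Toy results in the documented column layout; names, ids and distances are made up.
results <- data.frame(
  query      = c("q1", "q1", "q1", "q2", "q2"),
  target     = c("cmpA", "cmpB", "cmpC", "cmpB", "cmpD"),
  target_ids = c(11L, 12L, 13L, 12L, 14L),
  distance   = c(0.10, 0.35, 0.20, 0.05, 0.40),
  stringsAsFactors = FALSE
)

# Closest target per query: sort by query, then by distance, and keep the first row of each query.
sorted <- results[order(results$query, results$distance), ]
best   <- sorted[!duplicated(sorted$query), ]

# Number of hits per query (query names repeat once per matching target).
hitsPerQuery <- table(results$query)
```

With a "similarity" column instead, sort in decreasing order so the most similar target comes first.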
Author: Kevin Horan
See also: eiInit, eiMakeDb, eiPerformanceTest
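library(eiR)
library(snow)
# r and d are passed to eiMakeDb below
r <- 50
d <- 40
# initialize a project directory from the sample compounds
data(sdfsample)
dir <- file.path(tempdir(), "query")
dir.create(dir)
eiInit(sdfsample, dir=dir, skipPriorities=TRUE)
# create the compound database on a one-node socket cluster
cl <- makeCluster(1, type="SOCK", outfile="")
runId <- eiMakeDb(r, d, numSamples=20, dir=dir, cl=cl)
stopCluster(cl)
# find compounds similar to each query
results <- eiQuery(runId, sdfsample[1:2], K=15, dir=dir)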