eiQuery: Perform a query on an embedded database
In girke-lab/eiR-release: Accelerated similarity searching of small molecules

Finds similar compounds for each query.

	eiQuery(runId,queries,format="sdf",
		dir=".",distance=getDefaultDist(descriptorType),conn=defaultConn(dir),
		asSimilarity=FALSE, K=200, W = 1.39564, M=19,L=10,T=30,lshData=NULL,
		mainIds = readIddb(conn,file.path(dir, Main),sorted=TRUE))

`runId`	The id number identifying a particular set of settings for a database. This is generally the number returned by `eiMakeDb`. If you're coming from an older version of eiR, you should now use this value instead of specifying `r`, `d`, `refIddb`, and `descriptorType`.
`queries`	This can be either an SDFset or a file containing one or more query compounds. Depending on `format`, it can also be a list of compound id numbers or compound names.
`format`	The format in which the queries are given. Valid values are: "sdf" when `queries` is either the filename of an SDF file or an SDFset object; "compound_id" when `queries` is a list of id numbers; and "name" when `queries` is a list of compound names, as returned by `cid(apset)`.
`dir`	The directory where the "data" directory lives. Defaults to the current directory.
`distance`	The distance function to be used to compute the distance between two descriptors. A default function is provided for "ap" and "fp" descriptors. The Tanimoto function is used by default.
`conn`	Database connection to use.
`asSimilarity`	If true, return similarity values instead of distance values. This only works if the given distance function returns values between 0 and 1. This is true for the default atom pair and fingerprint distance functions.
`K`	The number of results to return.
`W`	Tunable LSH parameter. See LSHKIT page for details. http://lshkit.sourceforge.net/dd/d2a/mplsh-tune_8cpp.html
`M`	Tunable LSH parameter. See LSHKIT page for details. http://lshkit.sourceforge.net/dd/d2a/mplsh-tune_8cpp.html
`L`	Number of hash tables
`T`	Number of probes
`lshData`	A pointer returned by `loadLSHData`. The LSH data is generally the largest chunk of data that must be held in memory while performing a query. Since it remains the same across queries, it makes sense to pre-load this data once when doing multiple queries. If this value is `NULL`, the LSH data will be loaded internally and then released before `eiQuery` returns.
`mainIds`	A vector of all id numbers in the current database. This is mainly provided as an option here to avoid having to re-read the id list multiple times when executing several queries. If not supplied it will read it in itself.
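
As a rough intuition for `W`, `M`, and `L`, the following toy sketch hashes one vector the way a standard p-stable LSH scheme would. It assumes the usual reading of these parameters (`W` as the bucket width, `M` as the number of hash functions combined per table, `L` as the number of tables); it is not eiR's or LSHKIT's code, and `dims`, `makeTable`, and `bucketKey` are hypothetical names introduced here. `T`, the number of probes, is not modeled.

```r
set.seed(1)

dims <- 40       # length of the vectors being hashed (hypothetical value)
W <- 1.39564     # assumed meaning: bucket width
M <- 19          # assumed meaning: hash functions combined per table
L <- 10          # number of hash tables

# One table holds M random projections (rows of A) and M offsets in [0, W).
makeTable <- function() {
  list(A = matrix(rnorm(M * dims), nrow = M),
       b = runif(M, min = 0, max = W))
}
tables <- replicate(L, makeTable(), simplify = FALSE)

# Bucket key of vector x in one table: M integers floor((a . x + b) / W).
bucketKey <- function(tbl, x) {
  as.integer(floor((tbl$A %*% x + tbl$b) / W))
}

x <- rnorm(dims)
keys <- lapply(tables, bucketKey, x = x)   # one key per table
```

Under this reading, a larger `W` or a smaller `M` makes buckets coarser, so more candidates collide with the query, while a larger `L` gives each true neighbor more chances to share a bucket with it.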

This function identifies the database by the runId parameter. The queries can be given in a few different formats; see the format parameter for details. The LSH algorithm is used to quickly identify compounds similar to the queries. This function must use a distance function rather than a similarity function. However, if the distance function given returns values between 0 and 1, then the asSimilarity parameter may be used to return similarity values rather than distance values.
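
The relationship between a bounded distance and a similarity can be sketched with a toy Tanimoto distance on sets of feature ids. This is an illustration only: `tanimotoDistance` is a hypothetical helper, not eiR's implementation, and the conversion `1 - distance` is an assumption about how a distance in [0, 1] maps to a similarity.

```r
# Toy Tanimoto distance between two sets of feature ids (illustrative only).
tanimotoDistance <- function(a, b) {
  shared <- length(intersect(a, b))
  total  <- length(union(a, b))
  if (total == 0) return(0)   # treat two empty sets as identical
  1 - shared / total
}

x <- c(1, 2, 3, 4)
y <- c(3, 4, 5, 6)

d <- tanimotoDistance(x, y)   # 2 shared features out of 6 distinct ones
s <- 1 - d                    # assumed distance-to-similarity conversion
```

The conversion is only meaningful because the distance is bounded by 0 and 1, which is why an unbounded custom distance function cannot be combined with asSimilarity.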

Returns a data frame with columns 'query', 'target', 'target_ids', and 'distance'. 'query' and 'target' are the compound names, and 'distance' is the distance between them, as computed by the given distance function. 'target_ids' is the compound id of the target. Query names are repeated for each matching target found. If asSimilarity is true, then instead of a 'distance' column there will be a 'similarity' column.
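
A data frame of this shape can be post-processed with base R. The rows below are made-up placeholders that only mimic the column layout described above; they are not real eiQuery output, and the variable names are hypothetical.

```r
# Placeholder rows with the documented columns (not real results).
results <- data.frame(
  query      = c("q1", "q1", "q1", "q2", "q2"),
  target     = c("cmpA", "cmpB", "cmpC", "cmpB", "cmpD"),
  target_ids = c(11L, 12L, 13L, 12L, 14L),
  distance   = c(0.00, 0.25, 0.60, 0.10, 0.45),
  stringsAsFactors = FALSE
)

# Keep only hits within a distance cutoff.
closeHits <- results[results$distance <= 0.3, ]

# Nearest target per query: sort by query and distance, keep the first row of each query.
ordered <- results[order(results$query, results$distance), ]
nearest <- ordered[!duplicated(ordered$query), ]

# Number of hits returned for each query.
hitsPerQuery <- table(results$query)
```

Because query names are repeated once per matching target, grouping on the 'query' column is enough to recover the hit list for each query.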

Kevin Horan

eiInit eiMakeDb eiPerformanceTest

   library(eiR)
   library(snow)
   r <- 50
   d <- 40

   # initialize the data directory from the sample compounds
   data(sdfsample)
   dir <- file.path(tempdir(),"query")
   dir.create(dir)
   eiInit(sdfsample,dir=dir)

   # create the compound db; the returned run id identifies these settings
   runId <- eiMakeDb(r,d,numSamples=20,dir=dir,
      cl=makeCluster(1,type="SOCK",outfile=""))

   # find compounds similar to each query
   results <- eiQuery(runId,sdfsample[1:2],K=15,dir=dir)