Tests the performance of embedding and LSH.
eiPerformanceTest(runId, distance=getDefaultDist(descriptorType),
    conn=defaultConn(dir), dir=".", K=200, W=1.39564, M=19, L=10, T=30)
runId | The id number identifying a particular set of settings for a database. This is generally the number returned by
distance | The distance function to be used to compute the distance between two descriptors. A default function is provided for "ap" and "fp" descriptors.
conn | Database connection to use.
dir | The directory where the "data" directory lives. Defaults to the current directory.
K | Number of search results to use for the LSH performance test.
W | Tunable LSH parameter. See the LSHKIT page for details: http://lshkit.sourceforge.net/dd/d2a/mplsh-tune_8cpp.html
M | Tunable LSH parameter. See the LSHKIT page for details: http://lshkit.sourceforge.net/dd/d2a/mplsh-tune_8cpp.html
L | Number of hash tables.
T | Number of probes.
This function performs two different tests. The first evaluates how well the embedding results support similarity search. It approximates 1,000 random similarity searches (the queries are determined by data/test_queries.iddb) with nearest neighbor searches that use the coordinates from the embedding results. The search results are then compared to the reference search results (chemical-search.results.gz).
The comparison results are summarized in two types of files. The first type lists the recall for different k values, k being the number of results to retrieve. These files are named as “recall-ratio-k”. For example, if the recall is 70% for a top-100 compound search (70 of the 100 results are among the real top-100 compounds), then the value at line 100 is 0.7. Several relaxation ratios are used, each generating a file in this form. For instance, recall.ratio-10 is the file listing the recalls when the relaxation ratio is 10. The other file, recall.csv, lists the recalls for different relaxation ratios in one file, limited to selected k values. In this CSV file, the rows correspond to different relaxation ratios and the columns to different k values. This lets you pick an appropriate relaxation ratio for the k values you are interested in.
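The recall values described above can be illustrated with a small sketch in base R. The helper recallAtK and the two ranked id lists are hypothetical and made up for illustration; they are not part of the package, and the sketch ignores relaxation ratios. It only shows how a recall at k could be computed by comparing an approximate result list against a reference list.

```r
# Hypothetical helper (not part of the package): recall at k is the share of
# the reference top-k results that also appear in the approximate top-k list.
recallAtK <- function(approxIds, referenceIds, k) {
  approxTop <- head(approxIds, k)
  referenceTop <- head(referenceIds, k)
  length(intersect(approxTop, referenceTop)) / length(referenceTop)
}

# Made-up ranked result lists for a single query (compound ids).
referenceIds <- c(11, 42, 7, 93, 5, 60, 18, 77, 29, 34)
approxIds    <- c(42, 11, 93, 8, 5, 77, 60, 2, 34, 51)

# One recall value per k, analogous to one value per line in a recall file.
recalls <- sapply(seq_along(referenceIds),
                  function(k) recallAtK(approxIds, referenceIds, k))
print(round(recalls, 2))
```

In this toy data, 7 of the 10 approximate results are among the reference top 10, so the value for k = 10 is 0.7, matching the kind of entry described for line 100 in the example above.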
The second test measures the performance of the Locality Sensitive Hash (LSH).
The results for the LSH-assisted search will be in run-r-d/indexed.performance. It is a 1,000-line file of recall values, where each line corresponds to one test query. LSH search performance is highly sensitive to your LSH parameters (K, W, M, L, T). The default parameters are listed in the man page for eiPerformanceTest. When you have your embedding result in a matrix file, you should follow the instructions at http://lshkit.sourceforge.net/dd/d2a/mplsh-tune_8cpp.html to find the best values for these parameters.
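When comparing parameter settings, it may help to summarize the per-query recall values. The following base R sketch assumes that the file contains one plain numeric recall value per line; that layout is an assumption, and a temporary file with made-up values stands in for run-r-d/indexed.performance so the example is self-contained. The 0.75 cutoff is arbitrary.

```r
# Assumption: one numeric recall value per line, one line per test query.
# A temporary file with made-up values stands in for the real results file.
perfFile <- tempfile("indexed.performance")
writeLines(c("0.95", "0.80", "1.00", "0.60", "0.90"), perfFile)

# Read all values into a numeric vector.
recalls <- scan(perfFile, what = numeric(), quiet = TRUE)

# Overall quality and the queries (by line number) that fall below a cutoff.
meanRecall <- mean(recalls)
lowRecallQueries <- which(recalls < 0.75)

cat("queries:", length(recalls), "\n")
cat("mean recall:", meanRecall, "\n")
cat("queries below 0.75:", lowRecallQueries, "\n")
```

Running the same summary after each change to K, W, M, L, or T gives a simple way to see whether a new setting improves the mean recall or only shifts which queries perform poorly.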
No value is returned. Creates files in dir/run-r-d.
Kevin Horan