NMSlib: Non metric space library

NMSlib    R Documentation

Non metric space library

Description

Non metric space library

Usage

# init <- NMSlib$new(input_data, Index_Params = NULL, Time_Params = NULL,
#                           space='l1', space_params = NULL, method = 'hnsw',
#                           data_type = 'DENSE_VECTOR', dtype = 'FLOAT',
#                           index_filepath = NULL, load_data = FALSE,
#                           print_progress = FALSE)

Details

input_data parameter: for numeric data, input_data should be either an R matrix object or a scipy sparse matrix. It can also be a list of two or more matrices / sparse matrices that all have the same number of columns (this is ideal, for instance, if the user wants to include both a train and a test dataset in the created index).
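As a minimal sketch of the list form of input_data, the snippet below builds two matrices with the same number of columns and passes them as one list. The names train, test, input_list and nmslib_ready are hypothetical, and the index is only built when nmslibR and the Python nmslib module are found.

```r
# Two hypothetical datasets that share the same number of columns.
set.seed(1)
train <- matrix(runif(500), nrow = 50, ncol = 10)
test  <- matrix(runif(200), nrow = 20, ncol = 10)

# A single list holds both matrices, so one index covers all rows.
input_list <- list(train, test)

# Check the column counts before handing the list to NMSlib$new().
same_cols <- length(unique(vapply(input_list, ncol, integer(1)))) == 1
stopifnot(same_cols)

# Assumption: the index can only be built when nmslibR and Python's nmslib exist.
nmslib_ready <- requireNamespace("nmslibR", quietly = TRUE) &&
  isTRUE(tryCatch(reticulate::py_module_available("nmslib"),
                  error = function(e) FALSE))

try({
  if (nmslib_ready) {
    idx <- nmslibR::NMSlib$new(input_data = input_list)
  }
}, silent = TRUE)
```

Checking the column counts first gives a clearer error than one raised later from the Python side.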

The Knn_Query function finds the approximate K nearest neighbours of a vector in the index.

The knn_Query_Batch function performs multiple queries on the index, distributing the work over a thread pool.

The save_Index function saves the index to disk.

If the index_filepath parameter is not NULL, an existing index will be loaded.

Incrementally updating an already saved (and loaded) index is not possible (see: https://github.com/nmslib/nmslib/issues/73).
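The save and load steps described above might look like the following sketch. The file path and the variable names are hypothetical. Passing the original matrix as input_data when loading is an assumption, since this page does not say whether input_data is used in that case. Saving and loading the data together with the index (save_data = TRUE, load_data = TRUE) is chosen here as the cautious option.

```r
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

# Hypothetical location for the saved index.
index_file <- file.path(tempdir(), "nms_index.bin")

# Assumption: nothing is built unless nmslibR and Python's nmslib are found.
nmslib_ready <- requireNamespace("nmslibR", quietly = TRUE) &&
  isTRUE(tryCatch(reticulate::py_module_available("nmslib"),
                  error = function(e) FALSE))

try({
  if (nmslib_ready) {
    # Build an index and write it (plus the data) to disk.
    idx <- nmslibR::NMSlib$new(input_data = x)
    idx$save_Index(filename = index_file, save_data = TRUE)

    # Load the saved index into a new object instead of rebuilding it.
    # Assumption: input_data is still supplied when index_filepath is set.
    reloaded <- nmslibR::NMSlib$new(input_data = x,
                                    index_filepath = index_file,
                                    load_data = TRUE)
    res <- reloaded$Knn_Query(query_data_row = x[1, ], k = 5)
  }
}, silent = TRUE)
```

Because a loaded index cannot be updated incrementally, adding new rows means building a fresh index from the combined data.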

Methods

NMSlib$new(input_data, Index_Params = NULL, Time_Params = NULL, space='l1', space_params = NULL, method = 'hnsw', data_type = 'DENSE_VECTOR', dtype = 'FLOAT', index_filepath = NULL, load_data = FALSE, print_progress = FALSE)
--------------
Knn_Query(query_data_row, k = 5, include_query_data_row_index = FALSE)
--------------
knn_Query_Batch(query_data, k = 5, num_threads = 1)
--------------
save_Index(filename, save_data = FALSE)

Methods

Public methods


Method new()

Usage
NMSlib$new(
  input_data,
  Index_Params = NULL,
  Time_Params = NULL,
  space = "l1",
  space_params = NULL,
  method = "hnsw",
  data_type = "DENSE_VECTOR",
  dtype = "FLOAT",
  index_filepath = NULL,
  load_data = FALSE,
  print_progress = FALSE
)
Arguments
input_data

the input data. See details for more information

Index_Params

a list of (optional) parameters to use in indexing (when creating the index)

Time_Params

a list of parameters to use in querying. Setting Time_Params to NULL will reset the query-time parameters

space

a character string (optional). The metric space to create for this index. Page 31 of the manual (see references) explains all available inputs

space_params

a list of (optional) parameters for configuring the space. See the manual listed in the references for more details.

method

a character string specifying the index method to use

data_type

a character string. One of 'DENSE_UINT8_VECTOR', 'DENSE_VECTOR', 'OBJECT_AS_STRING' or 'SPARSE_VECTOR'

dtype

a character string. Either 'FLOAT' or 'INT'

index_filepath

a character string specifying the path to a file where an existing index is saved

load_data

a boolean. If TRUE, the saved data is loaded together with the index. This parameter is used only when the index_filepath parameter is not NULL (see the web links in the references section for more details). The user might also have to specify the skip_optimized_index parameter in Index_Params when initializing the object

print_progress

a boolean (either TRUE or FALSE). Whether to display a progress bar
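The arguments above can be combined as in the sketch below, which requests a cosine-similarity space with the 'hnsw' method. The parameter names M, efConstruction, post and efSearch follow the NMSLIB methods manual listed in the references. The values are illustrative only, not tuned recommendations, and the variable names are hypothetical.

```r
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

# Index-time settings for 'hnsw' (illustrative values, not recommendations).
index_params <- list(M = 15L, efConstruction = 100L, post = 0L)

# Query-time setting for 'hnsw'.
time_params <- list(efSearch = 100L)

# Assumption: nothing is built unless nmslibR and Python's nmslib are found.
nmslib_ready <- requireNamespace("nmslibR", quietly = TRUE) &&
  isTRUE(tryCatch(reticulate::py_module_available("nmslib"),
                  error = function(e) FALSE))

try({
  if (nmslib_ready) {
    idx <- nmslibR::NMSlib$new(
      input_data = x,
      Index_Params = index_params,
      Time_Params = time_params,
      space = "cosinesimil",
      method = "hnsw"
    )
    res <- idx$Knn_Query(query_data_row = x[1, ], k = 5)
  }
}, silent = TRUE)
```

In HNSW, larger efConstruction and efSearch values generally trade speed for recall, so they are usually tuned per dataset.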


Method Knn_Query()

Usage
NMSlib$Knn_Query(query_data_row, k = 5, include_query_data_row_index = FALSE)
Arguments
query_data_row

a vector to query for

k

an integer. The number of neighbours to return

include_query_data_row_index

a boolean. If TRUE, the index of the query data row is returned as well. It currently defaults to FALSE, which means that the first matched index is excluded from the results (this parameter will be removed in version 1.1.0 and the output behavior of the function will change too; see the deprecation warning)
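A short sketch of the include_query_data_row_index argument follows, querying with a row that is itself stored in the index. The variable names are hypothetical and no particular output is claimed here. The calls may also emit the deprecation warning mentioned above.

```r
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

# The query vector is a row that is also stored in the index.
query_row <- x[1, ]

# Assumption: nothing is built unless nmslibR and Python's nmslib are found.
nmslib_ready <- requireNamespace("nmslibR", quietly = TRUE) &&
  isTRUE(tryCatch(reticulate::py_module_available("nmslib"),
                  error = function(e) FALSE))

try({
  if (nmslib_ready) {
    idx <- nmslibR::NMSlib$new(input_data = x)

    # Default: the first matched index is excluded from the results.
    without_self <- idx$Knn_Query(query_data_row = query_row, k = 5)

    # Opt in to keeping the first matched index as well.
    with_self <- idx$Knn_Query(query_data_row = query_row, k = 5,
                               include_query_data_row_index = TRUE)
  }
}, silent = TRUE)
```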


Method knn_Query_Batch()

Usage
NMSlib$knn_Query_Batch(query_data, k = 5, num_threads = 1)
Arguments
query_data

the queries to run against the index. The query_data parameter should be of the same type as the input_data parameter

k

an integer. The number of neighbours to return

num_threads

an integer. The number of threads to use


Method save_Index()

Usage
NMSlib$save_Index(filename, save_data = FALSE)
Arguments
filename

a character string specifying the path. The filename to save (in case of the save_Index method) or the filename to load (in case of the load_Index method)

save_data

a boolean. If TRUE, the data is saved together with the index (see the web links in the references section for more details)


Method clone()

The objects of this class are cloneable with this method.

Usage
NMSlib$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

https://github.com/nmslib/nmslib/blob/master/manual/latex/manual.pdf

https://github.com/nmslib/nmslib/blob/master/python_bindings/notebooks/search_vector_dense_optim.ipynb

https://github.com/nmslib/nmslib/blob/master/python_bindings/notebooks/search_vector_dense_nonoptim.ipynb

https://github.com/nmslib/nmslib/issues/356

https://github.com/nmslib/nmslib/blob/master/manual/methods.md

https://github.com/nmslib/nmslib/blob/master/manual/spaces.md

Examples


try({
  if (reticulate::py_available(initialize = FALSE)) {
    if (reticulate::py_module_available("nmslib")) {

      library(nmslibR)

      set.seed(1)
      x = matrix(runif(1000), nrow = 100, ncol = 10)

      init_nms = NMSlib$new(input_data = x)


      # returns a 1-dimensional vector (index, distance)
      #--------------------------------------------------

      init_nms$Knn_Query(query_data_row = x[1, ], k = 5)


      # returns knn's for all data
      #---------------------------

      all_dat = init_nms$knn_Query_Batch(x, k = 5, num_threads = 1)
    }
  }
}, silent=TRUE)

nmslibR documentation built on Feb. 16, 2023, 5:17 p.m.