Non-metric space library

# init <- NMSlib$new(input_data, Index_Params = NULL, Time_Params = NULL,
#                           space='l1', space_params = NULL, method = 'hnsw',
#                           data_type = 'DENSE_VECTOR', dtype = 'FLOAT',
#                           index_filepath = NULL, print_progress = FALSE)

Details

input_data parameter: for numeric data, input_data should be either an R matrix or a scipy sparse matrix. It can also be a list of two or more matrices / sparse matrices that all have the same number of columns (useful, for instance, when the user wants to include both a train and a test dataset in the created index).
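As a minimal sketch of the list form described above, the following builds two dense matrices with the same number of columns and passes them as a list. The names train, test, input_list and nmslib_ready are illustrative and not part of the package; the constructor call is guarded so it only runs when nmslibR and the Python nmslib module are available.

```r
set.seed(1)
train <- matrix(runif(800), nrow = 80, ncol = 10)
test  <- matrix(runif(200), nrow = 20, ncol = 10)

# every element of the list must have the same number of columns
stopifnot(ncol(train) == ncol(test))
input_list <- list(train, test)

# guard: only touch the library if it can actually be loaded
nmslib_ready <- tryCatch(
  requireNamespace("nmslibR", quietly = TRUE) &&
    reticulate::py_module_available("nmslib"),
  error = function(e) FALSE
)

if (nmslib_ready) {
  init_nms <- nmslibR::NMSlib$new(input_data = input_list)
}
```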

The Knn_Query function finds the approximate k nearest neighbours of a vector in the index.

The knn_Query_Batch function performs multiple queries on the index, distributing the work over a thread pool.

The save_Index function saves the index to disk.

If the index_filepath parameter is not NULL, an existing index is loaded from that file.
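A hedged sketch of a save-and-reload round trip follows. The file name nms_index.bin is hypothetical, and passing input_data again when loading is an assumption based on input_data having no default in the constructor signature; check the package behaviour before relying on it.

```r
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

# hypothetical location for the saved index
index_file <- file.path(tempdir(), "nms_index.bin")

nmslib_ready <- tryCatch(
  requireNamespace("nmslibR", quietly = TRUE) &&
    reticulate::py_module_available("nmslib"),
  error = function(e) FALSE
)

if (nmslib_ready) {
  # build an index and write it to disk
  init_nms <- nmslibR::NMSlib$new(input_data = x)
  init_nms$save_Index(filename = index_file)

  # assumption: input_data is still supplied when an existing index is loaded
  reloaded <- nmslibR::NMSlib$new(input_data = x, index_filepath = index_file)
  reloaded$Knn_Query(query_data_row = x[1, ], k = 5)
}
```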

Methods

NMSlib$new(input_data, Index_Params = NULL, Time_Params = NULL, space='l1', space_params = NULL, method = 'hnsw', data_type = 'DENSE_VECTOR', dtype = 'FLOAT', index_filepath = NULL, print_progress = FALSE)

--------------

Knn_Query(query_data_row, k = 5)

--------------

knn_Query_Batch(query_data, k = 5, num_threads = 1)

--------------

save_Index(filename)

References

https://github.com/nmslib/nmslib/blob/master/manual/latex/manual.pdf

Methods

Public methods


Method new()

Usage

NMSlib$new(
  input_data,
  Index_Params = NULL,
  Time_Params = NULL,
  space = "l1",
  space_params = NULL,
  method = "hnsw",
  data_type = "DENSE_VECTOR",
  dtype = "FLOAT",
  index_filepath = NULL,
  print_progress = FALSE
)

Arguments

input_data

the input data. See details for more information

Index_Params

a list of (optional) parameters to use in indexing (when creating the index)

Time_Params

a list of parameters to use in querying. Setting Time_Params to NULL will reset the query-time parameters to their defaults

space

a character string (optional). The metric space to create for this index. Page 31 of the manual (see references) explains all available inputs

space_params

a list of (optional) parameters for configuring the space. See the references manual for more details.

method

a character string specifying the index method to use

data_type

a character string. One of 'DENSE_UINT8_VECTOR', 'DENSE_VECTOR', 'OBJECT_AS_STRING' or 'SPARSE_VECTOR'

dtype

a character string. Either 'FLOAT' or 'INT'

index_filepath

a character string specifying the path to a file, where an existing index is saved

print_progress

a boolean (either TRUE or FALSE). Whether or not to display a progress bar
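The sketch below shows one way the indexing and query-time parameter lists might be supplied for the 'hnsw' method. The parameter names M, efConstruction, indexThreadQty and efSearch, and the space 'cosinesimil', come from the NMSLIB manual rather than from this page, and the values are arbitrary, so treat them as assumptions to verify against the manual.

```r
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

# index-time parameters (assumed HNSW names, arbitrary values)
index_params <- list(M = 16L, efConstruction = 200L, indexThreadQty = 2L)

# query-time parameters (assumed HNSW name, arbitrary value)
time_params <- list(efSearch = 100L)

nmslib_ready <- tryCatch(
  requireNamespace("nmslibR", quietly = TRUE) &&
    reticulate::py_module_available("nmslib"),
  error = function(e) FALSE
)

if (nmslib_ready) {
  init_nms <- nmslibR::NMSlib$new(
    input_data = x,
    Index_Params = index_params,
    Time_Params = time_params,
    space = "cosinesimil",
    method = "hnsw",
    print_progress = FALSE
  )
}
```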


Method Knn_Query()

Usage

NMSlib$Knn_Query(query_data_row, k = 5)

Arguments

query_data_row

a vector to query for

k

an integer. The number of neighbours to return
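Because Knn_Query returns approximate neighbours, it can help to have an exact baseline to compare against on small data. The helper exact_knn_l1 below is a hypothetical base R function, not part of nmslibR; it computes exact nearest neighbours under the L1 distance (the default space = 'l1') by brute force. It returns 1-based row indices, so check the index base of the library's output before comparing the two.

```r
# brute-force exact k nearest neighbours under the L1 (Manhattan) distance
exact_knn_l1 <- function(data, query, k = 5) {
  stopifnot(is.matrix(data), length(query) == ncol(data), k <= nrow(data))
  # subtract the query from every row, then sum absolute differences per row
  dists <- rowSums(abs(sweep(data, 2, query, "-")))
  nearest <- order(dists)[seq_len(k)]
  list(index = nearest, distance = dists[nearest])
}

set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)

# querying with a row of the data: that row is its own nearest neighbour
res <- exact_knn_l1(x, x[1, ], k = 5)
```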


Method knn_Query_Batch()

Usage

NMSlib$knn_Query_Batch(query_data, k = 5, num_threads = 1)

Arguments

query_data

the queries to run against the index. The query_data parameter should be of the same type as the input_data parameter

k

an integer. The number of neighbours to return

num_threads

an integer. The number of threads to use
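As an illustration of choosing num_threads, the sketch below derives a thread count from the number of detected cores and runs a batch query on a dense matrix of the same type as the indexed data. The variable names and the choice of leaving one core free are assumptions made for the example, not recommendations from the package.

```r
set.seed(1)
x <- matrix(runif(1000), nrow = 100, ncol = 10)
queries <- x[1:10, , drop = FALSE]   # same type (dense matrix) as input_data

# leave one core free; fall back to 1 if the core count is unknown
cores <- parallel::detectCores()
n_threads <- if (is.na(cores)) 1L else max(1L, cores - 1L)

nmslib_ready <- tryCatch(
  requireNamespace("nmslibR", quietly = TRUE) &&
    reticulate::py_module_available("nmslib"),
  error = function(e) FALSE
)

if (nmslib_ready) {
  init_nms <- nmslibR::NMSlib$new(input_data = x)
  batch_res <- init_nms$knn_Query_Batch(queries, k = 5, num_threads = n_threads)
}
```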


Method save_Index()

Usage

NMSlib$save_Index(filename)

Arguments

filename

a character string specifying the path. The filename to save (in case of the save_Index method) or the filename to load (in case of the load_Index method)


Method clone()

The objects of this class are cloneable with this method.

Usage

NMSlib$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples


try({
  if (reticulate::py_available(initialize = FALSE)) {
    if (reticulate::py_module_available("nmslib")) {

      library(nmslibR)

      set.seed(1)
      x = matrix(runif(1000), nrow = 100, ncol = 10)

      init_nms = NMSlib$new(input_data = x)


      # returns a 1-dimensional vector (index, distance)
      #--------------------------------------------------

      init_nms$Knn_Query(query_data_row = x[1, ], k = 5)


      # returns knn's for all data
      #---------------------------

      all_dat = init_nms$knn_Query_Batch(x, k = 5, num_threads = 1)
    }
  }
}, silent=TRUE)