query_si: query_si

View source: R/dnam_search_index.R

query_si R Documentation

query_si

Description

Query an HNSW search index. Performs a K nearest neighbors lookup on a previously saved search index object and returns the K nearest neighbors of each queried sample. The query_si() function produces verbose output, which can be silenced with suppressMessages().
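As a minimal sketch of silencing the verbose output, the snippet below wraps a call in suppressMessages(). It uses a hypothetical stand-in function, verbose_query(), rather than query_si() itself, and assumes the status lines are emitted with message(), as the description implies.

```r
# Hypothetical stand-in that mimics verbose status output via message();
# it is NOT the package's query_si() function.
verbose_query <- function() {
  message("Querying the search index...")
  data.frame(sample_id = "sampleA", k = 1L)
}

# Status messages are printed by default.
res_loud <- verbose_query()

# Wrapping the call in suppressMessages() hides them; the return value is unchanged.
res_quiet <- suppressMessages(verbose_query())
```

The same wrapper would apply to a real call, for example suppressMessages(query_si(...)), with the arguments described below.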

Usage

query_si(
  sample_idv,
  fh_csv_fpath,
  si_fname = "new_search_index",
  si_fpath = ".",
  lkval = c(1, 2)
)

Arguments

sample_idv

Vector of valid sample IDs, or GSM IDs, which are included in the rownames of the hashed features table at fh_csv_fpath (required, vector of char strings).

fh_csv_fpath

Path to the hashed features table, which includes rownames corresponding to sample ID strings in the sample_idv vector (required, char).

si_fname

Base filename of the search index object, used to find the search index and index dict files, which are expected to be located at si_fpath (required, char).

si_fpath

Path to the directory containing the search index and index dict files (required, char).

lkval

Vector of K values, each giving the number of nearest neighbors to return per query (optional, int, defaults to c(1, 2)).
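Because sample_idv must match rownames in the hashed features table, it can help to check the IDs before querying. The following is a rough sketch of such a check, not part of the package: the table contents and the IDs "sampleA", "sampleB", and "sampleC" are made up, and it assumes the sample IDs are stored in the first column of the CSV.

```r
# Build a tiny stand-in hashed features table (hypothetical data).
fh_csv_fpath <- tempfile(fileext = ".csv")
fh <- data.frame(f1 = c(0.1, 0.2), f2 = c(0.3, 0.4),
                 row.names = c("sampleA", "sampleB"))
write.csv(fh, fh_csv_fpath)

# Hypothetical query vector; "sampleC" is not in the table.
sample_idv <- c("sampleA", "sampleC")

# Assumption: sample IDs occupy the first CSV column, read here as rownames.
fh_ids <- rownames(read.csv(fh_csv_fpath, row.names = 1))

# IDs that would not be found, and IDs that are safe to query.
missing_ids <- setdiff(sample_idv, fh_ids)
valid_idv <- intersect(sample_idv, fh_ids)
```

Passing only valid_idv as sample_idv avoids querying IDs that are absent from the table.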

Examples

# file paths
# fh table
# fh_csv_fname <- system.file("extdata", "fhtest", 
# package = "recountmethylation")
# fh_csv_fname <- file.path(fh_csv_fname, "bval_fh10.csv")
# si dict
# index_dict_fname <- system.file("extdata", "sitest", 
# package = "recountmethylation")
# index_dict_fname <- file.path(index_dict_fname, "new_index_dict.pickle")

# set sample ids to query
# sample_idv <- c("GSM1038308.1548799666.hlink.GSM1038308_5958154021_R01C01",
#               "GSM1038309.1548799666.hlink.GSM1038309_5958154021_R02C01")
# set a list of k nearest neighbors to query
# lkval <- c(1,2,3)

# get query results as a data frame (with verbose results messaging)
# dfk <- query_si(sample_idv = sample_idv, lkval = lkval, 
#               fh_csv_fpath = fh_csv_fname, 
#               si_fpath = dirname(index_dict_fname))
# returns:
# Starting basilisk process...
# Defining the virtual env dependencies...
# Running virtual environment setup...
# Sourcing Python functions...
# Querying the search index...
# Getting hashed features data for samples...
# Getting index data for sample: 
# GSM1038308.1548799666.hlink.GSM1038308_5958154021_R01C01'
# Getting index data for sample: 
# GSM1038309.1548799666.hlink.GSM1038309_5958154021_R02C01'
# Beginning queries of k neighbors from lk...
# ii =  0 , ki =  1
# Loading search index...
# Querying 2 elements in data with k = 1 nearest neighbors...
# Query completed, time: 0.0007359981536865234
# Applying labels to query results...
# Returning data (sample id, k index, and distance)...
# ii =  1 , ki =  2
# Loading search index...
# Querying 2 elements in data with k = 2 nearest neighbors...
# Query completed, time: 0.0006208419799804688
# Applying labels to query results...
# Returning data (sample id, k index, and distance)...
# ii =  2 , ki =  3
# Provided k '3' > n si samples, skipping...
# Returning query results...
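
The log above shows that a k value larger than the number of samples in the search index is skipped. A small sketch of filtering lkval up front follows; n_si_samples is a hypothetical value that the caller would need to know for their own index, and it is not returned by any function documented here.

```r
# K values requested for the query.
lkval <- c(1, 2, 3)

# Hypothetical: number of samples stored in the search index.
n_si_samples <- 2

# Keep only k values that do not exceed the number of indexed samples.
lkval_ok <- lkval[lkval <= n_si_samples]
```

Passing lkval_ok instead of lkval would avoid the "skipping" message for oversized k values.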

metamaden/recountmethylation documentation built on Jan. 5, 2023, 9:56 a.m.