RadixForest | R Documentation |
Radix Forest class implementation
The RadixForest class is a specialization of the RadixTree implementation. Instead of putting sequences into a single tree, the RadixForest class puts sequences into separate trees based on sequence length. This allows for faster searching of similar sequences based on Hamming or Levenshtein distance metrics. Unlike the RadixTree class, the RadixForest class does not support anchored searches or a custom cost matrix. See RadixTree for additional details.
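To illustrate why partitioning by length can narrow a search, here is a minimal base R sketch. It does not use the package and is not its implementation; `bucket_by_length` and `candidate_lengths` are hypothetical helper names. It assumes that a Hamming match must have the same length as the query, and that a Levenshtein match within distance d differs in length from the query by at most d.

```r
# Illustrative only: group sequences by length, mimicking the idea of
# one tree per sequence length. Helper names are hypothetical.
bucket_by_length <- function(sequences) {
  split(sequences, nchar(sequences))
}

# Which length buckets could contain a match for a given query?
candidate_lengths <- function(query, max_distance, mode = "levenshtein") {
  n <- nchar(query)
  if (mode == "hamming") {
    n  # Hamming distance is only defined for equal lengths
  } else {
    # An edit distance of d can change the length by at most d
    seq(max(n - max_distance, 0), n + max_distance)
  }
}

buckets <- bucket_by_length(c("ACGT", "AAAA", "ACG", "ACGTAC"))
names(buckets)                               # lengths present in the data
candidate_lengths("ACG", 1, "hamming")       # a single bucket
candidate_lengths("ACG", 1, "levenshtein")   # a small range of buckets
```

Under these assumptions, buckets whose length falls outside the candidate range can be skipped without inspecting any of their sequences.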
forest_pointer
Map of sequence length to RadixTree
char_counter_pointer
Character count data for the purpose of validating input
new()
Create a new RadixForest object
RadixForest$new(sequences = NULL)
sequences
A character vector of sequences to insert into the forest
show()
Print the forest to screen
RadixForest$show()
to_string()
Print the forest to a string
RadixForest$to_string()
graph()
Plot the forest using igraph
RadixForest$graph(depth = -1, root_label = "root", plot = TRUE)
depth
The tree depth to plot for each tree in the forest.
root_label
The label of the root node(s) in the plot.
plot
Whether to create a plot or return the data used to generate the plot.
Either a data frame of parent-child relationships used to generate the igraph plot (when plot = FALSE) or a ggplot2 object (when plot = TRUE)
to_vector()
Output all sequences held by the forest as a character vector
RadixForest$to_vector()
A character vector of all sequences contained in the forest.
size()
Output the size of the forest (i.e. how many sequences are contained)
RadixForest$size()
The size of the forest
insert()
Insert new sequences into the forest
RadixForest$insert(sequences)
sequences
A character vector of sequences to insert into the forest
A logical vector indicating whether each sequence was inserted (TRUE) or already existed in the forest (FALSE)
erase()
Erase sequences from the forest
RadixForest$erase(sequences)
sequences
A character vector of sequences to erase from the forest
A logical vector indicating whether each sequence was erased (TRUE) or not found in the forest (FALSE)
find()
Find sequences in the forest
RadixForest$find(query)
query
A character vector of sequences to find in the forest
A logical vector indicating whether each sequence was found (TRUE) or not found in the forest (FALSE)
prefix_search()
Search for sequences in the forest that start with a specified prefix. E.g.: a query of "CAR" will find "CART", "CARBON", "CARROT", etc. but not "CATS".
RadixForest$prefix_search(query)
query
A character vector of sequences to search for in the forest
A data frame of all matches with columns "query" and "target".
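The prefix semantics described above can be sketched in base R as follows. This is only an illustration of the expected matches using `startsWith()`; it is not how the forest performs the search, and the row order of the real output is not something this sketch makes any claim about.

```r
# Illustrative base R equivalent of a prefix search; not the package implementation.
targets <- c("CART", "CARBON", "CARROT", "CATS")
queries <- c("CAR", "CAT")

# For each query, keep the targets that start with it
matches <- lapply(queries, function(q) {
  hits <- targets[startsWith(targets, q)]
  data.frame(query = rep(q, length(hits)), target = hits,
             stringsAsFactors = FALSE)
})

# Combine into a single data frame with columns "query" and "target"
result <- do.call(rbind, matches)
result
```

Each query contributes one row per matching target, so a query with no matches contributes no rows.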
search()
Search for sequences in the forest that are within a specified distance of a query, using a specified distance metric.
RadixForest$search( query, max_distance = NULL, max_fraction = NULL, mode = "levenshtein", nthreads = 1, show_progress = FALSE )
query
A character vector of query sequences.
max_distance
How far to search, in units of absolute distance. Can be a single value or a vector. Mutually exclusive with max_fraction.
max_fraction
How far to search, in units of distance relative to each query sequence's length. Can be a single value or a vector. Mutually exclusive with max_distance.
mode
The distance metric to use. One of "levenshtein" or "hamming". Anchored searches are not supported by RadixForest (see the class description).
nthreads
The number of threads to use for parallel computation.
show_progress
Whether to show a progress bar.
The output is a data.frame of all matches with columns "query", "target" and "distance".
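To make the two metrics and the two distance limits concrete, here is a brute-force base R sketch. It is not the algorithm used by the forest. The helpers `hamming` and `levenshtein` are hypothetical names defined here, and the reading of max_fraction as a threshold equal to max_fraction times the query length is an assumption; how fractional thresholds are rounded is not stated in this documentation.

```r
# Illustrative only: brute-force distances in base R, not the package algorithm.
hamming <- function(a, b) {
  # Hamming distance is undefined for sequences of unequal length
  if (nchar(a) != nchar(b)) return(NA_integer_)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# utils::adist() computes the generalized Levenshtein (edit) distance
levenshtein <- function(a, b) as.integer(utils::adist(a, b))

levenshtein("ACG", "ACGT")   # one insertion
hamming("ACG", "ACGT")       # NA: lengths differ, so no Hamming match is possible
hamming("ACGT", "ACGA")      # one substitution

# Assumed reading of max_fraction: the threshold scales with query length
max_fraction <- 0.25
query <- "ACGTACGT"
threshold <- max_fraction * nchar(query)
within_limit <- levenshtein(query, "ACGTACGA") <= threshold
within_limit
```

This is consistent with the example at the end of this page, where "ACG" matches "ACGT" under the Levenshtein metric at max_distance = 1 but has no match under the Hamming metric.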
validate()
Validate the forest
RadixForest$validate()
A logical indicating whether the forest is valid (TRUE) or not (FALSE). This is mostly an internal function for debugging purposes and should always return TRUE.
library(seqtrie)
forest <- RadixForest$new()
forest$insert(c("ACGT", "AAAA"))
forest$erase("AAAA")
forest$search("ACG", max_distance = 1, mode = "levenshtein")
#   query target distance
# 1   ACG   ACGT        1
forest$search("ACG", max_distance = 1, mode = "hamming")
# query target distance
# <0 rows> (or 0-length row.names)