View source: R/lsh_properties.R
jaccard_probability | R Documentation |
This is a port of the lsh_probability function from the textreuse package, with arguments changed to reflect the hyperparameters in this package. It gives the probability that two strings whose Jaccard similarity is given by the similarity argument will be matched, given the chosen band width and number of bands.
jaccard_probability(similarity, n_bands, band_width)
similarity |
The Jaccard similarity of the two strings you want to compare. |
n_bands |
The number of LSH bands used in hashing. |
band_width |
The number of hashes in each band. |
A decimal number giving the probability that the two items will be returned as a candidate pair from the minhash algorithm.
# Find the probability that two strings will be matched, given that they
# have a Jaccard similarity of .8, a band width of 5, and 50 bands:
jaccard_probability(.8, n_bands = 50, band_width = 5)
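
To make the role of each argument concrete, here is a minimal sketch of the standard banded-LSH match probability, 1 - (1 - s^r)^b, where s is the similarity, r the band width, and b the number of bands. The helper name lsh_match_probability is hypothetical and not part of this package, and the sketch assumes each minhash agrees independently with probability equal to the Jaccard similarity; it is meant as an illustration, not as the package's implementation.

```r
# Hypothetical helper (not part of the package): closed-form banded-LSH
# match probability, assuming each hash agrees with probability `similarity`.
lsh_match_probability <- function(similarity, n_bands, band_width) {
  # Probability that every hash within a single band agrees
  p_band <- similarity^band_width
  # Probability that at least one of the n_bands bands agrees
  1 - (1 - p_band)^n_bands
}

# Same inputs as the example above
lsh_match_probability(0.8, n_bands = 50, band_width = 5)

# The function is vectorised over similarity, which makes it easy to
# trace how the match probability rises with similarity
s <- seq(0.1, 0.9, by = 0.2)
lsh_match_probability(s, n_bands = 50, band_width = 5)
```

Under these assumptions, widening the bands lowers the match probability at a given similarity, while adding bands raises it.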