compare_buckets: Function that creates a similarity graph and divides it into...


View source: R/minhash_v2.R

Description

Function that creates a similarity graph from hashed signatures and divides it into communities (or blocks) for entity resolution.
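The idea can be illustrated with a rough base-R sketch. It assumes the hashed signatures can be viewed as a matrix with one row per record and one column per band, where equal values in a column mean a shared bucket. Records that share a bucket in any band are linked, and the connected components of the resulting graph are returned as blocks. This is not the tlsh implementation: similarity_blocks() is a hypothetical name, and plain connected components stand in for the community detection that compare_buckets performs.

```r
# Hypothetical sketch, not the tlsh implementation.
# Assumed input: matrix `buckets`, one row per record, one column per band;
# two records sharing a value in the same column share a bucket.
similarity_blocks <- function(buckets) {
  n <- nrow(buckets)
  parent <- seq_len(n)  # union-find forest: each record starts as its own root

  # Follow parent pointers up to the root of a record's component
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  for (band in seq_len(ncol(buckets))) {
    # Group record indices by their bucket value in this band
    groups <- split(seq_len(n), buckets[, band])
    for (g in groups) {
      if (length(g) < 2) next
      root <- find(g[1])
      # Link every record in the bucket to the first record's component
      for (j in g[-1]) {
        r <- find(j)
        if (r != root) parent[r] <- root
      }
    }
  }

  roots <- vapply(seq_len(n), find, integer(1))
  # Relabel components as consecutive block ids in order of first appearance
  match(roots, unique(roots))
}

# Records 1-2 share a bucket in band 1 and records 2-3 in band 2,
# so records 1, 2 and 3 end up in one block.
buckets <- cbind(c(1, 1, 2, 3, 4), c(5, 6, 6, 7, 8))
similarity_blocks(buckets)
```

Because the edges are transitive through shared buckets, records 1 and 3 land in the same block even though they never share a bucket directly.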

Usage

compare_buckets(hashed_signatures, max_bucket_size = 1000)

Arguments

hashed_signatures

The hashed signatures, e.g. as returned by hash_signature()

max_bucket_size

The largest block size allowed by the user (default 1000)
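To make the max_bucket_size constraint concrete, here is a deliberately naive sketch that caps block size by cutting any oversized block into consecutive chunks of at most max_bucket_size records. The helper cap_block_size() is hypothetical, and chunking by position is only a stand-in: it ignores similarity entirely, whereas compare_buckets divides the similarity graph into communities.

```r
# Hypothetical sketch, not the tlsh implementation.
# `block` is a vector of block labels, one per record.
cap_block_size <- function(block, max_bucket_size = 1000) {
  stopifnot(max_bucket_size >= 1)
  new_label <- character(length(block))
  for (b in unique(block)) {
    idx <- which(block == b)
    # Chunk number per member: 1 for the first max_bucket_size members, 2 for the next, ...
    chunk <- (seq_along(idx) - 1) %/% max_bucket_size + 1
    new_label[idx] <- paste(b, chunk, sep = "_")
  }
  # Relabel as consecutive integers in order of first appearance
  match(new_label, unique(new_label))
}

# A block of 5 records with max_bucket_size = 2 is cut into sizes 2, 2 and 1
cap_block_size(c(1, 1, 1, 1, 1, 2), max_bucket_size = 2)
```

Whatever splitting rule is used, the guarantee the argument describes is the same: no returned block contains more than max_bucket_size records.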

Value

max_bucket_size The largest bucket size (or block size) that one can handle

Examples

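library(tlsh)
# RLdata500: 500 example person records (as in the RecordLinkage package);
# drop columns 2 and 4, the secondary first- and last-name components
head(data <- RLdata500[-c(2, 4)])
minidata <- data[1:2, ]
# Shingle each record (row) with k = 8
head(all_the_shingles <- apply(minidata, 1, shingles, k = 8))
head(minhash.minidata <- minhash_v2(all_the_shingles, p = 10))
# Hash the minhash signatures using b = 5 bands
hashed_signature <- hash_signature(minhash.minidata, b = 5)
compare_buckets(hashed_signature, max_bucket_size = 200)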

tlsh documentation built on Nov. 16, 2020, 9:15 a.m.