CreateBF: Bloom Filter Encoding

CreateBF (R Documentation)

Description

Creates Bloom filters for each row of the input data by splitting the input into q-grams which are hashed into a bit vector.
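
The encoding described above can be illustrated with a small base-R sketch. This is a toy illustration only, not the hashing that CreateBF performs internally: the helper names (split_qgrams, toy_hash, toy_bloom), the polynomial hash, the double-hashing scheme, and the padding rule (q - 1 blanks on each side) are all assumptions made for this example.

```r
# Toy sketch only: NOT the implementation used by PPRL::CreateBF.
# All helper names and the hashing scheme below are hypothetical.

# Split a string into its distinct q-grams. Assumption: padding adds
# q - 1 blanks on each side of the string (no effect for unigrams).
split_qgrams <- function(x, q = 2, padding = 1) {
  if (padding == 1) {
    pad <- strrep(" ", q - 1)
    x <- paste0(pad, x, pad)
  }
  n <- nchar(x)
  if (n < q) return(character(0))
  unique(substring(x, 1:(n - q + 1), q:n))
}

# Simple polynomial hash over the code points of a string.
# Doubles are used so intermediate values never overflow.
toy_hash <- function(s, base, mod) {
  h <- 0
  for (code in utf8ToInt(s)) h <- (h * base + code) %% mod
  h
}

# Build one bit vector: every q-gram is keyed with the password and
# mapped to up to k positions via double hashing, h1 + i * h2.
toy_bloom <- function(x, password, k = 20, padding = 1,
                      qgram = 2, lenBloom = 1000) {
  bits <- integer(lenBloom)
  for (g in split_qgrams(x, qgram, padding)) {
    key <- paste0(password, g)
    h1 <- toy_hash(key, 31, 1000003)
    h2 <- toy_hash(key, 37, 999983) + 1  # +1 avoids a zero stride
    pos <- (h1 + (0:(k - 1)) * h2) %% lenBloom + 1  # 1-based indices
    bits[pos] <- 1L
  }
  bits
}

# Encode one value and count the bits that were set.
bf <- toy_bloom("smith", password = "secret")
sum(bf)
```

Because positions can collide, each q-gram sets at most k bits rather than exactly k. Calling toy_bloom twice with the same input and password yields the same bit vector, while a different password changes which positions are set.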

Usage

CreateBF(ID, data, password, k = 20, padding = 1, qgram = 2, lenBloom = 1000)

Arguments

ID

a character or integer vector containing the record IDs, one for each element of data.

data

a character vector containing the data to be encoded. Make sure the input vectors are not factors.

password

a string used as a password for the random hashing of the q-grams.

k

number of bit positions set to one for each q-gram.

padding

integer (0 or 1) indicating whether string padding is used (1) or not (0).

qgram

integer (1 or 2) indicating whether to split the input strings into bigrams (q = 2) or unigrams (q = 1).

lenBloom

desired length of the Bloom filter in bits.

Value

A data.frame containing the IDs and the corresponding bit vectors.

Source

Schnell, R., Bachteler, T., Reiher, J. (2009): Privacy-preserving record linkage using Bloom filters. BMC Medical Informatics and Decision Making 9: 41.

See Also

CreateCLK, StandardizeString

Examples

# Load test data
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")

# Encode data
BF <- CreateBF(ID = testData$V1, data = testData$V7,
  k = 20, padding = 1, qgram = 2, lenBloom = 1000,
  password = "(H]$6Uh*-Z204q")


PPRL documentation built on Nov. 10, 2022, 5:41 p.m.
