stringdist: stringdist

Description

Computes distance metrics between strings.

Usage

stringdist(a, b, method = "osa", useBytes = FALSE,
  weight = c(d = 1, i = 1, s = 1, t = 1), maxDist = Inf, q = 1, p = 0,
  nthread = getOption("sd_num_thread"), caseFlag = 0, vlength = 3, ...)

Arguments

a

character or FLVector of characters

b

character or FLVector of characters

method

The distance metric to use. One of c("lv","dl","hamming","jaccard","jw","nmw"), where lv is Levenshtein, dl is Damerau-Levenshtein, jw is Jaro-Winkler, and nmw is Needleman-Wunsch. Default is "lv".

weight

Used for method = "nmw": weights and penalties for matches, mismatches and gaps. Integer weights for matching sequential characters (d) and non-matching non-sequential characters (i) between the strings, and an integer penalty for gaps (s), ideally negative.

p

Penalty factor for Jaro-Winkler. If p == 0, the Jaro distance is computed.

caseFlag

Logical or 0/1, indicating whether the comparison should be case sensitive.

vlength

Optional. Length of the strings to compare; used for method = "hamming".

...

Details

This function computes pairwise string distances between elements of a and b, where the argument with fewer elements is recycled.
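As a local illustration of the recycling described above, the following base R sketch computes element-wise Levenshtein distances with utils::adist. The helper name pairwise_lv is hypothetical and not part of AdapteR; it runs entirely in R memory and only assumes that elements are paired in the same way. The shape of the in-database result is described under Value.

```r
# Hypothetical helper, not part of AdapteR: element-wise Levenshtein
# distance between two character vectors, recycling the shorter one.
pairwise_lv <- function(a, b) {
  n <- max(length(a), length(b))
  a <- rep_len(a, n)  # recycle the shorter argument to the common length
  b <- rep_len(b, n)
  # utils::adist returns a 1 x 1 matrix for a pair of single strings
  vapply(seq_len(n),
         function(k) as.numeric(utils::adist(a[k], b[k])),
         numeric(1))
}

# "xyz" is recycled against each of the three strings
pairwise_lv("xyz", c("xyz", "xy", "abc"))
```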

The following distance metrics are supported:

lv: Levenshtein, calling FLLevenshteinDist
dl: Damerau-Levenshtein, calling FLDLevenshteinDist
hamming: Hamming, calling FLHammingDist
jaccard: Jaccard, calling FLHammingDist
jw, p==0: Jaro, calling FLJaroDist
jw, p>0: Jaro-Winkler, calling FLJaroWinklerDist
nmw: Needleman-Wunsch, calling FLNeedleManWunschDist
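To make the Hamming metric concrete, here is a minimal base R sketch that counts differing positions over the first n characters of two strings. The helper hamming_prefix and its argument n are hypothetical; treating n like vlength is an assumption about how the in-database function uses that argument, not a statement of its implementation.

```r
# Hypothetical helper, not part of AdapteR: Hamming distance over the
# first `n` characters of two single strings.
hamming_prefix <- function(x, y, n = 3) {
  xs <- strsplit(substr(x, 1, n), "")[[1]]
  ys <- strsplit(substr(y, 1, n), "")[[1]]
  # Hamming distance is only defined for sequences of equal length
  stopifnot(length(xs) == length(ys))
  sum(xs != ys)  # number of positions where the characters differ
}

hamming_prefix("xyz", "xyq")               # one differing position
hamming_prefix("setosa", "senosa", n = 6)  # one differing position
```

The equal-length check reflects the usual definition of the Hamming distance; a fixed comparison length is one way to apply it to strings of different lengths.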

Value

Returns an FLVector if either a or b is an R character vector of length 1; otherwise returns an FLMatrix.

Constraints

Row vectors are not currently supported. Output differs slightly from stringdist::stringdist; refer to the Value section.

See Also

stringdist for the reference R implementation.

Examples

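library(AdapteR)  # assumes an active database connection has already been set up
widetable <- FLTable("iris", "rownames")
flv <- widetable[1:10, "Species"]
# A length-1 R character compared with an FLVector returns an FLVector
resultflvector <- stringdist("xyz", flv)
resultflvector <- stringdist("xyz", flv, method = "lv", caseFlag = 1)
resultflvector <- stringdist("xyz", flv, method = "hamming", vlength = 4)
# Other combinations return an FLMatrix (see Value)
resultflmatrix <- stringdist(flv, flv, method = "jw", p = 1)
resultflmatrix <- stringdist(c("xyz", "poli"), flv, method = "jw")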

Fuzzy-Logix/AdapteR documentation built on May 6, 2019, 5:07 p.m.