lev_token_sort_ratio: Ordered token matching


lev_token_sort_ratio    R Documentation

Ordered token matching

Description

Compares strings by tokenising them, sorting the tokens alphabetically, and then computing the lev_ratio() of the result. Word order therefore has no effect on the score, which is helpful when two strings contain the same words in a different order.
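The three steps described above can be sketched in base R. This is only an illustration and not the package's implementation: the helper names (sort_tokens, sketch_ratio, sketch_token_sort_ratio) are hypothetical, the tokeniser is assumed to split on runs of non-alphanumeric characters, and the ratio is assumed to be one minus the edit distance divided by the length of the longer string.

```r
# Hypothetical helper: split on runs of non-alphanumeric characters
# (an assumed tokenisation rule), sort the tokens, and rejoin with spaces.
sort_tokens <- function(x) {
  tokens <- strsplit(x, "[^[:alnum:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]  # drop empty tokens from leading separators
  paste(sort(tokens), collapse = " ")
}

# Hypothetical helper: similarity as 1 - distance / longest length
# (an assumed definition of the ratio, not taken from the package source).
sketch_ratio <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]  # Levenshtein distance from base R
  1 - d / max(nchar(a), nchar(b))
}

# Sort the tokens of both inputs, then compare the sorted strings.
sketch_token_sort_ratio <- function(a, b) {
  sketch_ratio(sort_tokens(a), sort_tokens(b))
}

# The same words in a different order sort to the same string, so the score is 1.
sketch_token_sort_ratio("new hope", "hope new")
```

Because both inputs are sorted with the same rule, any reordering of the same set of words collapses to identical strings before the distance is computed.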

Usage

lev_token_sort_ratio(a, b, pairwise = TRUE, useNames = TRUE, ...)

Arguments

a, b

The input strings

pairwise

Boolean. If TRUE, only the element-wise (pairwise) comparisons between a and b are computed, rather than the comparisons for every combination of their elements.

useNames

Boolean. Use input vectors as row and column names?

...

Additional arguments to be passed to stringdist::stringdistmatrix() or stringdist::stringsimmatrix().

Value

A numeric scalar, vector or matrix depending on the length of the inputs.

See Also

lev_token_set_ratio()

Examples

x <- "Episode IV - Star Wars: A New Hope"
y <- "Star Wars Episode IV - New Hope"

# Because the order of the words differs, plain lev_ratio() gives a low match ratio.
lev_ratio(x, y)

# The sorted token approach ignores word order.
lev_token_sort_ratio(x, y)

levitate documentation built on Oct. 1, 2023, 1:08 a.m.