In quickcode: Quick and Essential 'R' Tricks for Better Scripts

percent_match

R Documentation

Function to calculate the percentage of matching between two strings

Description

Calculates the percentage of matching between two strings by combining several similarity measures (see Details).

Usage

percent_match(
  string1,
  string2,
  case_sensitive = FALSE,
  ignore_whitespace = TRUE,
  frag_size = 2
)

string1 %match% string2

sound_match(string1, string2)

Arguments

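`string1`	first string to compare
`string2`	second string to compare
`case_sensitive`	logical; whether the comparison is case sensitive (default FALSE)
`ignore_whitespace`	logical; whether to remove whitespace before comparing (default TRUE)
`frag_size`	size, in characters, of the fragments used for fragment matching (default 2)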

Details

Case Sensitivity:

The case_sensitive argument controls whether letter case is taken into account. With the default, case_sensitive = FALSE, case is ignored.

Whitespace Handling:

With ignore_whitespace set to TRUE, the function removes all whitespace before comparison. This is useful for matching strings that may have inconsistent spacing.
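The two preprocessing options can be pictured with a minimal sketch. The helper `normalize_string` is hypothetical and not part of quickcode; it only assumes that case folding uses `tolower()` and that whitespace is stripped with a regular expression.

```r
# Hypothetical helper (not the package's code): normalize a string
# before comparison, mirroring the two documented options.
normalize_string <- function(x, case_sensitive = FALSE, ignore_whitespace = TRUE) {
  # Fold case unless a case-sensitive comparison was requested
  if (!case_sensitive) x <- tolower(x)
  # Drop every whitespace character (spaces, tabs, newlines)
  if (ignore_whitespace) x <- gsub("[[:space:]]+", "", x)
  x
}

normalize_string("Hello  World")
normalize_string("Hello  World", case_sensitive = TRUE, ignore_whitespace = FALSE)
```

With the defaults, "Hello  World" becomes "helloworld"; with both options switched off the input is returned unchanged.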

Exact Character-by-Character Matching:

The function computes the percentage of matching characters in the same positions.
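A position-by-position comparison could look roughly like the sketch below. The function name `positional_match` is hypothetical, and dividing by the length of the longer string is an assumption; the package may normalize differently.

```r
# Hypothetical sketch (not the package's code): percentage of characters
# that agree at the same position in both strings.
positional_match <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  n <- min(length(ca), length(cb))        # positions present in both strings
  longest <- max(length(ca), length(cb))  # assumed denominator
  if (longest == 0) return(100)           # two empty strings match trivially
  100 * sum(ca[seq_len(n)] == cb[seq_len(n)]) / longest
}

positional_match("helloworld", "heloworld")  # 30: only "h", "e", "l" line up
```

The example shows why this measure alone is harsh: a single dropped character shifts every later position out of alignment.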

Substring Matching:

The function checks if one string is a substring of the other, awarding a full match if true.
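As a rough illustration, the substring check can be written with `grepl(fixed = TRUE)`. The function name `substring_match` is hypothetical, and returning 0 when neither string contains the other is an assumption; the documentation only states the full-match case.

```r
# Hypothetical sketch (not the package's code): full score when either
# string occurs verbatim inside the other.
substring_match <- function(a, b) {
  # fixed = TRUE treats the pattern as literal text, not a regular expression
  if (grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE)) 100 else 0
}

substring_match("world", "helloworld")  # 100
substring_match("abc", "xyz")           # 0 (assumed score for no containment)
```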

Levenshtein Distance:

The function uses Levenshtein distance to calculate the similarity and integrates this into the overall match percentage.
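One common way to turn an edit distance into a percentage is sketched below using base R's `utils::adist()`. The function name `levenshtein_match` and the normalization by the longer string's length are assumptions, not the package's confirmed formula.

```r
# Hypothetical sketch (not the package's code): similarity derived from
# the Levenshtein (edit) distance.
levenshtein_match <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(100)   # two empty strings match trivially
  d <- utils::adist(a, b)[1, 1]   # number of single-character edits
  100 * (1 - d / longest)         # assumed normalization
}

levenshtein_match("helloworld", "heloworld")  # 90: one deletion out of 10 characters
```

Unlike the positional comparison, a single missing character costs only one edit here.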

Fragment Matching:

- The frag_size argument sets the size of the fragments (substrings) that are compared between the two strings (default is 2).
- The function creates unique fragments from each string and compares them to find common fragments.
- The percentage match is calculated based on the ratio of common fragments to the total number of unique fragments.
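The fragment comparison described above can be sketched as follows. The helpers `fragments` and `fragment_match` are hypothetical, and reading "total number of unique fragments" as the union of both fragment sets is an assumption.

```r
# Hypothetical sketch (not the package's code): unique overlapping
# fragments of a fixed size taken from a string.
fragments <- function(x, frag_size = 2) {
  n <- nchar(x)
  if (n < frag_size) return(character(0))
  starts <- seq_len(n - frag_size + 1)
  unique(substring(x, starts, starts + frag_size - 1))
}

# Share of common fragments among all unique fragments of both strings
fragment_match <- function(a, b, frag_size = 2) {
  fa <- fragments(a, frag_size)
  fb <- fragments(b, frag_size)
  total <- length(union(fa, fb))  # assumed denominator
  if (total == 0) return(0)
  100 * length(intersect(fa, fb)) / total
}

fragments("hello")                         # "he" "el" "ll" "lo"
fragment_match("helloworld", "heloworld")  # 8 common of 9 unique, about 88.9
```

A larger frag_size makes the measure stricter, because longer fragments are less likely to appear in both strings.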

Combining Metrics:

The overall match percentage is computed as the average of exact match, substring match, Levenshtein match, and fragment match percentages.
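The averaging step amounts to an unweighted mean of the four component percentages. The function `combine_scores` and the input values below are purely illustrative, not output from the package.

```r
# Hypothetical sketch (not the package's code): unweighted average of
# the four component percentages.
combine_scores <- function(exact, substring, levenshtein, fragment) {
  mean(c(exact, substring, levenshtein, fragment))
}

# Illustrative component values only
combine_scores(exact = 40, substring = 0, levenshtein = 80, fragment = 60)  # 45
```

Because every component carries equal weight, one low score (for example a failed substring check) pulls the overall percentage down noticeably.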

Value

numeric value of the match percent

match word sounds

Examples

library(quickcode)

# Example 1: simple match
string1 <- "Hello World"
string2 <- "helo world"

match_percent <- percent_match(string1, string2)
message("Percentage of matching: ", match_percent)


# Example 2: which date is closest
string0 <- "october 12,1898"
string1 <- "2018-10-12"
string2 <- "1898-10-12"
percent_match(string0, string1)
percent_match(string0, string2)
percent_match(string0, string2, frag_size = 4)
percent_match(string1, string2)

# Example 3: compare how words sound
sound_match("Robert","rupert")
sound_match("rupert","Rubin")
sound_match("book","oops")

quickcode documentation built on April 11, 2025, 5:49 p.m.