findPairs: A convenience wrapper around 'findExamples'.
In soundcorrs: Semi-Automatic Analysis of Sound Correspondences


Sift the dataset for word pairs such that the first word contains x and the second word contains y in the corresponding segment or segments.

findPairs(data, x, y, exact, cols)

`data`	[soundcorrs] The dataset in which to look. Only datasets with two languages are supported.
`x`	[character] The sequence to find in language1. May be a regular expression. If an empty string, anything will be considered a match.
`y`	[character] The sequence to find in language2. May be a regular expression. If an empty string, anything will be considered a match.
`exact`	[numeric] If 0 or `FALSE`, `distance.start`=`distance.end`=-1, `na.value`=0, and `zeros`=`FALSE`. If 0.5, `distance.start`=`distance.end`=1, `na.value`=0, and `zeros`=`FALSE`. If 1 or `TRUE`, `distance.start`=`distance.end`=0, `na.value`=-1, and `zeros`=`TRUE`. Defaults to 0.
`cols`	[character vector] Which columns of the dataset to return as the result. Can be a vector of names, `"aligned"` (the two columns with segmented, aligned words), or `"all"` (all columns). Defaults to `"aligned"`.

The most common use of findExamples is probably with datasets containing pairs of words. This function is a simple wrapper around findExamples intended to simplify that case. Instead of setting distance.start, distance.end, na.value, and zeros individually, the user picks one of three presets with the single exact argument. This comes at the cost of control; when a more fine-tuned search is required, findExamples can be used instead of findPairs.

The default is the inexact mode (exact set to 0 or FALSE). It corresponds to distance.start and distance.end both being set to -1, na.value to 0, and zeros to FALSE, which are also the default settings in findExamples(). The risk here is false positives. In my experience, however, those are rare and, because they are displayed, the user has a chance to spot them.

The opposite is the exact mode (exact set to 1 or TRUE), which corresponds to distance.start and distance.end both being set to 0, na.value to -1, and zeros to TRUE. The risk here is false negatives, which in my experience are both much more common than false positives in the inexact mode and effectively impossible to spot, as they are simply not displayed.

A middle ground is the semi-exact mode (exact set to 0.5), where distance.start and distance.end are both set to 1, na.value to 0, and zeros to FALSE. It decreases the risk of false positives while only slightly increasing the risk of false negatives.
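
The three modes can be summarised as a lookup from exact to the corresponding findExamples settings. The sketch below is an illustration only: exactPreset is a hypothetical helper, not a function in soundcorrs, and it is not how findPairs is implemented internally. It merely restates the values listed under Arguments in plain base R.

```r
# Hypothetical helper (NOT part of soundcorrs): returns the findExamples
# settings that this page associates with each accepted value of `exact`.
exactPreset <- function (exact = 0) {
    # as.numeric() turns FALSE into 0 and TRUE into 1
    key <- as.character (as.numeric (exact))
    presets <- list (
        "0"   = list (distance.start = -1, distance.end = -1, na.value =  0, zeros = FALSE),
        "0.5" = list (distance.start =  1, distance.end =  1, na.value =  0, zeros = FALSE),
        "1"   = list (distance.start =  0, distance.end =  0, na.value = -1, zeros = TRUE)
    )
    if (!key %in% names (presets))
        stop ("`exact` must be 0, 0.5, 1, FALSE, or TRUE.")
    presets[[key]]
}

# inspect the semi-exact preset
str (exactPreset (0.5))
```

Any combination of settings not covered by these three presets would have to be passed to findExamples directly.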

[df.findExamples] A subset of the dataset, containing only the pairs with corresponding sequences. Warning: pairs with multiple occurrences of such sequences are only included once.

findExamples, allPairs

# In the examples below, non-ASCII characters had to be escaped for technical reasons.
# In actual usage, Unicode is supported under BSD, Linux, and macOS.

# prepare sample dataset
dataset <- loadSampleDataset ("data-ie")
# run findPairs
findPairs (dataset, "a", "a")
findPairs (dataset, "e", "f", exact=0)
findPairs (dataset, "e", "f", exact=0.5)
findPairs (dataset, "e", "f", exact=1)