Sift the dataset for word pairs such that the first word contains x and the second word contains y in the corresponding segment or segments.
data | [soundcorrs] The dataset in which to look. Only datasets with two languages are supported.
x | [character] The sequence to find in language1. May be a regular expression. If an empty string, anything will be considered a match.
y | [character] The sequence to find in language2. May be a regular expression. If an empty string, anything will be considered a match.
exact | [numeric] If 0 or
cols | [character vector] Which columns of the dataset to return as the result. Can be a vector of names,
Probably the most common use of findExamples is with datasets containing pairs of words. This function is a simple wrapper around findExamples, intended to facilitate its use in this most common case. Instead of the five arguments that findExamples requires, this function takes only two. This comes, of course, at the cost of control, but should a more fine-tuned search be required, findExamples can always be used instead of findPairs.
The default is the inexact mode (exact set to 0 or FALSE). It corresponds to distance.start and distance.end both being set to -1, na.value being set to 0, and zeros being set to FALSE, which are also the default settings in findExamples(). The risk here is false positives. In my experience, however, those are rare, and because they are displayed, the user has a chance to spot them.
The opposite is the exact mode (exact set to 1 or TRUE), which corresponds to distance.start and distance.end both being set to 0, na.value being set to -1, and zeros to TRUE. The risk here is false negatives, which in my experience are both much more common than false positives in the inexact mode and effectively impossible to spot, as they are simply not displayed.
A middle ground is the semi-exact mode (exact set to 0.5), where distance.start and distance.end are both set to 1, na.value is set to 0, and zeros to FALSE. It decreases the risk of false positives while only slightly increasing the risk of false negatives.
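The three modes described above can be summarised as a lookup from exact to the corresponding findExamples settings. The following is only a sketch in base R: exactToSettings is a hypothetical helper written for illustration, not a function of the package, and it merely restates the values given in the text rather than reproducing how findPairs is actually implemented.

```r
# Hypothetical helper (not part of soundcorrs): translate the `exact`
# argument of findPairs into the findExamples settings listed above.
exactToSettings <- function(exact) {
  mode <- as.numeric(exact)   # TRUE/FALSE become 1/0
  if (length(mode) != 1 || is.na(mode)) {
    stop("`exact` must be a single number or logical value")
  }
  if (mode == 0) {
    # inexact mode: risk of (rare, visible) false positives
    list(distance.start = -1, distance.end = -1, na.value = 0, zeros = FALSE)
  } else if (mode == 0.5) {
    # semi-exact mode: the middle ground
    list(distance.start = 1, distance.end = 1, na.value = 0, zeros = FALSE)
  } else if (mode == 1) {
    # exact mode: risk of (invisible) false negatives
    list(distance.start = 0, distance.end = 0, na.value = -1, zeros = TRUE)
  } else {
    stop("`exact` must be 0, 0.5, or 1 (or FALSE/TRUE)")
  }
}

# Inspect the settings that the semi-exact mode stands for.
str(exactToSettings(0.5))
```

Assuming findExamples accepts these settings as named arguments, as the text above suggests, such a list could be merged with the dataset and the two queries and passed on with do.call when finer control than findPairs offers is needed.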
[df.findExamples] A subset of the dataset, containing only the pairs with corresponding sequences. Warning: pairs with multiple occurrences of such sequences are only included once.
# In the examples below, non-ASCII characters had to be escaped for technical reasons.
# In the actual usage, Unicode is supported under BSD, Linux, and macOS.
# prepare sample dataset
dataset <- loadSampleDataset ("data-ie")
# run findPairs
findPairs (dataset, "a", "a")
findPairs (dataset, "e", "f", exact=0)
findPairs (dataset, "e", "f", exact=0.5)
findPairs (dataset, "e", "f", exact=1)