Sift the dataset for word pairs/triples/... such that the word in the first language contains the first sequence, the word in the second language contains the second sequence, and so on.
findExamples(
data,
...,
distance.start,
distance.end,
na.value,
zeros,
cols,
perl
)
data | [soundcorrs] The dataset in which to look.
... | [character] Sequences for which to look. May be regular expressions as defined in R, or in the
distance.start | [integer] The allowed distance between segments where the sound sequences begin. A negative value means alignment of the beginning of sequences will not be checked. Defaults to -1.
distance.end | [integer] The allowed distance between segments where the sound sequences end. A negative value means alignment of the end of sequences will not be checked. Defaults to -1.
na.value | [numeric] Treat
zeros | [logical] Take linguistic zeros into account? Defaults to
cols | [character vector] Which columns of the dataset to return as the result. Can be a vector of names,
perl | [logical] Use Perl-compatible regular expressions? Defaults to
One of the more time-consuming tasks when working with sound correspondences is looking for specific examples which realize a given correspondence. findExamples can fully automate this process. It has several arguments that help fine-tune the search, of which perhaps the most important are distance.start and distance.end. Note that with their default values (-1 for both), findExamples will find every pair/triple/... of words in which the first word contains the first query, the second word the second query, etc., regardless of whether these segments do in fact correspond to each other in the alignment. This is intentional, and stems from the assumption that in this case false positives are generally less harmful and, above all, easier to spot than false negatives.
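The effect of a negative versus a non-negative distance.start can be illustrated with a small base R sketch. This is not the soundcorrs implementation: the helpers matchStartSegment and isExample are hypothetical, the "|" segment separator and the sample words are assumptions made for illustration, and only the first match in each word is considered.

```r
# Hypothetical helper, NOT part of soundcorrs: index of the segment in which
# the first match of `query` begins, or NA if the word has no match.
matchStartSegment <- function(word, query, sep = "|") {
  segments <- strsplit(word, sep, fixed = TRUE)[[1]]
  flat <- paste(segments, collapse = "")
  pos <- as.integer(regexpr(query, flat))
  if (pos == -1L) return(NA_integer_)
  # Cumulative end position of each segment in the unsegmented string.
  ends <- cumsum(nchar(segments))
  which(ends >= pos)[1]
}

# Hypothetical helper: do all words match their queries and, if
# distance.start is non-negative, do the matches begin close enough together?
isExample <- function(words, queries, distance.start = -1) {
  starts <- mapply(matchStartSegment, words, queries)
  if (anyNA(starts)) return(FALSE)
  if (distance.start < 0) return(TRUE)   # alignment is not checked
  diff(range(starts)) <= distance.start
}

words <- c("k|a|t|a", "t|a|k|a")             # made-up aligned words
isExample(words, c("t", "t"))                      # TRUE: both contain "t"
isExample(words, c("t", "t"), distance.start = 0)  # FALSE: segments 3 and 1
isExample(words, c("a", "a"), distance.start = 0)  # TRUE: both in segment 2
```

With the default, the first pair of queries is reported even though the two "t" sounds sit in different segments, which is the kind of false positive described above.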
findExamples accepts regular expressions in queries, both those available in pure R and those defined in the transcription, in both notations accepted by expandMeta. Users are strongly encouraged to become familiar with regular expressions, as this is where the true power of findExamples lies.
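A lookahead makes a difference when the start and end of a match are compared across languages. The base R sketch below shows this with regexpr alone. The character class "[aeiou]" is only a stand-in for a vowel metacharacter, which in soundcorrs would come from the transcription, and the sample word is an assumption.

```r
word <- "warszawa"

# Plain pattern: the match covers the vowel and the following "r".
plain <- regexpr("[aeiou]r", word)
# Lookahead (requires perl = TRUE): "r" must follow but is not part of the match.
ahead <- regexpr("[aeiou](?=r)", word, perl = TRUE)

as.integer(plain)             # 2
attr(plain, "match.length")   # 2
as.integer(ahead)             # 2
attr(ahead, "match.length")   # 1
regmatches(word, ahead)       # "a"
```

Both patterns begin at the same position, but only the lookahead version ends on the vowel itself, which matters once the end of the match is also being checked, as with distance.end = 0.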
[df.findExamples] A list with two fields: $data, a data frame with found examples; and $which, a logical vector showing which rows of data are considered matches.
# In the examples below, non-ASCII characters had to be escaped for technical reasons.
# In actual use, Unicode is supported under BSD, Linux, and macOS.
# prepare sample dataset
dataset <- loadSampleDataset ("data-capitals")
# find examples which have "a" in all three languages
findExamples (dataset, "a", "a", "a")
# find examples where German has schwa, and Polish and Spanish have a Vr sequence
findExamples (dataset, "\u0259", "Vr", "Vr")
# as above, but the schwa and the two vowels must be in the same segment
findExamples (dataset, "\u0259", "V(?=r)", "V(?=r)", distance.start=0, distance.end=0, perl=TRUE)
# find examples where German has a-umlaut, Polish has a or e, and Spanish has any sound at all
findExamples (dataset, "\u00E4", "[ae]", "")
# find examples where German has a linguistic zero while Polish and Spanish do not
findExamples (dataset, "-", "[^-]", "[^-]", zeros=TRUE)
# find examples where German has schwa, and Polish and Spanish have a
findExamples (dataset, "\u0259", "a", "a", distance.start=-1, distance.end=-1)
# as above, but the schwa and the two a's must be in the same segment
findExamples (dataset, "\u0259", "a", "a", distance.start=0, distance.end=0)