pass_align: Transfer alignment from one string to another

Description

When aligning linguistic strings, it is often better to perform the alignment on a simplified version of each string. This function passes the alignment from the simplified strings back to the originals.

Usage

pass_align(originals, alignment, sep = " ", in.gap = "-", out.gap = "-")

Arguments

originals

Vector of strings in the original form, with separators

alignment

Vector of simplified strings after alignment, with separators and gaps. The number of non-gap parts in each string should match the number of parts in the corresponding original

sep

Symbol used as separator between parts of the strings

in.gap

Symbol used as gap indicator in the alignments

out.gap

Symbol used as gap indicator in the output. This is useful when the gap symbol from the alignments occurs as a character in the originals.
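
The requirement on alignment (as many non-gap parts as there are parts in the matching original) can be checked before calling the function. The following is a minimal base-R sketch. The helpers count_parts and count_nongap are hypothetical names introduced here for illustration and are not part of qlcData.

```r
# Hypothetical helpers (not part of qlcData), shown only to illustrate
# the constraint on the 'alignment' argument.

# Number of parts in each string after splitting on the separator
count_parts <- function(x, sep = " ") {
  lengths(strsplit(x, split = sep, fixed = TRUE))
}

# Number of parts in each string that are not the gap symbol
count_nongap <- function(x, sep = " ", gap = "-") {
  vapply(strsplit(x, split = sep, fixed = TRUE),
         function(parts) sum(parts != gap),
         integer(1))
}

originals <- c("a b c", "d e f g")
alignment <- c("X - X X", "X X X X")

# One logical per string: TRUE where the counts agree
parts_match <- count_parts(originals) == count_nongap(alignment)
parts_match
```

Any FALSE in parts_match points to a string whose alignment does not fit its original.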

Details

Given some strings, a sound (or graphemic) alignment inserts gaps into the strings in such a way as to align the columns between different strings. We assume here an original string that is separated by sep into parts (segments, sounds, tailored grapheme clusters). After simplification (e.g. through tokenize) and alignment (currently using non-R software), a string is returned with extra gaps inserted. The number of non-gap parts in the aligned string should match the number of parts in the original string.
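
The idea described above can be sketched in a few lines of base R. This is an illustrative re-implementation under simple assumptions (a single string pair, a literal separator), not the actual source of pass_align. The name transfer_gaps is hypothetical.

```r
# Hypothetical illustration of the gap transfer; not the qlcData source.
transfer_gaps <- function(original, aligned, sep = " ",
                          in.gap = "-", out.gap = "-") {
  # Split both strings into their parts
  parts <- strsplit(original, split = sep, fixed = TRUE)[[1]]
  slots <- strsplit(aligned, split = sep, fixed = TRUE)[[1]]

  # Positions in the aligned string that hold a gap
  is_gap <- slots == in.gap

  # The non-gap positions must be exactly as many as the original parts
  stopifnot(sum(!is_gap) == length(parts))

  # Start with gaps everywhere, then fill non-gap positions in order
  out <- rep(out.gap, length(slots))
  out[!is_gap] <- parts
  paste(out, collapse = sep)
}

transfer_gaps("a b c", "X - - - X - X")
# "a - - - b - c"
```

Only the positions of the gaps in the aligned string are used. The simplified symbols themselves (here "X") are discarded and replaced by the original parts in order.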

Value

Vector of original strings with the gaps inserted from the aligned strings.

Note

There is a bash executable distributed with this package (based on the docopt package) that lets you use this function directly in a bash terminal. The easiest way to use this executable is to softlink it into some directory in your bash PATH, for example /usr/local/bin. To softlink the executable pass_align to this directory, use something like the following in your bash terminal:

ln -is `Rscript -e 'cat(file.path(find.package("qlcData"), "exec", "pass_align"))'` /usr/local/bin

Author(s)

Michael Cysouw <cysouw@mac.com>

Examples

# make some strings with separators
l <- list(letters[1:3], letters[4:7], letters[10:15])
originals <- sapply(l, paste, collapse = " ")
cbind(originals)

# make some alignment
# note that this alignment is nonsensical!
alignment <- c("X - - - X - X", "X X - - - X X", "X X X - X X X")
cbind(alignment)

# match originals to the alignment
transferred <- pass_align(originals, alignment)
cbind(transferred)

# ========

# a slightly more interesting example
# using the bare-bones pairwise alignment from adist()
originals <- c("cute kitten class","utter tentacles")
cbind(originals)

# adist returns strings of pairwise Levenshtein operations
# "I" signals insertion
(levenshtein <- attr(adist(originals, counts = TRUE), "trafos"))

# pass alignments to original strings, show the insertions as "-" gaps
alignment <- c(levenshtein[1,2], levenshtein[2,1])
transferred <- pass_align(originals, alignment, 
    sep = "", in.gap = "I", out.gap = "-")
cbind(transferred)

qlcData documentation built on May 2, 2019, 8:29 a.m.