pass_align: Transfer alignment from one string to another
In qlcData: Processing Data for Quantitative Language Comparison (QLC)


When aligning linguistic strings, it is often better to perform the alignment on a simplified version of each string. This function passes the alignment back from the simplified string to the original.

pass_align(originals, alignment, sep = " ", in.gap = "-", out.gap = "-")

`originals`	Vector of strings in the original form, with separators
`alignment`	Vector of simplified strings after alignment, with separators and gaps. The number of non-gap parts in each aligned string should match the number of parts in the corresponding original.
`sep`	Symbol used as separator between parts of the strings
`in.gap`	Symbol used as gap indicator in the alignments
`out.gap`	Symbol used as gap indicator in the output. This is useful when the gap symbol from the alignments occurs as a character in the originals.
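The requirement on `alignment` can be checked before calling the function. The following is a minimal sketch in base R; the helper `count_parts` is hypothetical and not part of qlcData, and the example strings are made up for illustration.

```r
# Hypothetical helper (not part of qlcData): count the parts of each string,
# optionally ignoring parts that are equal to the gap symbol.
count_parts <- function(x, sep = " ", gap = NULL) {
  parts <- strsplit(x, split = sep, fixed = TRUE)
  if (!is.null(gap)) {
    parts <- lapply(parts, function(p) p[p != gap])
  }
  lengths(parts)
}

# made-up example data
originals <- c("a b c", "d e f g")
alignment <- c("X - X X", "X X X X -")

n_orig  <- count_parts(originals)
n_align <- count_parts(alignment, gap = "-")

# stops with an error if any pair of strings does not match
stopifnot(n_orig == n_align)
```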

Given some strings, a sound (or graphemic) alignment inserts gaps into the strings in such a way as to align the columns between different strings. We assume here an original string that is separated by sep into parts (segments, sounds, tailored grapheme clusters). After simplification (e.g. through tokenize) and alignment (currently using non-R software), a string is returned with extra gaps inserted. The number of non-gap parts in this aligned string should match the number of parts in the original string.
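The idea described above can be sketched for a single pair of strings in base R. This is an illustration under stated assumptions, not the qlcData source: the function `transfer_gaps` is hypothetical, handles only one string at a time, and simply places the original parts in the non-gap positions of the aligned string.

```r
# Hypothetical re-implementation for illustration; not the qlcData source.
transfer_gaps <- function(original, aligned, sep = " ",
                          in.gap = "-", out.gap = "-") {
  orig_parts  <- strsplit(original, split = sep, fixed = TRUE)[[1]]
  align_parts <- strsplit(aligned, split = sep, fixed = TRUE)[[1]]

  # positions in the aligned string that hold a gap
  is_gap <- align_parts == in.gap

  # the non-gap positions must be filled exactly by the original parts
  stopifnot(sum(!is_gap) == length(orig_parts))

  # start with gaps everywhere, then fill in the original parts in order
  out <- rep(out.gap, length(align_parts))
  out[!is_gap] <- orig_parts
  paste(out, collapse = sep)
}

transfer_gaps("a b c", "X - - - X - X")
# "a - - - b - c"

# a different output gap keeps inserted gaps distinct from a literal "-"
transfer_gaps("a - c", "X X - X", out.gap = "_")
# "a - _ c"
```

Because the positions are filled strictly in order, the sketch never inspects the content of the simplified parts; only the location of the gaps matters.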

Vector of original strings with the gaps inserted from the aligned strings.

There is a bash executable distributed with this package (based on the docopt package) that lets you use this function directly in a bash terminal. The easiest way to use this executable is to softlink it into some directory in your bash PATH, for example /usr/local/bin. To softlink the function pass_align to this directory, use something like the following in your bash terminal:

ln -is `Rscript -e 'cat(file.path(find.package("qlcData"), "exec", "pass_align"))'` /usr/local/bin

Michael Cysouw <cysouw@mac.com>

library(qlcData)

# make some strings with separators
l <- list(letters[1:3], letters[4:7], letters[10:15])
originals <- sapply(l, paste, collapse = " ")
cbind(originals)

# make some alignment
# note that this alignment is nonsensical!
alignment <- c("X - - - X - X", "X X - - - X X", "X X X - X X X")
cbind(alignment)

# match originals to the alignment
transferred <- pass_align(originals, alignment)
cbind(transferred)

# ========

# a slightly more interesting example
# using the bare-bones pairwise alignment from adist()
originals <- c("cute kitten class","utter tentacles")
cbind(originals)

# adist returns strings of pairwise Levenshtein operations
# "I" signals insertion
(levenshtein <- attr(adist(originals, counts = TRUE), "trafos"))

# pass alignments to original strings, show the insertions as "-" gaps
alignment <- c(levenshtein[1,2], levenshtein[2,1])
transferred <- pass_align(originals, alignment, 
    sep = "", in.gap = "I", out.gap = "-")
cbind(transferred)