read_align: Reading different versions of linguistic multialignments.


View source: R/read_align.R

Description

Multialignment of strings is a central step in historical linguistics (quite similar to multialignment in bioinformatics). There is no consensus (yet) about the file structure for multialignments in linguistics. Currently, this function reads various flavours of multialignment and tries to harmonize them into a common internal R structure.

Usage

read_align(file, flavor)

Arguments

file

The multialignment file to be read.

flavor

The flavour of the file. Currently two flavours are implemented: "PAD" and "BDPA".

Details

The flavor "PAD" refers to the Phonetische Atlas Deutschlands, which provides multialignments for German dialects. The flavor "BDPA" refers to the Benchmark Database for Phonetic Alignments.
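
As a rough illustration of how the two arguments fit together, the following sketch calls the function on a PAD-flavoured file. The path is a hypothetical placeholder, not a file shipped with the package, and the guard simply skips the call when qlcData or the file is not available.

```r
# Hypothetical path (an assumption for illustration); replace it with a
# real PAD or BDPA alignment file on your system.
path <- "path/to/alignment_file"

if (requireNamespace("qlcData", quietly = TRUE) && file.exists(path)) {
  # flavor must be one of the implemented flavours: "PAD" or "BDPA"
  result <- qlcData::read_align(path, flavor = "PAD")
  # Show the top-level elements of the returned list
  str(result, max.level = 1)
} else {
  result <- NULL
  message("qlcData or the example file is not available; nothing was read.")
}
```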

Value

Multialignment files often contain several kinds of information. An attempt is made to turn the data into a list with the following elements:

meta

: Metadata

align

: The actual alignments as a dataframe. When IDs are present in the original file, they are used as rownames. Some attempt is made to add useful column names.

doculects

: The rows of the alignment normally are some kind of doculects ("languages", "dialects"). However, because a doculect might occur more than once (when two different but cognate words from one language are included), these names are not used as rownames of $align, but are presented separately here.

annotations

: The columns of a multialignment can have annotations, e.g. metathesis or orthographic standard. These annotations are saved here as a dataframe with the same number of columns as the $align dataframe. The name of each annotation is used as its rowname.
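
To make the shape of this list more concrete, here is a hand-built stand-in with the four elements described above. It is not output of read_align: every value in it (IDs, segments, doculect names, the annotation name "STANDARD", the column names) is invented, and the exact types of meta and doculects are assumptions.

```r
# Hand-built mock of the described structure; NOT produced by read_align().
# All values below are invented for illustration only.
align <- data.frame(
  C1 = c("h", "h", "-"),
  C2 = c("a", "o", "a"),
  C3 = c("n", "n", "n"),
  C4 = c("d", "t", "d"),
  row.names = c("id1", "id2", "id3"),  # IDs used as rownames
  stringsAsFactors = FALSE
)

mock <- list(
  meta = list(dataset = "hypothetical example"),  # assumed to be a list
  align = align,
  # One entry per alignment row; names may repeat, which is why they
  # cannot serve as rownames of $align
  doculects = c("DialectA", "DialectB", "DialectA"),
  # One row per annotation, one column per alignment column
  annotations = data.frame(
    C1 = "", C2 = "", C3 = "", C4 = "",
    row.names = "STANDARD",
    stringsAsFactors = FALSE
  )
)

# Consistency checks implied by the description
stopifnot(length(mock$doculects) == nrow(mock$align))
stopifnot(ncol(mock$annotations) == ncol(mock$align))

# Rows for one doculect are selected through the separate vector
mock$align[mock$doculects == "DialectA", ]
```

Keeping the doculect names in a separate vector allows repeated names while the alignment rows keep unique IDs as rownames.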

Author(s)

Michael Cysouw <cysouw@mac.com>

References

BDPA is available at http://alignments.lingpy.org. PAD is available at http://github.com/cysouw/PAD/


qlcData documentation built on May 2, 2019, 8:29 a.m.