read.pangloss: Read a file in the format used in the pangloss collection


Description

Reads a file in the format used by the pangloss collection (http://lacito.vjf.cnrs.fr/pangloss/index_en.html), a large collection of interlinearized texts.

Usage

read.pangloss(url, DOI = NULL, get.texts = TRUE, get.sentences = TRUE,
  get.words = TRUE, get.morphemes = TRUE)

Arguments

url

a length-one character vector with the URL (or, as in the example, the local file path) of the document to be imported

DOI

a unique identifier for the document (default NULL)

get.texts

logical; should the 'texts' data.frame be included in the result? (default TRUE)

get.sentences

logical; should the 'sentences' data.frame be included in the result? (default TRUE)

get.words

logical; should the 'words' data.frame be included in the result? (default TRUE)

get.morphemes

logical; should the 'morphemes' data.frame be included in the result? (default TRUE)
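
As an illustration of the get.* arguments, the following minimal sketch imports the example file shipped with the package while skipping the word and morpheme tables. It assumes that interlineaR is installed and that setting a flag to FALSE simply leaves the corresponding data frame out of the result, as the argument descriptions suggest; the exact set of remaining slots is not verified here.

```r
library(interlineaR)

# Example file bundled with the package (same file as in the Examples section)
path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")

# Request only the coarser units; words and morphemes are not requested
corpus <- read.pangloss(path, get.words = FALSE, get.morphemes = FALSE)

# List the slots that were actually returned
names(corpus)
```

Leaving out tables that are not needed can make the result smaller when only text-level or sentence-level information is of interest.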

Value

a list with up to 4 slots corresponding to different units and named "texts", "sentences", "words" and "morphemes". Each slot contains a data frame in which each row describes one occurrence of the corresponding unit.
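
The returned list can be explored with base R tools. The sketch below is only illustrative: it assumes interlineaR is installed, and it relies on the sentence_id column of the 'morphemes' data frame, which appears in the example output of this page; the columns of the other data frames are not documented here and are therefore not used.

```r
library(interlineaR)

path <- system.file("exampleData", "FOURMI.xml", package = "interlineaR")
corpus <- read.pangloss(path)

# Overview of the slots without printing their full contents
str(corpus, max.level = 1)

# Number of rows (occurrences of each unit) in every returned data frame
sapply(corpus, nrow)

# Count morphemes per sentence, using the sentence_id key of the morphemes table
morphemes_per_sentence <- table(corpus$morphemes$sentence_id)
head(morphemes_per_sentence)
```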

References

http://lacito.vjf.cnrs.fr/pangloss/index_en.html

Examples

path <- system.file("exampleData", "FOURMI.xml", package="interlineaR")
corpus <- read.pangloss(path)
head(corpus$morphemes)

Example output

Loading required package: xml2
Loading required package: reshape2
  morphem_id text_id sentence_id word_id     token         gloss
1          1       1    FOURMIs1       1         à             1
2          2       1    FOURMIs1       2       bꜛé P2+être+Loc+9
3          3       1    FOURMIs1       3       ɲꜜê    corps+AL+3
4          4       1    FOURMIs1       4     tʃúʔú          nuit
5          5       1    FOURMIs1       5 wù-ʃíʔìnɨ́       3-belle
6          6       1    FOURMIs1       6    wû-tsé    3-certaine

interlineaR documentation built on May 1, 2019, 7:29 p.m.