kpv: Public Domain Komi-Zyrian data


Description

A dataset containing sentences from Public Domain sources. The texts were digitized in Fenno-Ugrica and proofread in FU-Lab. Ivan Belykh's works were made available by the author. The Four Battles book was proofread by Niko Partanen. The current version contains only data comparable with other languages. The longer-term idea is to include all available Komi-Zyrian data here, with sentence_id indicating which sentences match.

Usage


Format

Data frame with Komi-Zyrian data

doc_id

name of the text in the original corpus

sentence_id

sentence id, unique within a text

sentence

sentence text

...
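
As a minimal sketch of how the documented columns might be used, the example below builds two small toy data frames and aligns them on doc_id and sentence_id. The rows, the names kpv_toy and other_toy, and the premise that a parallel dataset shares the same doc_id values are all assumptions made for illustration. They are not taken from the package.

```r
# Toy stand-in for the kpv data frame: it has the three documented columns,
# but the rows are invented for illustration.
kpv_toy <- data.frame(
  doc_id = c("text_a", "text_a", "text_b"),
  sentence_id = c(1L, 2L, 1L),
  sentence = c("first sentence", "second sentence", "another text"),
  stringsAsFactors = FALSE
)

# Hypothetical parallel data frame for another language (assumed layout).
other_toy <- data.frame(
  doc_id = c("text_a", "text_a"),
  sentence_id = c(1L, 2L),
  sentence = c("translation one", "translation two"),
  stringsAsFactors = FALSE
)

# sentence_id is unique only within a text, so check uniqueness per doc_id.
has_duplicates <- any(duplicated(kpv_toy[c("doc_id", "sentence_id")]))

# Align sentences on the (doc_id, sentence_id) pair using base R merge().
aligned <- merge(
  kpv_toy, other_toy,
  by = c("doc_id", "sentence_id"),
  suffixes = c("_kpv", "_other")
)
```

Joining on both columns rather than on sentence_id alone follows from the description above: the id is unique within a text, not across the whole data frame.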

Source

https://github.com/langdoc/kpv-lit

https://fennougrica.kansalliskirjasto.fi/

http://komikyv.org/


langdoc/uralic documentation built on May 29, 2019, 3:41 a.m.