udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Besides parsing text, the package also lets you train annotation models on 'treebank' data in 'CoNLL-U' format, as described at <http://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>.

Package details

Author: Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
Maintainer: Jan Wijffels <[email protected]>
License: MPL-2.0
Version: 0.8.3
URL: https://bnosac.github.io/udpipe/en/index.html https://github.com/bnosac/udpipe
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("udpipe")
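
Once the package is installed, annotating raw text might look like the minimal sketch below. It assumes internet access to download a pre-trained model; the choice of the English model, the temporary model directory and the example sentence are illustrative assumptions and do not come from this page.

```r
library(udpipe)

# Download a pre-trained English model into a temporary directory.
# Requires internet access; the language choice is an assumption.
model_info <- udpipe_download_model(language = "english", model_dir = tempdir())
model <- udpipe_load_model(file = model_info$file_model)

# Example sentence, made up for illustration.
text <- "The quick brown fox jumps over the lazy dog."

# Tokenize, tag, lemmatize and parse the text, then convert the
# annotation to a data frame with one row per token.
annotation <- as.data.frame(udpipe_annotate(model, x = text))

# Show the token, its lemma, its part of speech and its dependency relation.
print(annotation[, c("token_id", "token", "lemma", "upos", "head_token_id", "dep_rel")])
```

The resulting data frame follows the CoNLL-U layout, so each row carries the lemma, universal part-of-speech tag and dependency relation for one token.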

udpipe documentation built on July 6, 2019, 1:03 a.m.