data_parlspeech_magyarlanc: Part of Speech data created with Magyarlanc
In aakosm/HunMineR: Companion package to the Hungarian text mining textbook

data_parlspeech_magyarlanc

R Documentation

Part of Speech data created with Magyarlanc

Description

The dataset is the result of a part-of-speech analysis conducted with the Magyarlanc tool on a sample of 25 Hungarian parliamentary speeches. It is used in Chapter 11 of the textbook (https://tankonyv.poltextlab.com/nlp-ch.html).

Usage

data_parlspeech_magyarlanc

Format

A data.frame with 17 870 observations and 4 variables:

token: The token produced by Magyarlanc.
lemma: The lemma that Magyarlanc derived from the token.
POS_tag: The part-of-speech tag of the token, i.e. its grammatical category.
morfologic_features: The morphological features of the token.
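
As an illustration of how the four columns might be used, the following minimal sketch tabulates part-of-speech tags. The helper pos_tag_counts() and the toy rows are hypothetical and are not part of HunMineR or of the dataset; the tag and feature values in the toy rows are assumptions made for the example and may not match the labels Magyarlanc actually emits.

```r
# Hypothetical helper (not part of HunMineR): count how often each
# part-of-speech tag occurs in a data frame that has a POS_tag column.
pos_tag_counts <- function(df) {
  stopifnot(is.data.frame(df), "POS_tag" %in% names(df))
  sort(table(df$POS_tag), decreasing = TRUE)
}

# Toy rows for illustration only. They are NOT taken from the dataset;
# they only mirror its four column names.
toy <- data.frame(
  token = c("A", "képviselők", "szavaztak", "a", "törvényről", "."),
  lemma = c("a", "képviselő", "szavaz", "a", "törvény", "."),
  POS_tag = c("DET", "NOUN", "VERB", "DET", "NOUN", "PUNCT"),
  # Assumed feature strings, made up for the example.
  morfologic_features = c(
    "Definite=Def|PronType=Art",
    "Case=Nom|Number=Plur",
    "Number=Plur|Person=3|Tense=Past",
    "Definite=Def|PronType=Art",
    "Case=Del|Number=Sing",
    "_"
  ),
  stringsAsFactors = FALSE
)

counts <- pos_tag_counts(toy)
print(counts)

# Rows for a single tag can be selected with ordinary subsetting.
nouns <- toy[toy$POS_tag == "NOUN", c("token", "lemma")]
print(nouns)
```

With the package attached through library(HunMineR), the same helper could presumably be applied to the real data as pos_tag_counts(data_parlspeech_magyarlanc).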

Source

https://cap.tk.hu/en/dataoverview

References

Zsibrita, János, Veronika Vincze, and Richárd Farkas (2013). Magyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP, 2013: 763–71.

aakosm/HunMineR documentation built on Sept. 27, 2024, 5:22 p.m.