data_parlspeech_magyarlanc: Part of Speech data created with Magyarlanc

data_parlspeech_magyarlanc    R Documentation

Part of Speech data created with Magyarlanc

Description

The dataset contains the output of a part-of-speech analysis run with the Magyarlanc tool on a sample of 25 Hungarian parliamentary speeches. It is used in Chapter 11 of the textbook (https://tankonyv.poltextlab.com/nlp-ch.html).

Usage

data_parlspeech_magyarlanc

Format

A data.frame with 17,870 observations and 4 variables:

token

The token created by magyarlanc.

lemma

The lemma that magyarlanc derived from the token.

POS_tag

The part-of-speech tag indicating the grammatical category of the token (for example noun or verb).

morfologic_features

The morphological features of the token.
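
Examples

A minimal sketch of how a data.frame with the columns listed above might be inspected. The helper `pos_tag_counts()` is hypothetical and not part of HunMineR; the sketch also assumes that the package is installed and that the dataset can be reached as `HunMineR::data_parlspeech_magyarlanc`, so the guarded block is skipped when the package is missing.

```r
# Hypothetical helper (not part of HunMineR): count tokens per POS tag,
# most frequent tag first.
pos_tag_counts <- function(df) {
  sort(table(df$POS_tag), decreasing = TRUE)
}

# Assumption: HunMineR is installed and lazy-loads its datasets.
# Nothing is run when the package is not available.
if (requireNamespace("HunMineR", quietly = TRUE)) {
  pos <- HunMineR::data_parlspeech_magyarlanc
  str(pos)                   # column names and types
  head(pos_tag_counts(pos))  # the most frequent POS tags
}
```

The helper only relies on the `POS_tag` column, so it can be applied to any data.frame that has one.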

Source

https://cap.tk.hu/en/dataoverview

References

Zsibrita, János, Veronika Vincze, and Richárd Farkas (2013). Magyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2013), 763–771.


aakosm/HunMineR documentation built on Sept. 27, 2024, 5:22 p.m.