data_nol_mno_clean: Népszabadság and Magyar Nemzet front page articles

data_nol_mno_clean R Documentation

Népszabadság and Magyar Nemzet front page articles

Description

The dataset contains 71,875 front page articles from two Hungarian print dailies, Magyar Nemzet and Népszabadság. It is used in Chapter 12 of the textbook (https://tankonyv.poltextlab.com/oszt%C3%A1lyoz%C3%A1s-%C3%A9s-fel%C3%BCgyelt-tanul%C3%A1s.html).

Usage

data_nol_mno_clean

Format

A data.frame with 71,875 observations of 5 variables:

row_number

A unique document ID.

filename

The source file name, in the form daily_year_month_day_nr.txt.

majortopic_code

The Comparative Agendas Project (CAP) major topic code assigned to the article.

text

The pre-processed article text.

corpus

The article source: either "NOL" for Népszabadság or "MNO" for Magyar Nemzet.
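The filename syntax daily_year_month_day_nr.txt can be unpacked into separate columns with base R. The following is a minimal sketch, not part of HunMineR. It assumes that the daily prefix contains no underscores and that the year, month and day fields are numeric. The toy data frame, its filename values, and the helper parse_filename() are hypothetical stand-ins for the real data.

```r
# Hypothetical toy rows standing in for data_nol_mno_clean;
# the real filename prefixes and values may differ.
articles <- data.frame(
  row_number = 1:3,
  filename = c("nol_2002_05_14_3.txt", "mno_1995_11_02_1.txt", "nol_2010_01_30_12.txt"),
  corpus = c("NOL", "MNO", "NOL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "daily_year_month_day_nr.txt" into its parts.
# Assumes exactly five underscore-separated fields before ".txt".
parse_filename <- function(fn) {
  stem <- sub("\\.txt$", "", fn)                    # drop the extension
  parts <- do.call(rbind, strsplit(stem, "_", fixed = TRUE))  # one row per file
  data.frame(
    daily = parts[, 1],
    date = as.Date(paste(parts[, 2], parts[, 3], parts[, 4], sep = "-")),
    nr = as.integer(parts[, 5]),
    stringsAsFactors = FALSE
  )
}

parsed <- cbind(articles, parse_filename(articles$filename))
print(parsed)

# Article counts per source, using the corpus variable
print(table(parsed$corpus))
```

With the package attached, the same helper could be applied to data_nol_mno_clean$filename, for example to restrict the corpus to a date range before training a classifier.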

Source

https://cap.tk.hu/en/dataoverview

References

Sebők, Miklós, and Zoltán Kacsuk (2021). "The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach." Political Analysis, 29(2): 236-249.


aakosm/HunMineR documentation built on Sept. 27, 2024, 5:22 p.m.