kremlin_en: A dataset including all contents published on the...


Description

A dataset with 24 338 textual items.

Usage

kremlin_en

Format

A data frame with 24338 rows and 8 columns:

doc_id

the id is a composite string, built so that the identifier stays unique even when used together with other similarly shaped datasets. Elements are separated by a hyphen-minus. An example doc_id would be president_ru-en-012345.
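
As an illustration of the structure described above, the following minimal sketch splits the example doc_id on the hyphen-minus. Reading the three elements as a source name, a language code, and a numeric part is an assumption based on the example string, and the variable names are hypothetical.

```r
# Example identifier from the description above
doc_id <- "president_ru-en-012345"

# Split on the literal hyphen-minus separator
parts <- strsplit(doc_id, split = "-", fixed = TRUE)[[1]]

# Assumed meaning of each element (not confirmed by the documentation)
source_name <- parts[1]              # "president_ru"
language    <- parts[2]              # "en"
numeric_id  <- as.integer(parts[3])  # 12345 (leading zero is dropped)
```

Converting the last element with as.integer() drops the zero padding, so the padded string is the one to keep if the value needs to be matched back against doc_id.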

text

this includes the full text of the document, including the title and the textual string with date and location (when present).

date

date of publication in the date format.

title

the title of the document

location

the location from where the document was issued as reported at the beginning of each post, e.g. "Novo-Ogaryovo, Moscow Region"; if not given, an empty string.

link

a URL, source of the document

id

numeric id; includes only the numeric part of doc_id, which may be useful if only a numeric identifier is needed.

term

a character string referring to the presidential term. The period after Yeltsin's resignation but before Putin's first inauguration in May 2000 is indicated as "Putin 0"; the following terms are indicated as "Putin 1", "Putin 2", "Medvedev 1", "Putin 3", and "Putin 4".
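
The mapping from publication date to term label can be sketched as follows. This is not the code used to build the dataset: the boundary dates are assumptions based on the publicly known resignation and inauguration dates, the choice to count each boundary day as part of the new term is also an assumption, and assign_term() is a hypothetical helper.

```r
# Assumed term boundaries: Yeltsin's resignation, then each inauguration day
term_starts <- as.Date(c("1999-12-31", "2000-05-07", "2004-05-07",
                         "2008-05-07", "2012-05-07", "2018-05-07"))
term_labels <- c("Putin 0", "Putin 1", "Putin 2",
                 "Medvedev 1", "Putin 3", "Putin 4")

# Hypothetical helper: label each date with the term it falls in
assign_term <- function(date) {
  # Index of the latest boundary on or before each date (0 if before all)
  idx <- findInterval(as.numeric(as.Date(date)), as.numeric(term_starts))
  idx[idx == 0] <- NA  # dates before the first boundary get no label
  term_labels[idx]
}

assign_term(c("2000-03-01", "2010-01-01", "2019-06-01"))
# "Putin 0" "Medvedev 1" "Putin 4"
```

Comparing the output of such a helper against the term column is one way to check where the dataset actually places its boundaries.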

Source

http://en.kremlin.ru/


giocomai/tifkremlinen documentation built on Dec. 20, 2021, 10:49 a.m.