ECB_press_conferences_tokens: Tokenized press conferences

ECB_press_conferences_tokens    R Documentation

Tokenized press conferences

Description

The pre-processed and tokenized version of the ECB_press_conferences corpus of press conferences. The processing involved the following steps:

  • Removal of paragraphs shorter than 10 words

  • Removal of stop words

  • Part-of-speech tagging, after which only nouns, proper nouns and adjectives were retained

  • Detection and merging of frequent compound words

  • Frequency-based cleaning of rare and very common words
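The steps above can be approximated with quanteda along the following lines. This is only an illustrative sketch, not the code used to build the dataset: the toy paragraphs, the hand-written compound list and the frequency threshold are all hypothetical, the first step is read as dropping short paragraphs, and the part-of-speech step is left out because it needs an external tagger.

```r
# Illustrative sketch only: NOT the code used to build ECB_press_conferences_tokens.
library(quanteda)

# Hypothetical toy paragraphs standing in for the press conference corpus.
paragraphs <- c(
  p1 = "The Governing Council decided to keep the key interest rates of the euro area unchanged today.",
  p2 = "Thank you.",
  p3 = "Inflation in the euro area is expected to remain low while interest rates stay at the present level."
)
toks <- tokens(corpus(paragraphs), remove_punct = TRUE)

# Step 1: keep only paragraphs with at least 10 words (assumed reading of the step).
keep <- ntoken(toks) >= 10
toks <- toks[keep]

# Step 2: lowercase and remove English stop words.
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("en"))

# Step 3 (part-of-speech filtering) is omitted here: it requires an external
# tagger, and the tagger used for the dataset is not stated in this page.

# Step 4: merge compound words. The list is hand-written for illustration;
# the dataset relied on automatic detection of frequent compounds.
toks <- tokens_compound(toks, phrase(c("interest rates", "euro area")))

# Step 5: frequency-based cleaning. The threshold is arbitrary; dfm_trim()
# also accepts max_docfreq to drop very common words.
kept_features <- featnames(dfm_trim(dfm(toks), min_termfreq = 2))
toks <- tokens_keep(toks, kept_features, valuetype = "fixed")

print(toks)
```

On a real corpus, the compound list would typically come from a collocation detection step rather than being written by hand.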

Usage

ECB_press_conferences_tokens

Format

A quanteda::tokens object.

Source

https://www.ecb.europa.eu/press/key/date/html/index.en.html.

See Also

ECB_press_conferences

Examples

library(sentopics)
LDA(ECB_press_conferences_tokens)
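A slightly longer sketch of how the tokens might be used, assuming a recent sentopics version in which fit() and top_words() are available. The number of topics and the number of iterations are arbitrary illustrative choices, not recommendations.

```r
library(sentopics)

# Create an (unestimated) LDA model; K = 5 topics is an arbitrary choice.
lda <- LDA(ECB_press_conferences_tokens, K = 5)

# Estimate the model; 100 iterations is an arbitrary choice.
lda <- fit(lda, iterations = 100)

# Inspect the most representative words of each topic.
top_words(lda)
```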


sentopics documentation built on Sept. 20, 2024, 5:06 p.m.