mp_corpus: Get documents from the Manifesto Corpus Database

Description

Documents are downloaded from the Manifesto Project Corpus Database. If CMP coding annotations are available, they are attached to the documents; otherwise, raw texts are provided. The documents are cached in working memory to ensure internal consistency, enable offline use, and reduce online traffic.

Usage

mp_corpus(
  ids,
  apikey = NULL,
  cache = TRUE,
  codefilter = NULL,
  codefilter_layer = "cmp_code",
  translation = NULL,
  as_tibble = FALSE,
  tibble_metadata = "simplified"
)

mp_corpus_df(
  ids,
  apikey = NULL,
  cache = TRUE,
  codefilter = NULL,
  codefilter_layer = "cmp_code",
  translation = NULL,
  tibble_metadata = "simplified"
)

mp_corpus_df_bilingual(
  ids,
  apikey = NULL,
  cache = TRUE,
  codefilter = NULL,
  codefilter_layer = "cmp_code",
  translation = "en",
  tibble_metadata = "simplified"
)

Arguments

ids

Specifies which documents to get. This can be either a list of parties (as ids) and election dates, as accepted by mp_metadata, or a ManifestoMetadata object (data.frame) as returned by mp_metadata. Alternatively, ids can be a logical expression specifying a subset of the Manifesto Project's main dataset. The expression is evaluated within the data.frame returned by mp_maindataset, so all of its variables, and functions of them, can be used in the expression.

apikey

API key to use. Defaults to NULL, resulting in using the API key set via mp_setapikey.

cache

Boolean flag indicating whether to use locally cached data if available.

codefilter

A vector of CMP codes used to filter the documents: only quasi-sentences with the codes specified in codefilter are returned. If NULL, no filtering is applied.

codefilter_layer

Layer to which the codefilter should apply. Defaults to "cmp_code".

translation

A string containing the two-letter ISO code of the translation language to use for the text instead of the original document language, e.g. "en" for English. Defaults to NULL, which returns each document in its original language. For documents whose original language is already the requested translation language, the original text is returned.

as_tibble

Boolean flag indicating whether to return a tibble/data.frame object instead of a ManifestoCorpus object. Defaults to FALSE for backward compatibility.

tibble_metadata

A string specifying how document-level metadata is handled when as_tibble = TRUE. One of: "none" (no metadata), "simplified" (basic metadata: "manifesto_id", "party", "date", "language", "annotations", "translation_en"), or "all" (all metadata). Defaults to "simplified".
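The selection semantics of ids (as a logical expression) and of codefilter can be illustrated offline with base R. This is only a sketch on invented toy data: toy_main, toy_doc and the helper select_ids are hypothetical names, and the code does not reproduce manifestoR's actual implementation.

```r
# Toy stand-in for the main dataset; all values are invented.
toy_main <- data.frame(
  party = c(61620, 61620, 61320),
  date  = c(200411, 200811, 200811),
  rile  = c(25, 5, -10)
)

# Hypothetical helper: capture the unevaluated expression and evaluate it
# with the data frame's columns in scope, similar to base R's subset().
select_ids <- function(data, expr) {
  keep <- eval(substitute(expr), data, parent.frame())
  data[keep & !is.na(keep), c("party", "date")]
}

# Rows matching the expression yield the party/date pairs to request.
selected <- select_ids(toy_main, party == 61620 & rile > 10)
print(selected)

# Toy coded quasi-sentences; texts and codes are invented.
toy_doc <- data.frame(
  text     = c("We will cut taxes.", "We support schools.", "Lower taxes now."),
  cmp_code = c("414", "506", "414"),
  stringsAsFactors = FALSE
)

# A codefilter keeps only quasi-sentences whose code is in the filter vector.
codefilter <- c("414")
filtered <- toy_doc[toy_doc$cmp_code %in% codefilter, ]
print(filtered$text)
```

The expression is captured unevaluated, which is why column names such as party and rile can be used directly without quoting or prefixing them with the data frame.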

Details

'mp_corpus_df' is a shorthand for getting the documents of the Manifesto Corpus as a tibble/data.frame object instead of a ManifestoCorpus object. It takes the same parameters as 'mp_corpus' and is equivalent to mp_corpus(..., as_tibble = TRUE). See mp_save_cache for ensuring reproducibility by saving the cache and version identifier to the hard drive. See mp_update_cache for updating the locally saved content to the most recent version from the Manifesto Project Database API.

'mp_corpus_df_bilingual' is a shorthand for getting both the original text and the English translation from the Manifesto Corpus as a tibble/data.frame object. The original text ends up in the "text" column and the English translation in "text_en". More generally, should further translation languages become available, the translation ends up in a column named "text_<two-letter ISO language code>". It accepts the same additional parameters as 'mp_corpus_df'.
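The column naming described for the bilingual result can be sketched without network access. The data frame below is an invented toy example, not output from the Manifesto Corpus; it only shows how a "text_<language code>" column name follows from the translation argument.

```r
# Requested translation language (two-letter ISO code).
translation <- "en"

# Toy original-language quasi-sentences; texts are invented.
bilingual <- data.frame(
  text = c("Wir senken die Steuern.", "Wir bauen Schulen."),
  stringsAsFactors = FALSE
)

# The translated text goes into a column named after the language code.
translation_column <- paste0("text_", translation)
bilingual[[translation_column]] <- c("We will lower taxes.", "We build schools.")

print(names(bilingual))
```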

Value

An object of Corpus's subclass ManifestoCorpus holding those of the requested documents that are available.

Examples

## Not run: 
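library(manifestoR)
# An API key must be available, e.g. set beforehand via mp_setapikey.

# Select documents with a logical expression evaluated in the main dataset
corpus <- mp_corpus(party == 61620 & rile > 10)

# Select documents by explicit party ids and election dates
wanted <- data.frame(party=c(41320, 41320), date=c(200909, 201309))
mp_corpus(wanted)

# Select documents from a subset of the main dataset
mp_corpus(subset(mp_maindataset(), countryname == "France"))

# Request documents of which possibly only some are available
partially_available <- data.frame(party=c(41320, 41320), date=c(200909, 200509))
mp_corpus(partially_available)

# Get a tibble/data.frame instead of a ManifestoCorpus;
# the last call keeps all document-level metadata
corpus_df <- mp_corpus(party == 61620 & rile > 10, as_tibble = TRUE)
corpus_df <- mp_corpus_df(party == 61620 & rile > 10)
corpus_df <- mp_corpus_df(party == 61620 & rile > 10, tibble_metadata = "all")

# Request the English translation instead of the original text
mp_corpus(wanted, translation = "en")
mp_corpus_df(wanted, translation = "en")

# Get the original text and the English translation side by side
mp_corpus_df_bilingual(wanted, translation = "en")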

## End(Not run)

manifestoR documentation built on May 29, 2024, 6:02 a.m.