
mediacloud

R wrapper package for the MediaCloud API.

Installation

You can install the development version of mediacloud from GitHub with:

# install.packages("remotes")
remotes::install_github("joon-e/mediacloud")

Usage

library(mediacloud)

Authentication

Register for a MediaCloud account on the MediaCloud website to obtain an API key. The key can be passed directly to functions via the key argument. If no key is provided, the package looks for one in the environment variable MEDIACLOUD_API_KEY, so the easiest way to authenticate is to store your key with Sys.setenv():

Sys.setenv(MEDIACLOUD_API_KEY = "YOUR_KEY_GOES_HERE")
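Note that Sys.setenv() only affects the current R session. To make the key available in every session, you can instead add the line MEDIACLOUD_API_KEY=YOUR_KEY_GOES_HERE to your ~/.Renviron file, which R reads at startup. A small check such as the hypothetical has_mediacloud_key() below (not part of mediacloud) can catch a missing key before any API call is made:

```r
# Hypothetical helper (not part of mediacloud): TRUE if a key is set in
# the MEDIACLOUD_API_KEY environment variable for this session.
has_mediacloud_key <- function() {
  nzchar(Sys.getenv("MEDIACLOUD_API_KEY"))
}

# Warn early instead of failing on the first API request
if (!has_mediacloud_key()) {
  message("No MediaCloud key found: set MEDIACLOUD_API_KEY or pass key = ...")
}
```

Alternatively, pass the key explicitly, e.g. search_media(tag = "Germany___National", n = 10, key = "YOUR_KEY_GOES_HERE").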

Search media

Search for media outlets with search_media():

search_media(tag = "Germany___National", n = 10)

This is mainly useful for looking up the MediaCloud media_id of an outlet, which can then be passed to the media_id argument of search_stories().
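The two search functions can be chained: first find outlets by tag, then search their stories. The sketch below is a hypothetical helper, stories_for_tag(), which is not part of mediacloud. It assumes the package is loaded as shown under Usage and that the search_media() result has a media_id column.

```r
# Hypothetical helper (not part of mediacloud): look up outlets by tag with
# search_media(), then search their stories with search_stories().
# Assumes the search_media() result has a media_id column.
stories_for_tag <- function(tag, n_media = 10, ...) {
  outlets <- search_media(tag = tag, n = n_media)
  # Any further arguments (title, after_date, ...) go to search_stories()
  search_stories(media_id = outlets$media_id, ...)
}
```

For example, stories_for_tag("Germany___National", title = "dogecoin", after_date = "2021-05-01").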

Search stories

Search for stories with search_stories():

stories <- search_stories(title = "dogecoin", media_id = c(19831, 38697), after_date = "2021-05-01")
stories

The function provides a simplified interface for writing the Solr queries that MediaCloud uses to search for stories (the q and fq parameters of the API call). Instead of writing Solr syntax yourself, you supply optional arguments such as title, media_id, and after_date, as in the example above.

Use the argument n to set the maximum number of results returned by a single call (at most 1000). The returned object also includes a processed_stories_id for each story; passing the largest of these values to the last_processed_stories_id argument returns the next page of results.
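This pagination can be wrapped in a loop. The sketch below is a hypothetical helper, fetch_story_pages(), which is not part of mediacloud. It assumes the package is loaded as shown under Usage and that each call to search_stories() returns a data frame with a processed_stories_id column.

```r
# Hypothetical helper (not part of mediacloud): page through search_stories()
# results by feeding the largest processed_stories_id of each page back in
# as last_processed_stories_id.
fetch_story_pages <- function(..., page_size = 1000, max_pages = 5) {
  pages <- list()
  last_id <- NULL
  for (i in seq_len(max_pages)) {
    # First page: no offset; later pages: continue after the last story seen
    page <- if (is.null(last_id)) {
      search_stories(..., n = page_size)
    } else {
      search_stories(..., n = page_size, last_processed_stories_id = last_id)
    }
    if (nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    last_id <- max(page$processed_stories_id)
    # A short page means there are no further results
    if (nrow(page) < page_size) break
  }
  do.call(rbind, pages)
}
```

For example, all_stories <- fetch_story_pages(title = "dogecoin", after_date = "2021-05-01"). The max_pages argument caps the number of API calls a single query can trigger.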

Get word matrices

Use get_word_matrices() to get tidytext-style word matrices for stories. It accepts the same arguments as search_stories(), but is most useful for retrieving the word matrices of stories you have already found with search_stories():

wm <- get_word_matrices(stories_id = stories$stories_id)
wm

The word matrices can be transformed into quanteda-style document-feature matrices (DFMs) with tidytext::cast_dfm() (this requires the quanteda package):

tidytext::cast_dfm(wm, stories_id, word_stem, word_counts)
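For a quick overview without building a full DFM, the word matrix can also be summarised in base R. The sketch below is a hypothetical helper, top_stems() (not part of mediacloud), assuming wm is a data frame with the word_stem and word_counts columns used in the cast_dfm() call above.

```r
# Hypothetical helper (not part of mediacloud): total count of each word stem
# across all stories in a word-matrix data frame, most frequent first.
top_stems <- function(wm, n = 10) {
  totals <- aggregate(word_counts ~ word_stem, data = wm, FUN = sum)
  totals <- totals[order(-totals$word_counts), ]
  head(totals, n)
}
```

For example, top_stems(wm, n = 20) lists the 20 most frequent stems across the retrieved stories.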


joon-e/mediacloud documentation built on Jan. 8, 2022, 12:04 a.m.