download_guardian: Download articles from the Guardian


View source: R/prepare_data.r

Description

This is a wrapper for the get_guardian function from the GuardianR package. Given a query (or a character vector of query terms) and a from and to date, it downloads the data in batches and, once every batch is downloaded, returns all articles as a single data.frame. Downloading in batches makes it possible to collect all the data for the requested period. If the download is interrupted (e.g., internet issue, leaving work, tornado), it can be resumed by running the function again with the same arguments.
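The batching described above can be pictured with a small sketch. It only shows how a date range might be cut into windows of `stepsize` days; `make_batches` is a hypothetical helper written for this illustration, not a function of the package, and the way the package itself splits dates and stores finished batches may differ.

```r
# Hypothetical helper (not part of the package): split a date range into
# consecutive batches of `stepsize` days.
make_batches <- function(fromdate, todate, stepsize = 7) {
  from <- as.Date(fromdate)
  to <- as.Date(todate)
  # For Date objects, a numeric `by` is interpreted as a number of days.
  starts <- seq(from, to, by = stepsize)
  # Each batch ends one day before the next one starts; the last batch
  # is cut off at `todate`.
  ends <- pmin(starts + stepsize - 1, to)
  data.frame(from = starts, to = ends)
}

batches <- make_batches("2012-01-01", "2012-01-20", stepsize = 7)
print(batches)
```

With the dates from the Examples section and the default stepsize of 7, this yields three batches, the last of which is shortened so that it ends on the todate.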

Usage

download_guardian(
  query_terms,
  api.key,
  fromdate,
  todate,
  skip = NULL,
  path = getwd(),
  stepsize = 7,
  verbose = T
)

Arguments

query_terms

A character vector with Guardian queries. All terms are concatenated with OR operators (grouped in parentheses).

api.key

An API key for the Guardian API, which can be obtained for free by filling in this webform.

fromdate

The starting date, in the format "YYYY-MM-DD", e.g. "2010-01-01".

todate

The ending date, in the format "YYYY-MM-DD", e.g. "2015-12-31".

skip

Optionally, a vector of days to skip, as a last resort.

path

The path to a directory in which the download directory is created. The default is the current working directory.

stepsize

The number of days per batch.

verbose

If TRUE, report progress.
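The query_terms argument is described as being concatenated with OR operators and grouped in parentheses. A minimal sketch of what such a combined query string could look like follows. It assumes that each term is wrapped in its own parentheses; the exact string the package builds is not shown in this documentation, and the terms below are made up for illustration rather than taken from terrorism_news.

```r
# Illustrative terms only; not taken from the package's terrorism_news data.
query_terms <- c("terrorism", "terrorist attack", "bombing")

# Assumption: each term is wrapped in parentheses, then all are joined with OR.
query <- paste0("(", query_terms, ")", collapse = " OR ")
print(query)
```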

Value

A data.frame with Guardian articles.

Examples

api.key = "[your own api key]"
query_terms = terrorism_news    ## (terrorism_news is data included in this package)
d = download_guardian(query_terms, api.key, fromdate = '2012-01-01', todate='2012-01-20')

maskedforreview/gtdnews documentation built on April 12, 2021, 11:53 a.m.