save_and_scrapeCDC: Global wrapper function for generating, saving and scraping...


View source: R/save_and_scrapeCDC.R

Description

This function wraps 'buildCDClinks()', 'save_codes()' and the scrape functions into a single call.

Usage

save_and_scrapeCDC(
  and_terms = "",
  not_terms = "",
  exact_phrase = "",
  or_terms = "",
  date_from = "",
  date_to = "",
  pages = 1,
  start_page = 1,
  language = "",
  browser = "firefox"
)

Arguments

and_terms

Vector of alphanumeric terms searched using the AND Boolean operator, specified by the CDC as 'with all of the words'.

not_terms

Vector of alphanumeric terms searched using the NOT Boolean operator, specified by the CDC as 'without the words'.

exact_phrase

Vector of alphanumeric terms enclosed in inverted commas and searched as phrases (e.g. "large cat"), specified by the CDC as 'with the exact phrase'.

or_terms

Vector of alphanumeric terms searched using the OR Boolean operator, specified by the CDC as 'with at least one of the words'.
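The four term arguments map onto the CDC's advanced-search fields. The sketch below applies the same Boolean logic to a small local character vector, so you can check how a set of terms combines before running a live search. `matches_query()` is a hypothetical helper, not part of CDCscraper. It assumes case-insensitive, whole-word matching and treats "" (the default) as "field not used". The CDC search engine's own matching rules may differ.

```r
# Hypothetical helper, NOT part of CDCscraper: mirrors the Boolean meaning of
# the four search fields on a local character vector. Matching here is
# case-insensitive and whole-word; the CDC's own rules may differ.
matches_query <- function(text, and_terms = "", not_terms = "",
                          exact_phrase = "", or_terms = "") {
  drop_empty <- function(x) x[nzchar(x)]   # "" (the default) means "not used"
  has <- function(term) {
    grepl(paste0("\\b", term, "\\b"), text, ignore.case = TRUE, perl = TRUE)
  }
  keep <- rep(TRUE, length(text))
  for (term in drop_empty(and_terms)) keep <- keep & has(term)          # with all of the words
  for (term in drop_empty(not_terms)) keep <- keep & !has(term)         # without the words
  for (phrase in drop_empty(exact_phrase)) keep <- keep & has(phrase)   # with the exact phrase
  or_terms <- drop_empty(or_terms)
  if (length(or_terms) > 0) {                                           # with at least one of the words
    keep <- keep & Reduce(`|`, lapply(or_terms, has))
  }
  keep
}

titles <- c("Global spread of disease in humans",
            "Disease spread among fish farms",
            "Pandemic planning for disease spread",
            "Seasonal influenza vaccination")
matches_query(titles,
              and_terms = c("disease", "spread"),
              not_terms = c("animal", "fish"),
              or_terms = c("pandemic", "global"))
# [1]  TRUE FALSE  TRUE FALSE
```

The second title is excluded by the NOT term 'fish', and the fourth lacks the AND terms.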

date_from

The date (e.g. 01/01/2000) from which searching will be performed (inclusive). If no value is provided, all dates are searched.

date_to

The date (e.g. 01/10/2020) up to which searching will be performed (inclusive). If no value is provided, all dates are searched.

pages

Integer for the number of pages of search results to be returned (one link per page). A maximum of 100 pages can be displayed in the CDC. The default value is 1.

start_page

Integer specifying the first page of search results to return. If more than one page is requested, one URL is generated for each page of ten search results.
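The interaction between `pages` and `start_page` is simple arithmetic. The sketch below lists the result pages a call would cover and the result numbers on each. `pages_requested()` is a hypothetical helper, not part of CDCscraper. It assumes ten results per page, as described above. It also assumes the 100-page display limit means that page numbers above 100 are unavailable.

```r
# Hypothetical helper, NOT part of CDCscraper: lists the result pages covered
# by `pages` and `start_page`, assuming ten results per page and that pages
# numbered above 100 cannot be displayed.
pages_requested <- function(pages = 1, start_page = 1, per_page = 10, max_page = 100) {
  stopifnot(pages >= 1, start_page >= 1)
  page <- start_page + seq_len(pages) - 1   # consecutive page numbers
  page <- page[page <= max_page]            # drop pages past the display limit
  data.frame(page = page,
             first_result = (page - 1) * per_page + 1,
             last_result = page * per_page)
}

pages_requested(pages = 3, start_page = 2)
#   page first_result last_result
# 1    2           11          20
# 2    3           21          30
# 3    4           31          40
```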

language

Two-letter language code for search language. The default is 'en' (English).

browser

The name of the browser used for scraping. Default is 'firefox'.

Value

A data frame containing all extractable information from all HTML files in the working directory.
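The returned data frame is built from every HTML file in the working directory. This means that files left over from earlier searches would also be scraped. One way to avoid this is to run the search in a fresh, empty directory. `run_in_clean_dir()` is a hypothetical helper, not part of CDCscraper, and it uses only base R.

```r
# Hypothetical helper, NOT part of CDCscraper: evaluates `fun` inside a fresh,
# empty temporary directory, then restores the previous working directory.
run_in_clean_dir <- function(fun) {
  dir <- tempfile("cdc_search_")   # unique path that does not exist yet
  dir.create(dir)
  old <- setwd(dir)                # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)  # restore it even if `fun` fails
  fun()
}

# Usage (requires CDCscraper and a working browser set-up):
# info <- run_in_clean_dir(function() save_and_scrapeCDC(and_terms = c("disease", "spread"), pages = 1))
```

The saved HTML files remain in that temporary directory, which R deletes at the end of the session. Copy them elsewhere if you need to keep them.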

Examples

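# Search terms for each of the CDC's search fields
and_terms <- c('disease', 'spread')
not_terms <- c('animal', 'fish')
exact_phrase <- c('corona virus')
or_terms <- c('pandemic', 'global')
# Restrict the search to this date range (inclusive)
date_from <- '01/01/1980'
date_to <- '10/11/2020'
# Build links for three pages of results, save them and scrape the HTML files
info <- save_and_scrapeCDC(and_terms = and_terms,
    not_terms = not_terms,
    exact_phrase = exact_phrase,
    or_terms = or_terms,
    date_from = date_from,
    date_to = date_to,
    pages = 3)
# Inspect the first rows of the scraped data frame
head(info)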

nealhaddaway/CDCscraper documentation built on Oct. 21, 2020, 5:20 a.m.