View source: R/save_and_scrapeGS.R

save_and_scrapeGS    R Documentation

Global wrapper function for generating, saving and scraping info

Description

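Wraps 'buildGSlinks()', 'save_htmls()', and the scrape functions in a single call: it generates Google Scholar search links, saves the result pages as HTML files, and scrapes the information they contain.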

Usage

save_and_scrapeGS(
  and_terms = "",
  exact_phrase = "",
  or_terms = "",
  not_terms = "",
  language = "en",
  year_from = "",
  year_to = "",
  start_page = 1,
  pages = 1,
  incl_cit = TRUE,
  incl_pat = TRUE,
  titlesearch = FALSE,
  authors = "",
  source = "",
  pause = 4,
  backoff = FALSE
)

Arguments

and_terms

Vector of alphanumeric terms searched using the AND Boolean operator, specified by Google Scholar as 'with all of the words'.

exact_phrase

Vector of alphanumeric terms enclosed in inverted commas and searched as phrases (e.g. "large cat"), specified by Google Scholar as 'with the exact phrase'.

or_terms

Vector of alphanumeric terms searched using the OR Boolean operator, specified by Google Scholar as 'with at least one of the words'.

not_terms

Vector of alphanumeric terms searched using the NOT Boolean operator, specified by Google Scholar as 'without the words'.

language

Two-letter language code for search language. The default is 'en' (English).

year_from

Integer giving the full four-digit year (e.g. 2000) from which the search starts (inclusive). If no value is provided, all years are searched.

year_to

Integer giving the full four-digit year (e.g. 2020) at which the search ends (inclusive). If no value is provided, all years are searched.

start_page

Integer specifying the first page of search results to retrieve. If more than one page is requested (see 'pages'), multiple URLs are generated, one for each page of ten search results. The default value is 1.

pages

Integer for the number of pages of search results to be returned (one link per page). A maximum of 100 pages can be displayed in Google Scholar. The default value is 1.

incl_cit

Logical argument (TRUE or FALSE) specifying whether citations should be included in the search. The default is TRUE.

incl_pat

Logical argument (TRUE or FALSE) specifying whether patents should be included in the search. The default is TRUE.

titlesearch

Logical argument (TRUE or FALSE) specifying whether the search should be performed on article titles only or anywhere in the record. The default is FALSE.

authors

Names of the authors to search for.

source

The name of the source of the articles (e.g. academic journal).

pause

Integer specifying the number of seconds to wait between download attempts. The default value is 4 seconds.

backoff

Logical argument (TRUE or FALSE) specifying whether responsive backing-off should be used. If set to TRUE, the time between calls varies with how long the server took to respond to the previous request. The responsive back-off time is the response time multiplied by the 'pause' time: e.g. if the server takes 1.02 seconds to respond and 'pause' is set to 4 seconds, a delay of 4.08 seconds is applied before the next call. The default is FALSE.
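
The arithmetic behind 'start_page', 'pages', 'pause' and 'backoff' can be sketched as follows. Both helpers are hypothetical and are not part of GSscraper; the zero-based offset of ten results per page is an assumption about how result pages are addressed, and the delay rule simply restates the description of 'backoff' above.

```r
# Hypothetical helpers for illustration only; not part of GSscraper.

# Result offsets for the requested pages, assuming ten results per page
# and a zero-based offset for the first page.
gs_page_offsets <- function(start_page = 1, pages = 1) {
  stopifnot(start_page >= 1, pages >= 1)
  page_numbers <- seq(from = start_page, length.out = pages)
  (page_numbers - 1) * 10
}

# Seconds to wait before the next request: a fixed 'pause', or the
# server response time multiplied by 'pause' when backoff is TRUE.
gs_delay <- function(response_time, pause = 4, backoff = FALSE) {
  if (backoff) response_time * pause else pause
}

gs_page_offsets(start_page = 2, pages = 3)
gs_delay(response_time = 1.02, pause = 4, backoff = TRUE)
```

With 'backoff = FALSE' the second helper ignores the response time and returns 'pause' unchanged.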

Value

A list containing: 1) (data) a data frame containing all information that can be extracted from all html files in the working directory; and, 2) (report) a report of the links generated and the input variables used.

Examples

## Not run: 
and_terms <- c('river', 'aquatic')
exact_phrase <- c('water chemistry')
or_terms <- c('crayfish', 'fish')
not_terms <- c('lobster', 'coral')
year_from <- 1900
year_to <- 2020
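# Pass the year limits by name so that they are actually applied
info <- save_and_scrapeGS(and_terms, exact_phrase, or_terms, not_terms,
                          year_from = year_from, year_to = year_to,
                          pages = 3)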
info

## End(Not run)
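
To show how the four term arguments relate to one another, the sketch below composes a human-readable Boolean expression from them. The helper 'describe_query()' is hypothetical and is not part of GSscraper; it only illustrates the AND / exact phrase / OR / NOT semantics described under Arguments and makes no claim about how 'buildGSlinks()' encodes the search in a URL.

```r
# Hypothetical helper for illustration only; not part of GSscraper.
describe_query <- function(and_terms = "", exact_phrase = "",
                           or_terms = "", not_terms = "") {
  # Drop empty strings so that unused arguments contribute nothing
  keep <- function(x) x[nzchar(x)]

  parts <- c(
    keep(and_terms),
    # Exact phrases are enclosed in inverted commas
    if (length(keep(exact_phrase))) paste0('"', keep(exact_phrase), '"'),
    # Alternatives are grouped and joined with OR
    if (length(keep(or_terms)))
      paste0("(", paste(keep(or_terms), collapse = " OR "), ")")
  )
  query <- paste(parts, collapse = " AND ")

  # Excluded terms are each prefixed with NOT
  excluded <- keep(not_terms)
  if (length(excluded)) {
    query <- paste(query, paste("NOT", excluded, collapse = " "))
  }
  trimws(query)
}

describe_query(
  and_terms = c("river", "aquatic"),
  exact_phrase = "water chemistry",
  or_terms = c("crayfish", "fish"),
  not_terms = c("lobster", "coral")
)
```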

nealhaddaway/GSscraper documentation built on May 6, 2022, 10:52 a.m.