View source: R/save_and_scrapeGS.R
save_and_scrapeGS | R Documentation |
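Wraps 'buildGSlinks()', 'save_htmls()', and the scrape functions in a single call: it builds the Google Scholar search links, saves the result pages as html files, and scrapes those files.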
save_and_scrapeGS(
  and_terms = "",
  exact_phrase = "",
  or_terms = "",
  not_terms = "",
  language = "en",
  year_from = "",
  year_to = "",
  start_page = 1,
  pages = 1,
  incl_cit = TRUE,
  incl_pat = TRUE,
  titlesearch = FALSE,
  authors = "",
  source = "",
  pause = 4,
  backoff = FALSE
)
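The 'start_page' and 'pages' arguments together select a run of result pages, one link per page of ten results. A minimal sketch of that arithmetic follows. The helper name 'page_offsets' is hypothetical and not part of the package, and the zero-based result offset is an assumption about how each page is addressed, not a statement of what 'buildGSlinks()' does internally.

```r
# Hypothetical helper (not part of the package): returns the assumed
# zero-based offset of the first result on each requested page,
# with ten results per page.
page_offsets <- function(start_page = 1, pages = 1) {
  stopifnot(start_page >= 1, pages >= 1)
  # Consecutive page numbers beginning at start_page
  page_numbers <- seq(from = start_page, length.out = pages)
  (page_numbers - 1) * 10
}

page_offsets(start_page = 1, pages = 3)  # offsets 0, 10, 20
page_offsets(start_page = 4, pages = 2)  # offsets 30, 40
```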
and_terms |
Vector of alphanumeric terms searched using the AND Boolean operator, specified by Google Scholar as 'with all of the words'. |
exact_phrase |
Vector of alphanumeric terms enclosed in inverted commas and searched as phrases (e.g. "large cat"), specified by Google Scholar as 'with the exact phrase'. |
or_terms |
Vector of alphanumeric terms searched using the OR Boolean operator, specified by Google Scholar as 'with at least one of the words'. |
not_terms |
Vector of alphanumeric terms searched using the NOT Boolean operator, specified by Google Scholar as 'without the words'. |
language |
Two-letter language code for search language. The default is 'en' (English). |
year_from |
Four-digit year as an integer (e.g. 2000) from which searching starts (inclusive). If no value is provided, the search has no lower year limit. |
year_to |
Four-digit year as an integer (e.g. 2020) up to which searching is performed (inclusive). If no value is provided, the search has no upper year limit. |
start_page |
Integer specifying the page of search results at which to start. Each page holds ten search results, and one URL is generated per page, so requesting several pages returns several URLs. The default is 1 (the first page). |
pages |
Integer for the number of pages of search results to be returned (one link per page). A maximum of 100 pages can be displayed in Google Scholar. The default value is 1. |
incl_cit |
Logical argument (TRUE or FALSE) specifying whether citations should be included in the search. The default is TRUE. |
incl_pat |
Logical argument (TRUE or FALSE) specifying whether patents should be included in the search. The default is TRUE. |
titlesearch |
Logical argument (TRUE or FALSE) specifying whether the search should be performed on article titles only or anywhere in the record. The default is FALSE. |
authors |
Names of the authors to search for. |
source |
The name of the source of the articles (e.g. academic journal). |
pause |
Integer specifying the number of seconds to wait between download attempts. The default value is 4 seconds. |
backoff |
A logical argument (TRUE or FALSE) specifying whether responsive backing-off should be used. If set to TRUE, the time between calls varies with how long the server takes to respond to the previous request. The back-off delay is the response time multiplied by the 'pause' time: e.g. if the server takes 1.02 seconds to respond and 'pause' is set to 4 seconds, a 4.08 second delay is applied before the next call. The default for back-off is 'FALSE'. |
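The back-off rule described for 'backoff' can be sketched as below. This is an illustration of the stated arithmetic only: 'backoff_delay' is a hypothetical helper, not a function exported by the package, and the real implementation may differ in detail.

```r
# Hypothetical helper (not part of the package): seconds to wait before
# the next request, following the rule described for 'backoff'.
backoff_delay <- function(response_time, pause = 4, backoff = FALSE) {
  if (backoff) {
    # Responsive back-off: scale the wait by the observed response time
    response_time * pause
  } else {
    # Fixed wait between download attempts
    pause
  }
}

backoff_delay(1.02, pause = 4, backoff = TRUE)   # 4.08 seconds
backoff_delay(1.02, pause = 4, backoff = FALSE)  # 4 seconds
```

With back-off enabled, a slow server lengthens the gap between requests, while a fast one shortens it below the fixed 'pause' value.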
A list containing: 1) (data) a data frame containing all information that can be extracted from all html files in the working directory; and, 2) (report) a report of the links generated and the input variables used.
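A short sketch of how the two parts of the returned list might be accessed. It assumes the elements are named 'data' and 'report', as the labels in parentheses suggest, and it uses an empty stand-in object in place of a real search result so that it runs without contacting Google Scholar.

```r
# Stand-in for the returned list (placeholder content only); the element
# names 'data' and 'report' are assumed from the description above.
result_stub <- list(
  data = data.frame(),            # would hold the scraped records
  report = "placeholder report"   # would hold the links and input values
)

results_table <- result_stub$data
search_report <- result_stub$report

nrow(results_table)  # number of scraped records (0 for this stand-in)
```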
## Not run:
and_terms <- c('river', 'aquatic')
exact_phrase <- c('water chemistry')
or_terms <- c('crayfish', 'fish')
not_terms <- c('lobster', 'coral')
year_from <- 1900
year_to <- 2020
# The four term vectors are matched by position; the year range is passed by name
info <- save_and_scrapeGS(and_terms, exact_phrase, or_terms, not_terms,
                          year_from = year_from, year_to = year_to, pages = 3)
info
## End(Not run)