PDFs_collect: Attempts to download PDFs from multiple DOI links.


View source: R/PDFs_collect.R

Description

Attempts to download a collection of PDF files from multiple digital object identifier (DOI) links, and updates a data frame with the outcome of each download. The function is a wrapper for PDF_download. NOTE: A single DOI may generate multiple PDF files. When downloading on Windows, setting "WindowsProxy = TRUE" will significantly improve download success.
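As a minimal sketch of the Windows note above, a script meant to run on several platforms could set the flag from the operating system rather than hard-coding it. The variable name use_proxy is hypothetical, and whether WindowsProxy = TRUE has any effect on other systems is not stated here.

```r
# TRUE on Windows, FALSE elsewhere; .Platform is part of base R.
use_proxy <- identical(.Platform$OS.type, "windows")

# The flag could then be passed along, for example (not run here;
# someRefs is the data frame built in the Examples section):
# PDFs_collect(aDataFrame = someRefs, DOIcolumn = "DOI",
#              FileNamecolumn = "STUDY_ID", WindowsProxy = use_proxy)
```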

Usage

PDFs_collect(
  aDataFrame,
  DOIcolumn,
  FileNamecolumn,
  directory = getwd(),
  randomize = FALSE,
  seed = NULL,
  buffer = FALSE,
  validatePDF = TRUE,
  quiet = FALSE,
  showSummary = TRUE,
  WindowsProxy = FALSE
)

Arguments

aDataFrame

A data frame containing a column of DOIs and a column of individual file names for each downloaded PDF.

DOIcolumn

The label of the column containing all the DOI links.

FileNamecolumn

The label of the column containing all the strings that will be used to rename the downloaded files.

directory

A string giving the location (directory) where downloaded PDF files are to be saved. NOTE: it helps to create this directory before calling the PDFs_collect function.

randomize

When TRUE, attempts to download PDFs in a random order. This may be necessary to ensure that host websites do not have their HTML and files repeatedly accessed.

seed

An integer used to enforce repeatability when randomly downloading PDFs.

buffer

When TRUE, randomly delays each download by a few seconds (with a mean of 4 seconds and a range of 1 to 20 seconds). This is another strategy to avoid quickly and repeatedly accessing host websites.

validatePDF

When TRUE, only saves files that are valid PDF documents. When FALSE, saves all candidate files, even if they are not valid PDFs.

quiet

When TRUE, does not print individual download progress and summary to the console.

showSummary

When FALSE, does not print the overall summary of download successes and failures.

WindowsProxy

When TRUE, significantly improves download success for computers running Windows. When FALSE on a Windows-based computer, you may only be able to download 30 to 50 PDFs at a time before a connection error occurs and halts all downloads (e.g., InternetOpenUrl failed error).
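The randomize, seed, and buffer arguments described above can be pictured with the rough sketch below. It is not the package's implementation: the table refs, its placeholder DOIs, and the helper draw_delay are hypothetical, and the Poisson draw is only one assumed way to get a delay that averages about 4 seconds within the documented 1 to 20 second range.

```r
# Hypothetical reference table; the DOIs are placeholders, not real identifiers.
refs <- data.frame(
  STUDY_ID = 1:5,
  DOI = paste0("10.0000/example.", 1:5),
  stringsAsFactors = FALSE
)

# randomize + seed: the same seed reproduces the same visiting order.
set.seed(2021)
first_order <- sample(nrow(refs))
set.seed(2021)
second_order <- sample(nrow(refs))
shuffled <- refs[first_order, ]

# buffer: hypothetical helper that draws a whole-second pause averaging
# about 4 seconds and clamps it to the 1-to-20-second range.
# The Poisson distribution is an assumption made for this sketch.
draw_delay <- function(mean_sec = 4, min_sec = 1, max_sec = 20) {
  delay <- rpois(1, lambda = mean_sec)
  min(max(delay, min_sec), max_sec)
}

set.seed(2021)
delays <- replicate(1000, draw_delay())

# A polite loop would pause before each request, for example:
# for (i in first_order) { Sys.sleep(draw_delay()); ...download refs$DOI[i]... }
```

Fixing the seed matters mainly for reruns: the references are then visited in the same shuffled order each time, which makes a failed batch easier to reproduce.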

Value

The data frame with a new column recording the outcome of each download.

See Also

PDF_download

Examples

## Not run: 

data(example_references_metagear)
someRefs <- effort_initialize(example_references_metagear)
dir.create("metagear_downloads")
PDFs_collect(aDataFrame = someRefs, DOIcolumn = "DOI",
             FileNamecolumn = "STUDY_ID", directory = "metagear_downloads",
             WindowsProxy = TRUE)

## End(Not run)
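After a run like the one above, and especially when validatePDF = FALSE was used, it can be useful to check by hand whether a saved file looks like a PDF. The sketch below is a hypothetical helper, not the package's validation routine: is_probably_pdf only tests whether a file starts with the "%PDF-" signature, and it is demonstrated on temporary files rather than real downloads.

```r
# Hypothetical helper: TRUE when the file begins with the PDF signature "%PDF-".
# Only the first five bytes are inspected; the rest of the file is not checked.
is_probably_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Two stand-in files: one starting with a PDF header, and one holding the
# kind of HTML page a host might return in place of the document.
pdf_like <- tempfile(fileext = ".pdf")
writeBin(charToRaw("%PDF-1.4\n"), pdf_like)

html_like <- tempfile(fileext = ".pdf")
writeLines("<html>Access denied</html>", html_like)

pdf_check <- is_probably_pdf(pdf_like)
html_check <- is_probably_pdf(html_like)
```

Comparing raw bytes rather than converting to a string avoids errors on binary files that contain embedded nul bytes.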

metagear documentation built on Feb. 15, 2021, 5:09 p.m.