batch_pubmed_download: Download PubMed Records in XML or TXT Format



Description

Performs a PubMed query (via the get_pubmed_ids() function), downloads the resulting data (via multiple fetch_pubmed_data() calls), and saves the data as a series of xml or txt files on the local drive. The function is suitable for downloading a very large number of records.

Usage

batch_pubmed_download(pubmed_query_string, dest_dir = NULL, 
                      dest_file_prefix = "easyPubMed_data_", format = "xml", 
                      batch_size = 400, res_cn = 1)

Arguments

pubmed_query_string

String (character-vector of length 1): this is the string used for querying PubMed (the standard PubMed query syntax applies).

dest_dir

String (character-vector of length 1): this string corresponds to the name of the existing folder where files will be saved. Existing files will be overwritten. If NULL, the current working directory will be used.

dest_file_prefix

String (character-vector of length 1): this string is used as prefix for the files that are written locally.

format

String (character-vector of length 1): data will be requested from Entrez in this format. Acceptable values are: c("medline", "uilist", "abstract", "asn.1", "xml"). When format != "xml", data will be saved as text files (txt).

batch_size

Integer (1 < batch_size < 5000): maximum number of records to be saved in a single xml or txt file.
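
As a rough illustration of how batch_size relates to the number of output files, the sketch below assumes that records are split into consecutive batches of at most batch_size records, with one file written per batch. The helper expected_batches() is hypothetical and is not part of easyPubMed.

```r
# Hypothetical helper (not part of easyPubMed): estimate how many files a
# download job would write, assuming one file per batch of at most
# `batch_size` records.
expected_batches <- function(n_records, batch_size = 400) {
  stopifnot(n_records >= 0, batch_size > 1, batch_size < 5000)
  as.integer(ceiling(n_records / batch_size))
}

expected_batches(1000, batch_size = 400)  # 3 (400 + 400 + 200 records)
```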

res_cn

Integer (> 0): numeric index of the data batch to start downloading from. This parameter is useful to resume an incomplete download job after a system crash.
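
One possible way to pick a value for res_cn after an interrupted job is to count the files already present in the destination folder. The sketch below is only an illustration: suggest_res_cn() is a hypothetical helper, not part of easyPubMed, and it assumes that every batch wrote exactly one file whose name starts with dest_file_prefix and that batches were written in order.

```r
# Hypothetical helper (not part of easyPubMed): suggest a value for `res_cn`
# after an interrupted job. Assumes one file per batch, named with
# `dest_file_prefix`, and written in batch order.
suggest_res_cn <- function(dest_dir = getwd(),
                           dest_file_prefix = "easyPubMed_data_") {
  files <- list.files(dest_dir)
  n_done <- sum(startsWith(files, dest_file_prefix))
  # Restart at the last file found, since it may be incomplete after a crash.
  max(1L, n_done)
}

# Usage (not run; needs network access and the easyPubMed package):
# out <- batch_pubmed_download(
#   pubmed_query_string = "Machine Learning[TI] AND 2016[PD]",
#   batch_size = 180,
#   res_cn = suggest_res_cn()
# )
```

Restarting at the last file found, rather than the one after it, is a conservative choice: existing files are overwritten, so downloading that batch again costs one extra request.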

Details

Downloads a large number of PubMed records as a set of xml or txt files that are saved in the folder specified by the user. This function enforces data integrity: if a batch of downloaded data is corrupted, it is discarded and downloaded again. Each download cycle is monitored until the download job is successfully completed. In principle, this makes it possible to download a whole copy of PubMed, if desired. The function reports progress by printing to the console the number of batches still queued for download. pubmed_query_string accepts standard PubMed syntax. The function queries PubMed multiple times using the same query string. It is therefore recommended to include an [EDAT] or a [PDAT] filter in the query to ensure reproducible results.
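
The recommendation to use a date filter can be sketched as follows. The helper add_date_filter() is hypothetical and not part of easyPubMed. It appends a fixed date range to a query using the PubMed from:to[field] range notation, so that repeated queries should match the same set of records. The dates shown are placeholders.

```r
# Hypothetical helper (not part of easyPubMed): wrap a query and append a
# fixed date range, e.g. on the Entrez date [EDAT] or publication date [PDAT].
add_date_filter <- function(query, from, to, field = "EDAT") {
  sprintf("(%s) AND (%s:%s[%s])", query, from, to, field)
}

dated_query <- add_date_filter("Machine Learning[TI]",
                               from = "2016/01/01", to = "2016/12/31")
dated_query
# "(Machine Learning[TI]) AND (2016/01/01:2016/12/31[EDAT])"

# Usage (not run; needs network access and the easyPubMed package):
# out <- batch_pubmed_download(pubmed_query_string = dated_query)
```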

Author(s)

Damiano Fantini <damiano.fantini@gmail.com>

References

http://www.biotechworld.it/bioinf/2016/01/05/querying-pubmed-via-the-easypubmed-package-in-r/

Examples

## Not run: 
# Example 01: retrieve data from PubMed and save as XML file
ml_query <- "Machine Learning[TI] AND 2016[PD]"
out1 <- batch_pubmed_download(pubmed_query_string = ml_query, batch_size = 180)
XML::xmlParse(out1[1])
#
# Example 02: retrieve data from PubMed and save as TXT file
ml_query <- "Machine Learning[TI] AND 2016[PD]"
out2 <- batch_pubmed_download(pubmed_query_string = ml_query, batch_size = 180, format = "medline")
readLines(out2[1])[1:30]

## End(Not run)

easyPubMed documentation built on May 30, 2017, 5:10 a.m.
