Performs a PubMed query (via the get_pubmed_ids() function), downloads the resulting data (via multiple fetch_pubmed_data() calls), and then saves the data as a series of xml or txt files on the local drive. The function is suitable for downloading a very large number of records.
batch_pubmed_download(pubmed_query_string, dest_dir = NULL,
                      dest_file_prefix = "easyPubMed_data_",
                      format = "xml", api_key = NULL,
                      batch_size = 400, res_cn = 1,
                      encoding = "UTF8")
pubmed_query_string
String (character vector of length 1): the string used for querying PubMed (the standard PubMed query syntax applies).
dest_dir
String (character vector of length 1): the name of the existing folder where files will be saved. Existing files will be overwritten. If NULL, the current working directory will be used.
dest_file_prefix
String (character vector of length 1): the prefix used for the names of the files that are written locally.
format
String (character vector of length 1): data will be requested from Entrez in this format. Acceptable values are: c("medline", "uilist", "abstract", "asn.1", "xml"). When format != "xml", data will be saved as text files (txt).
api_key
String (character vector of length 1): user-specific API key to increase the limit of queries per second. You can obtain your key from NCBI.
batch_size
Integer (1 < batch_size < 5000): maximum number of records to be saved in a single xml or txt file.
res_cn
Integer (> 0): numeric index of the data batch to start downloading from. This parameter is useful for resuming an incomplete download job after a system crash.
encoding
The encoding of an input/output connection can be specified by name (for example, "ASCII" or "UTF-8"), in the same way as it would be given to the function base::iconv(). See the iconv() help page to find out more about the encodings that can be used on your platform. Here, we recommend using "UTF-8".
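The interplay between batch_size and res_cn can be illustrated with a little arithmetic. The sketch below is not part of easyPubMed: batch_plan() is a hypothetical helper, and it assumes that batches are consecutive, fixed-size slices of the result set, with res_cn pointing at the first batch still to be downloaded.

```r
# Hypothetical helper (not part of easyPubMed): lists the records each batch
# would cover, assuming consecutive batches of a fixed size.
batch_plan <- function(total_records, batch_size = 400, res_cn = 1) {
  stopifnot(total_records >= 1, batch_size >= 1, res_cn >= 1)
  n_batches <- ceiling(total_records / batch_size)
  if (res_cn > n_batches) {
    # Nothing left to download: return an empty plan
    return(data.frame(batch = integer(0),
                      first_record = integer(0),
                      last_record = integer(0)))
  }
  batch <- seq(res_cn, n_batches)
  data.frame(batch = batch,
             first_record = (batch - 1) * batch_size + 1,
             last_record = pmin(batch * batch_size, total_records))
}

# 1000 records in batches of 400, resuming from the second batch
plan <- batch_plan(total_records = 1000, batch_size = 400, res_cn = 2)
print(plan)
```

Under these assumptions, resuming with res_cn = 2 skips the first 400 records and leaves two batches to fetch.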
Downloads a large number of PubMed records as a set of xml or txt files that are saved in the folder specified by the user. This function enforces data integrity: if a batch of downloaded data is corrupted, it is discarded and downloaded again. Each download cycle is monitored until the download job is successfully completed. This function should make it possible to download a whole copy of PubMed, if desired. The function reports progress by printing to the console the number of batches still in the download queue. pubmed_query_string accepts standard PubMed syntax. The function queries PubMed multiple times using the same query string. Therefore, it is recommended to use an [EDAT] or a [PDAT] filter in the query if you want to ensure reproducible results.
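One way to follow the date-filter recommendation is to append a fixed date range to the query before passing it as pubmed_query_string. This is a minimal sketch: add_date_filter() is a hypothetical helper, not an easyPubMed function, and it assumes the colon-separated date range form of the PubMed query syntax (YYYY/MM/DD:YYYY/MM/DD followed by the field tag).

```r
# Hypothetical helper (not part of easyPubMed): wraps a query and appends a
# fixed date range, so repeated runs target the same set of records.
add_date_filter <- function(query, from, to, field = "EDAT") {
  stopifnot(is.character(query), length(query) == 1,
            field %in% c("EDAT", "PDAT"))
  paste0("(", query, ") AND (", from, ":", to, "[", field, "])")
}

ml_query <- add_date_filter("Machine Learning[TI]", "2016/01/01", "2016/12/31")
print(ml_query)
```

The resulting string can then be supplied as pubmed_query_string in place of the unfiltered query.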
Character vector containing the names of the files downloaded to the local system.
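Because the return value is a plain character vector, it can be checked with base R file utilities before further processing. The sketch below uses simulated files in a temporary folder as a stand-in for a real return value; the file names are made up for illustration and do not reflect the package's actual naming scheme. If the returned names turn out to be relative, they may need to be combined with dest_dir via file.path() first (an assumption worth verifying on your system).

```r
# Simulated stand-in for the returned vector: two small files in a temp folder.
# The file names are hypothetical and chosen only for this illustration.
tmp_dir <- tempfile("pubmed_demo_")
dir.create(tmp_dir)
out_files <- file.path(tmp_dir, c("demo_batch_a.xml", "demo_batch_b.xml"))
for (f in out_files) {
  writeLines("<PubmedArticleSet></PubmedArticleSet>", f)
}

# Checks that apply to any character vector of file paths
all_present <- all(file.exists(out_files))
non_empty <- all(file.size(out_files) > 0)
first_lines <- vapply(out_files, function(f) readLines(f, n = 1), character(1))

print(all_present)
print(non_empty)
```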
Damiano Fantini damiano.fantini@gmail.com
https://www.data-pulse.com/dev_site/easypubmed/
## Not run:
## Example 01: retrieve data from PubMed and save as XML file
ml_query <- "Machine Learning[TI] AND 2016[PD]"
out1 <- batch_pubmed_download(pubmed_query_string = ml_query, batch_size = 180)
readLines(out1[1])[1:30]
##
## Example 02: retrieve data from PubMed and save as TXT file
ml_query <- "Machine Learning[TI] AND 2016[PD]"
out2 <- batch_pubmed_download(pubmed_query_string = ml_query, batch_size = 180, format = "medline")
readLines(out2[1])[1:30]
## End(Not run)