Download supplementary materials from journals
Put a call to this function wherever you would put a file path. Everything is cached by default, so you don't have to worry about repeated downloads in the same session.
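To illustrate why repeated calls in one session are cheap, here is a minimal sketch of a session-level cache. It is not the package's actual implementation: `cached_get()`, `fake_download()`, and the `state` environment are hypothetical names invented for this example, and the "download" just writes a small temporary file instead of touching the network.

```r
# Illustrative session cache; NOT how ft_get_si() is implemented internally.
state <- new.env()
state$files <- list()      # key -> path of the already-fetched file
state$downloads <- 0L      # counts how many "real" fetches happened

# fake_download() is a hypothetical stand-in for a network fetch.
fake_download <- function(key) {
  state$downloads <- state$downloads + 1L
  path <- tempfile(fileext = ".csv")
  writeLines(c("id,value", "1,10"), path)
  path
}

# Return the cached path when one exists, otherwise fetch and remember it.
cached_get <- function(key, cache = TRUE) {
  if (cache && !is.null(state$files[[key]])) {
    return(state$files[[key]])
  }
  path <- fake_download(key)
  state$files[[key]] <- path
  path
}

first  <- cached_get("10.6084/m9.figshare.979288/2")
second <- cached_get("10.6084/m9.figshare.979288/2")
```

In this sketch the second call returns the same path as the first, and only one fetch is recorded.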
ft_get_si(x, si, from = c("auto", "plos", "wiley", "science", "proceedings",
  "figshare", "esa_data_archives", "esa_archives", "biorxiv", "epmc"),
  save.name = NA, dir = NA, cache = TRUE, vol = NA, issue = NA,
  list = FALSE, timeout = 10, ...)

## S3 method for class 'character'
ft_get_si(x, si, from = c("auto", "plos", "wiley", "science", "proceedings",
  "figshare", "esa_data_archives", "esa_archives", "biorxiv", "epmc"),
  save.name = NA, dir = NA, cache = TRUE, vol = NA, issue = NA,
  list = FALSE, timeout = 10, ...)

## S3 method for class 'ft_data'
ft_get_si(x, si, from = NA, save.name = NA, dir = NA, cache = TRUE,
  vol = NA, issue = NA, list = FALSE, timeout = 10, ...)

## S3 method for class 'ft'
ft_get_si(x, si, from = NA, save.name = NA, dir = NA, cache = TRUE,
  vol = NA, issue = NA, list = FALSE, timeout = 10, ...)
One of: vector of DOI(s) of article(s) (a
number of the supplement to be downloaded (1, 2, 3, etc.),
or (for ESA and Science journals) the name of the supplement (e.g.,
"S1_data.csv"). Can be a
Publisher of article (
a name for the file to download
directory to save file to (
Article volume (Proceedings journals only;
Article issue (Proceedings journals only;
how long to wait for successful download (default 10 seconds)
Further args passed on to
The examples probably give the best indication of how to
use this function. In general, just specify the DOI of the article
you want to download data from, and the number of the supplement
you want to download (1, 5, etc.). ESA journals don't use DOIs
(give the article code; see below), and Proceedings, Science, and
ESA journals need you to give the filename of the supplement to
download. For FigShare articles, you can give either the number or
the name. The file extensions (suffixes) of files are returned as
suffix attributes (see first example), which may be useful
if you don't know the format of the file you're downloading.
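As a sketch of how the suffix attribute might be used, the snippet below picks a reader based on it. It assumes, following the description above, that the returned value is a file path carrying a "suffix" attribute; the `si_path` object here is a hand-built stand-in rather than a real download, and `read_si()` is a hypothetical helper, not part of the package.

```r
# Hand-built stand-in for the value ft_get_si() is described as returning:
# a path to a local file with the file extension stored in a "suffix" attribute.
tmp <- tempfile()
writeLines(c("species,count", "crab,3"), tmp)
si_path <- structure(tmp, suffix = "csv")

# read_si() is a hypothetical helper that dispatches on the suffix attribute.
read_si <- function(path) {
  suffix <- attr(path, "suffix")
  if (is.null(suffix) || is.na(suffix)) {
    stop("No suffix attribute available; inspect the file manually")
  }
  switch(tolower(suffix),
    csv = read.csv(path, stringsAsFactors = FALSE),
    txt = ,
    tsv = read.delim(path, stringsAsFactors = FALSE),
    stop("No reader registered for suffix: ", suffix)
  )
}

crabs <- read_si(si_path)
```

Because not all files are stored with useful extensions, the helper stops with an error rather than guessing when the suffix is missing or unrecognised.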
For any DOIs not recognised (and if asked) the Europe PubMed
Central API is used to look up articles. What this database calls a
supplementary file varies by publisher; often they will simply be
figures within articles, but we (obviously) have no way to check
this at run-time. I strongly recommend you run any EPMC calls with
list=TRUE the first time, to see the filenames that EPMC
gives supplements, as these also often vary from what the authors
gave them. This may actually be a 'feature', not a 'bug', if you're
trying to automate some sort of meta-analysis.
Below is a list of all the publishers this supports, and examples of articles from them. I'm aware that there isn't perfect overlap between these publishers and the rest of the package; I plan to correct this in the near future.
Default. Use a Crossref search (cr_works) on the DOI to determine the publisher.
Public Library of Science journals (e.g., PLoS One; http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0126524)
Wiley journals (e.g., http://onlinelibrary.wiley.com/doi/10.1111/ele.12289/abstract)
Science magazine (e.g., http://www.sciencemag.org/content/345/6200/1041.short)
Royal Society of London journals (e.g., http://rspb.royalsocietypublishing.org/content/282/1814/20151215). Requires the vol and issue of the article.
Figshare (e.g., http://bit.ly/figshare-example)
- esa_data_archives & esa_archives
You must give article codes, not DOIs, for these; you can find the code on the article itself. An ESA Data Archive paper is not to be confused with an ESA Archive, which is the supplement to an ESA paper. The distinction seems less crazy once you're reading the paper: if it only describes a dataset, it's an esa_archive paper, else it's an esa_data_archive. For example, http://www.esapubs.org/archive/ecol/E092/201/default.htm is an esa_data_archive whose article code is E092-201-D1; http://esapubs.org/Archive/ecol/E093/059/default.htm is an esa_archive whose code is E093-059-D1.
Load from bioRxiv (e.g., http://biorxiv.org/content/early/2015/09/11/026575)
Look up an article on Europe PubMed Central, and then download the file using their supplementary materials API (http://europepmc.org/restfulwebservice). See comments above in 'notes' about EPMC.
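For the ESA entries, the article code appears to mirror the two path segments in the archive URL. The helper below is a sketch under that assumption, inferred only from the two example URLs above; `esa_code_from_url()` is a hypothetical function, not part of the package. It returns the short form without the "-D1" suffix, which is the form passed to ft_get_si in the example calls (e.g., "E092-201").

```r
# Hypothetical helper: derive an ESA article code from an archive URL.
# Assumes URLs look like .../<letter + 3 digits>/<3 digits>/..., as in the
# two examples given in this documentation; other layouts are rejected.
esa_code_from_url <- function(url) {
  m <- regmatches(url, regexec("/([A-Z][0-9]{3})/([0-9]{3})/", url))[[1]]
  if (length(m) != 3) {
    stop("URL does not match the assumed ESA archive pattern: ", url)
  }
  paste(m[2], m[3], sep = "-")
}

mammal_code <- esa_code_from_url(
  "http://www.esapubs.org/archive/ecol/E092/201/default.htm")
fungi_code <- esa_code_from_url(
  "http://esapubs.org/Archive/ecol/E093/059/default.htm")
```

The resulting codes could then be passed as the first argument together with the matching `from` value ("esa_data_archives" or "esa_archives").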
Make sure that the article from which you're attempting to download supplementary materials *has* supplementary materials. 404 errors and 'file not found' errors can result from such cases.
Will Pearse (email@example.com)
## Not run:
#Put the function wherever you would put a file path
crabs <- read.csv(ft_get_si("10.6084/m9.figshare.979288", 2))

#View the suffix (file extension) of downloaded files
# - note that not all files are uploaded/stored with useful file extensions!
ft_get_si("10.6084/m9.figshare.979288", 2)
attr(ft_get_si("10.6084/m9.figshare.979288", 2), "suffix")

#ESA data papers and regular articles *must* be marked
fungi <- read.csv(ft_get_si("E093-059", "myco_db.csv", "esa_archives"))
mammals <- read.csv(ft_get_si("E092-201", "MCDB_communities.csv", "esa_data_archives"))
epmc.fig <- ft_get_si("10.1371/journal.pone.0126524", "pone.0126524.g005.jpg", "epmc")
#...note this 'SI' is not actually an SI, but rather an image from the paper.

# curl options
ft_get_si("E093-059", "myco_db.csv", "esa_archives")
## End(Not run)