ft_get: Get full text


View source: R/ft_get.R

Description

ft_get is a one-stop shop for fetching the full text of articles as XML or PDF. It has specific support for PLOS via the rplos package, Entrez via the rentrez package, and arXiv via the aRxiv package. For other publishers, helper functions work out the links to full text based on user input. See Details for help on how to use this function.

Usage

ft_get(x, from = NULL, plosopts = list(), bmcopts = list(),
  entrezopts = list(), elifeopts = list(), cache = FALSE,
  backend = "rds", path = "~/.fulltext", ...)

## S3 method for class 'character'
ft_get(x, from = NULL, plosopts = list(),
  bmcopts = list(), entrezopts = list(), elifeopts = list(),
  cache = FALSE, backend = "rds", path = "~/.fulltext", ...)

## S3 method for class 'list'
ft_get(x, from = NULL, plosopts = list(), bmcopts = list(),
  entrezopts = list(), elifeopts = list(), cache = FALSE,
  backend = "rds", path = "~/.fulltext", ...)

## S3 method for class 'ft'
ft_get(x, from = NULL, plosopts = list(), bmcopts = list(),
  entrezopts = list(), elifeopts = list(), cache = FALSE,
  backend = "rds", path = "~/.fulltext", ...)

Arguments

x

Identifiers for papers, given either as DOIs (or other ids) in a list of character strings or a character vector, or as an object of class ft, as returned from ft_search

from

Source to query. Optional.

plosopts

PLOS options. See plos_fulltext

bmcopts

BMC options. This parameter is DEPRECATED.

entrezopts

Entrez options. See entrez_search and entrez_fetch

elifeopts

eLife options

cache

(logical) Whether to cache results. If cache=TRUE, the raw XML (or whatever other format the article is in) is written to disk, then read back from disk when further manipulations are done on the data. See also cache

backend

(character) One of rds, rcache, or redis

path

(character) Path to local folder. If the folder doesn't exist, we create it for you.

...

Further args passed on to GET

Details

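There are various ways to use ft_get: pass in only DOIs and leave from as NULL when you don't know the publisher; pass in DOIs together with from when you do know the publisher; or search with ft_search and pass its output to ft_get.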

Note that some publishers are available via Entrez, but often not for recent articles, where "recent" may mean anything from a few months to a year or so. For a recent article, make sure to specify the publisher via from, or else you'll get back no data.
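When a batch of DOIs mixes older and recent articles, it can help to see which publisher each DOI belongs to before deciding whether to set from. The sketch below is an illustration only: guess_publisher and doi_prefix_publisher are hypothetical names that are not part of fulltext, and the lookup table covers only DOI prefixes that appear in the examples on this page. The labels are the from values or result slot names shown on this page; only "plos", "elife" and "pensoft" are shown here being passed to from, so treating the others as valid from values is an assumption.

```r
# Hypothetical helper, NOT part of the fulltext package.
# Maps a DOI registrant prefix to a publisher label. The table is limited to
# prefixes that appear in the examples on this page.
doi_prefix_publisher <- c(
  "10.1371"  = "plos",
  "10.7554"  = "elife",
  "10.3897"  = "pensoft",
  "10.5194"  = "copernicus",
  "10.1155"  = "hindawi",
  "10.12688" = "f1000research",
  "10.1101"  = "biorxiv"
)

guess_publisher <- function(doi) {
  # Keep everything before the first "/" (the registrant prefix)
  prefix <- sub("/.*$", "", doi)
  # Named-vector lookup; prefixes missing from the table come back as NA
  unname(doi_prefix_publisher[prefix])
}

guess_publisher(c("10.3897/zookeys.515.9332", "10.1371/journal.pone.0086169"))
```

For a single recent DOI, the label could then be supplied as from, for example ft_get(doi, from = guess_publisher(doi)), provided the label is one that ft_get accepts.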

Value

An S3 object of class ft_data with a slot for each publisher. The returned object is split up by publisher because the full text format is the same within a publisher, which should facilitate downstream text mining, since different steps may be needed for each publisher's content.

Notes on specific publishers

Examples

## Not run: 
# If you just have DOIs and don't know the publisher
## PLOS
ft_get('10.1371/journal.pone.0086169')
## PeerJ
ft_get('10.7717/peerj.228')
## eLife
ft_get('10.7554/eLife.03032')
## BMC - some DOIs will work, but some may not
ft_get(c('10.1186/2049-2618-2-7', '10.1186/2193-1801-3-7'))
## FrontiersIn
res <- ft_get(c('10.3389/fphar.2014.00109', '10.3389/feart.2015.00009'))
## Hindawi - via Entrez
res <- ft_get(c('10.1155/2014/292109','10.1155/2014/162024','10.1155/2014/249309'))
## F1000Research - via Entrez
ft_get('10.12688/f1000research.6522.1')
## Two different publishers via Entrez - retains publisher names
res <- ft_get(c('10.1155/2014/292109', '10.12688/f1000research.6522.1'))
res$hindawi
res$f1000research
## Pensoft
ft_get('10.3897/zookeys.499.8360')
### you'll need to specify the publisher for a DOI from a recent publication
ft_get('10.3897/zookeys.515.9332', from = "pensoft")
## Copernicus
out <- ft_get(c('10.5194/angeo-31-2157-2013', '10.5194/bg-12-4577-2015'))
out$copernicus
## arXiv - only pdf, you have to pass in the from parameter
res <- ft_get(x='cond-mat/9309029', from = "arxiv", cache=TRUE, backend="rds")
ft_extract(res)
## bioRxiv - only pdf
res <- ft_get(x='10.1101/012476')
res$biorxiv
## Karger Publisher
ft_get('10.1159/000369331')
## CogentOA Publisher
ft_get('10.1080/23311916.2014.938430')
## MDPI Publisher
ft_get('10.3390/nu3010063')
ft_get('10.3390/nu7085279')
ft_get(c('10.3390/nu3010063', '10.3390/nu7085279')) # not working, only getting 1

# If you know the publisher, give DOI and publisher
## by default, PLOS gives back XML
ft_get('10.1371/journal.pone.0086169', from='plos')
## you can instead get json
ft_get('10.1371/journal.pone.0086169', from='plos', plosopts=list(wt="json"))

## searchplos() comes from the rplos package
(dois <- rplos::searchplos(q="*:*", fl='id',
   fq=list('doc_type:full',"article_type:\"research article\""), limit=5)$data$id)
ft_get(dois, from='plos')
ft_get(c('10.7717/peerj.228','10.7717/peerj.234'), from='entrez')

# elife
ft_get('10.7554/eLife.04300', from='elife')
ft_get(c('10.7554/eLife.04300', '10.7554/eLife.03032'), from='elife')
## search for elife papers via Entrez
dois <- ft_search("elife[journal]", from = "entrez")
ft_get(dois)

# Frontiers in Pharmacology (publisher: Frontiers)
doi <- '10.3389/fphar.2014.00109'
ft_get(doi, from="entrez")

# Hindawi Journals
ft_get(c('10.1155/2014/292109','10.1155/2014/162024','10.1155/2014/249309'), from='entrez')
res <- ft_search(query='ecology', from='crossref', limit=50,
                 crossrefopts = list(filter=list(has_full_text = TRUE,
                                                 member=98,
                                                 type='journal-article')))

out <- ft_get(res$crossref$data$DOI[1:20], from='entrez')

# Frontiers Publisher - Frontiers in Aging Neuroscience
res <- ft_get("10.3389/fnagi.2014.00130", from='entrez')
res$entrez

# Search entrez, get some DOIs
(res <- ft_search(query='ecology', from='entrez'))
res$entrez$data$doi
ft_get(res$entrez$data$doi[1], from='entrez')
ft_get(res$entrez$data$doi[1:3], from='entrez')

# Caching
res <- ft_get('10.1371/journal.pone.0086169', from='plos', cache=TRUE, backend="rds")

# Search entrez, and pass to ft_get()
(res <- ft_search(query='ecology', from='entrez'))
ft_get(res)

## End(Not run)

fulltext documentation built on May 19, 2017, 9:59 a.m.
