Description
An interface to NCBI databases such as PubMed, GenBank, or GEO powered by the Entrez Programming Utilities (EUtils). The nine EUtils provide programmatic access to the NCBI Entrez query and database system for searching and retrieving biological data.
Details
With nine Entrez Programming Utilities, NCBI provides a programmatic interface to the Entrez query and database system for searching and retrieving requested data. Each of these tools corresponds to an R function in the reutils package described below.
The output returned by the EUtils is typically in XML format. To gain access to this output you have several options:
- Use the content(as = "xml") method to extract the output as an XMLInternalDocument object and process it further using the facilities provided by the XML package.
- Use the content(as = "parsed") method to extract the output into data.frames. Note that this is currently only implemented for docsums returned by esummary, uilists returned by esearch, and the output returned by einfo.
- Access specific nodes in the XML tree using XPath expressions with the reference class methods #xmlValue, #xmlAttr, or #xmlName built into eutil objects.
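As a rough illustration of what an XPath query over EUtils output looks like, the sketch below uses the XML package directly on a small hand-written document shaped like an esearch reply. The XML string is made-up sample data, not real NCBI output, and the code calls XML functions (xmlParse, xpathSApply, xmlValue, xmlRoot, xmlName) rather than the methods built into eutil objects.

```r
library(XML)

# Hand-written sample shaped like an esearch reply (not real NCBI output)
esearch_xml <- paste0(
  "<eSearchResult>",
  "<Count>3</Count>",
  "<IdList><Id>111</Id><Id>222</Id><Id>333</Id></IdList>",
  "</eSearchResult>"
)

# Parse the string into an XMLInternalDocument
doc <- xmlParse(esearch_xml, asText = TRUE)

# XPath: text content of every <Id> node under <IdList>
ids <- xpathSApply(doc, "//IdList/Id", xmlValue)

# XPath: the total hit count, converted from text to an integer
count <- as.integer(xpathSApply(doc, "/eSearchResult/Count", xmlValue))

# Name of the root element of the document
root_name <- xmlName(xmlRoot(doc))
```

On a real eutil object, the #xmlValue method plays the role of the xpathSApply(..., xmlValue) call shown here.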
The Entrez Programming Utilities can also generate output in other formats, such as plain-text FASTA or GenBank files for sequence databases, or the MEDLINE format for the literature database. The type of output is generally controlled by setting the retmode and rettype arguments when calling an EUtil.
Please check the relevant usage guidelines when using these services. Note that Entrez server requests are subject to frequency limits.
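NCBI's usage guidelines ask users without an API key to send no more than three requests per second. The following is a minimal sketch, assuming you want to enforce such spacing yourself in a loop of queries; make_throttle() is a hypothetical helper written for illustration, not a reutils function, and the demonstration wraps base R's identity() so that it runs without network access.

```r
# Hypothetical helper (not part of reutils): wrap a function so that
# successive calls are at least `min_interval` seconds apart.
make_throttle <- function(f, min_interval = 1 / 3) {
  last_call <- -Inf  # time of the previous call, in seconds
  function(...) {
    now <- as.numeric(Sys.time())
    wait <- min_interval - (now - last_call)
    if (wait > 0) Sys.sleep(wait)  # pause until the interval has passed
    last_call <<- as.numeric(Sys.time())
    f(...)
  }
}

# Demonstration with a local function instead of a real EUtils call
slow_identity <- make_throttle(identity, min_interval = 0.2)
start <- Sys.time()
results <- vapply(1:4, slow_identity, integer(1))
elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
```

Wrapping a real call would look like throttled_efetch <- make_throttle(efetch), after which throttled_efetch() takes the same arguments as efetch().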
Main functions
- esearch: Search and retrieve primary UIDs for use with esummary, elink, or efetch. esearch additionally returns term translations and optionally stores results for future use in the user's Web Environment.
- esummary: Retrieve document summaries from a list of primary UIDs (provided as a character vector or as an esearch object).
- egquery: Provide Entrez database counts in XML for a single search term using a Global Query.
- einfo: Retrieve field names, term counts, last update, and available updates for each database.
- efetch: Retrieve data records in a specified format corresponding to a list of primary UIDs or from the user's Web Environment on the Entrez History server.
- elink: Return a list of UIDs (and relevancy scores) from a target database that are related to a list of UIDs in the same database or in another Entrez database.
- epost: Upload primary UIDs to the user's Web Environment on the Entrez History server for subsequent use with esummary, elink, or efetch.
- espell: Provide spelling suggestions for a search term.
- ecitmatch: Retrieve PubMed IDs (PMIDs) that correspond to a set of input citation strings.
- content: Extract the content of a request from the eutil object returned by any of the above functions.
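The citation strings passed to ecitmatch follow NCBI's pipe-delimited layout journal_title|year|volume|first_page|author_name|your_key|, one citation per string. The sketch below assumes you keep citations in a data frame; make_citation_strings() and its column names are hypothetical, introduced only for illustration, and the citation values are illustrative input rather than query results.

```r
# Hypothetical helper (not part of reutils): turn a data frame of citation
# fields into pipe-delimited strings, one per row, with a trailing "|".
make_citation_strings <- function(df) {
  fields <- c("journal", "year", "volume", "first_page", "author", "key")
  stopifnot(all(fields %in% names(df)))
  cols <- unname(as.list(df[fields]))  # plain list of column vectors
  paste0(do.call(paste, c(cols, sep = "|")), "|")
}

# Illustrative citation data
citations <- data.frame(
  journal    = c("proc natl acad sci u s a", "science"),
  year       = c(1991, 1987),
  volume     = c(88, 235),
  first_page = c(3248, 182),
  author     = c("mann bj", "palmenberg ac"),
  key        = c("Art1", "Art2"),
  stringsAsFactors = FALSE
)

cit_strings <- make_citation_strings(citations)
```

The resulting character vector would then be handed to the EUtil, for example ecitmatch(cit_strings).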
Package options
reutils uses four options to configure its behaviour:
- reutils.email: NCBI requires that users of its API provide an email address with calls to Entrez. If you are going to perform many queries, consider setting reutils.email to your email address in your .Rprofile file.
- reutils.show.headlines: By default, efetch objects containing text data show only the first 12 lines. This is handy if you have downloaded a fairly large genome in GenBank format. This can be changed by setting the global option reutils.show.headlines to another numeric value or NULL.
- reutils.verbose.queries: If you perform many queries interactively, you might want to get messages announcing the queries you run. You can do so by setting the option reutils.verbose.queries to TRUE.
- reutils.test.remote: Unit tests that require online access to NCBI services are disabled by default, as these services cannot be guaranteed to be available and working under all circumstances. Set the option reutils.test.remote to TRUE to run the full suite of tests.
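A minimal sketch of how the four options above might be set, for instance in a .Rprofile file. The email address and the headline count are placeholder values chosen for illustration; options() and getOption() are base R, so this runs whether or not reutils is loaded.

```r
# Possible lines for ~/.Rprofile; the e-mail address is a placeholder.
options(
  reutils.email = "your.name@example.org",  # identify yourself to NCBI
  reutils.show.headlines = 20,              # lines of text efetch output to show
  reutils.verbose.queries = TRUE,           # announce each query as it runs
  reutils.test.remote = FALSE               # keep online unit tests disabled
)

# Read an option back to confirm it is set
getOption("reutils.show.headlines")
```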
Author(s)
Gerhard Schöfl <gerhard.schofl@gmail.com>
Examples
library(reutils)
#
# combine esearch and efetch
#
# Download PubMed records that are indexed in MeSH for both 'Chlamydia' and
# 'genome' and were published in 2013.
query <- "Chlamydia[mesh] and genome[mesh] and 2013[pdat]"
# Upload the PMIDs for this search to the History server
pmids <- esearch(query, "pubmed", usehistory = TRUE)
pmids
## Not run:
# Fetch the records
articles <- efetch(pmids)
# Use XPath expressions with the #xmlValue() or #xmlAttr() methods to directly
# extract specific data from the XML records stored in the 'efetch' object.
titles <- articles$xmlValue("//ArticleTitle")
abstracts <- articles$xmlValue("//AbstractText")
#
# combine epost with esummary/efetch
#
# Download protein records corresponding to a list of GI numbers.
uid <- c("194680922", "50978626", "28558982", "9507199", "6678417")
# post the GI numbers to the Entrez history server
p <- epost(uid, "protein")
# retrieve docsums with esummary
docsum <- content(esummary(p, version = "1.0"), "parsed")
docsum
# download FASTAs as 'text' with efetch
prot <- efetch(p, retmode = "text", rettype = "fasta")
prot
# retrieve the content from the efetch object
fasta <- content(prot)
## End(Not run)