reutils-package: Talk to the NCBI EUtils

Description

An interface to NCBI databases such as PubMed, GenBank, or GEO powered by the Entrez Programming Utilities (EUtils). The nine EUtils provide programmatic access to the NCBI Entrez query and database system for searching and retrieving biological data.

Details

With nine Entrez Programming Utilities, NCBI provides a programmatic interface to the Entrez query and database system for searching and retrieving requested data.

Each of these tools corresponds to an R function in the reutils package described below.

The output returned by the EUtils is typically in XML format. To gain access to this output you have several options:

  1. Use the content(as = "xml") method to extract the output as an XMLInternalDocument object and process it further using the facilities provided by the XML package.

  2. Use the content(as = "parsed") method to extract the output into data.frames. Note that this is currently only implemented for docsums returned by esummary, uilists returned by esearch, and the output returned by einfo.

  3. Access specific nodes in the XML tree using XPath expressions with the reference class methods #xmlValue, #xmlAttr, or #xmlName built into eutil objects.
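
As a rough illustration of options 1 and 3, the sketch below runs XPath queries with the XML package on a small, made-up XML string that stands in for an EUtils response. The XML content and the variable names are assumptions for illustration only. With a live eutil object, the document would instead come from content(as = "xml").

```r
library(XML)

# Made-up stand-in for an efetch response from PubMed (not real NCBI data).
xml_text <- '<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <Article><ArticleTitle>First example title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="In-Process">
      <Article><ArticleTitle>Second example title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>'

# Parse the string into an XMLInternalDocument, the same class that
# content(as = "xml") is described as returning.
doc <- xmlParse(xml_text, asText = TRUE)

# Text content of every node matching an XPath expression.
titles <- xpathSApply(doc, "//ArticleTitle", xmlValue)

# Value of one attribute on every matching node.
status <- xpathSApply(doc, "//MedlineCitation", xmlGetAttr, "Status")

# Element names of the children of the root node.
node_names <- xpathSApply(doc, "/PubmedArticleSet/*", xmlName)
```

The same XPath expressions could presumably be passed to the #xmlValue, #xmlAttr, and #xmlName methods of an eutil object, which saves the explicit parsing step.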

The Entrez Programming Utilities can also generate output in other formats, such as plain-text FASTA or GenBank files for sequence databases, or the MEDLINE format for the literature database. The type of output is generally controlled by setting the retmode and rettype arguments when calling an EUtil. Please check the relevant usage guidelines when using these services. Note that Entrez server requests are subject to frequency limits.
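
To make the role of retmode and rettype more concrete, the sketch below assembles the kind of request URL that an efetch call ultimately stands for. build_efetch_url is a hypothetical helper written for this illustration and is not part of reutils. Only the public E-utilities base URL and the db, id, rettype, and retmode parameter names are taken from the NCBI service itself. No request is sent.

```r
# Public base URL of the NCBI E-utilities.
eutils_base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hypothetical helper (not a reutils function): build an efetch URL.
# Parameters left as NULL are omitted so the server default applies.
build_efetch_url <- function(uid, db, rettype = NULL, retmode = NULL) {
  params <- list(
    db = db,
    id = paste(uid, collapse = ","),  # several UIDs go in one comma-separated list
    rettype = rettype,
    retmode = retmode
  )
  # Drop the parameters that were not supplied.
  params <- params[!vapply(params, is.null, logical(1))]
  # Percent-encode each value, then join the key=value pairs with "&".
  values <- vapply(
    params,
    function(p) utils::URLencode(p, reserved = TRUE),
    character(1)
  )
  paste0(eutils_base, "efetch.fcgi?", paste0(names(params), "=", values, collapse = "&"))
}

# Same record, two output formats: FASTA as plain text versus the default.
fasta_url <- build_efetch_url("194680922", "protein", rettype = "fasta", retmode = "text")
default_url <- build_efetch_url("194680922", "protein")
```

In practice the efetch function takes retmode and rettype directly, as in the Examples section, so a helper like this is only useful for seeing what the two arguments change in the request.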

Main functions

Package options

reutils uses three options to configure behaviour:

Author(s)

Gerhard Schöfl <gerhard.schofl@gmail.com>

Examples

#
# combine esearch and efetch
#
# Download PubMed records that are indexed in MeSH for both 'Chlamydia' and 
# 'genome' and were published in 2013.
query <- "Chlamydia[mesh] AND genome[mesh] AND 2013[pdat]"

# Upload the PMIDs for this search to the History server
pmids <- esearch(query, "pubmed", usehistory = TRUE)
pmids

## Not run: 
# Fetch the records
articles <- efetch(pmids)

# Use XPath expressions with the #xmlValue() or #xmlAttr() methods to directly
# extract specific data from the XML records stored in the 'efetch' object.
titles <- articles$xmlValue("//ArticleTitle")
abstracts <- articles$xmlValue("//AbstractText")

#
# combine epost with esummary/efetch
#
# Download protein records corresponding to a list of GI numbers.
uid <- c("194680922", "50978626", "28558982", "9507199", "6678417")

# post the GI numbers to the Entrez history server
p <- epost(uid, "protein")

# retrieve docsums with esummary
docsum <- content(esummary(p, version = "1.0"), "parsed")
docsum

# download FASTAs as 'text' with efetch
prot <- efetch(p, retmode = "text", rettype = "fasta")
prot

# retrieve the content from the efetch object
fasta <- content(prot)

## End(Not run)

reutils documentation built on May 1, 2019, 9:15 p.m.