reutils is an R package for interfacing with NCBI databases such as PubMed, GenBank, or GEO via the Entrez Programming Utilities (E-utilities). It provides access to the nine basic E-utilities: einfo, esearch, esummary, epost, efetch, elink, egquery, espell, and ecitmatch.

Please check the relevant usage guidelines when using these services. Note that Entrez server requests are subject to frequency limits. Consider obtaining an NCBI API key if you are a heavy user of the E-utilities.
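As a rough sketch of how a key might be supplied, assuming reutils reads it from the `reutils.api.key` option (an assumption worth checking against the package help before relying on it), the key could be set once per session. The key string below is a placeholder.

```r
# Assumption: reutils picks up the API key from the "reutils.api.key" option.
# "YOUR_NCBI_API_KEY" is a placeholder, not a real key.
options(reutils.api.key = "YOUR_NCBI_API_KEY")

# Confirm that the option is set for the current session.
getOption("reutils.api.key")
```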

Important functions

With nine E-utilities, NCBI provides a programmatic interface to the Entrez query and database system for searching and retrieving data.

Each of these tools corresponds to an R function in the reutils package described below.


esearch: search and retrieve a list of primary UIDs or the NCBI History Server information (queryKey and webEnv). The objects returned by esearch can be passed on directly to epost, esummary, elink, or efetch.


efetch: retrieve data records from NCBI in a specified retrieval type (rettype) and retrieval mode (retmode); valid combinations are listed in NCBI's table of rettype and retmode values for EFetch. Data are returned as XML or text documents.


esummary: retrieve Entrez database summaries (DocSums) from a list of primary UIDs (provided as a character vector or as an esearch object).


elink: retrieve a list of UIDs (and relevancy scores) from a target database that are related to a set of UIDs provided by the user. The objects returned by elink can be passed on directly to epost, esummary, or efetch.


einfo: provide field names, term counts, last update, and available updates for each database.


epost: upload primary UIDs to the user's Web Environment on the Entrez history server for subsequent use with esummary, elink, or efetch.
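To illustrate how these functions chain together, here is a minimal sketch that passes an esearch result directly to esummary. The variable names `hits` and `docsums` are illustrative, and the calls need network access to NCBI.

```r
library(reutils)

# Search PubMed and keep the result set on the NCBI history server.
hits <- esearch("Chlamydia psittaci[titl] and 2020[pdat]", db = "pubmed",
                usehistory = TRUE)

# Pass the esearch object straight to esummary; no manual UID handling needed.
docsums <- esummary(hits)
docsums
```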


esearch: Searching the Entrez databases

Let's search PubMed for articles with Chlamydia psittaci in the title that were published in 2020 and retrieve a list of PubMed IDs (PMIDs).

pmid <- esearch("Chlamydia psittaci[titl] and 2020[pdat]", "pubmed")

Alternatively, we can collect the PMIDs on the history server.

pmid2 <- esearch("Chlamydia psittaci[titl] and 2020[pdat]", "pubmed", usehistory = TRUE)

We can also use esearch to search GenBank. Here we do a search for polymorphic membrane proteins (PMPs) in Chlamydiaceae.

cpaf <- esearch("Chlamydiaceae[orgn] and PMP[gene]", "nucleotide")

Some accessors for esearch objects
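A short sketch of some general-purpose accessors, assuming the `getUrl()`, `getError()`, and `database()` functions exported by reutils and repeating the `cpaf` search from above so the block runs on its own:

```r
library(reutils)

cpaf <- esearch("Chlamydiaceae[orgn] and PMP[gene]", "nucleotide")

getUrl(cpaf)    # the query URL that was sent to NCBI
getError(cpaf)  # any error or warning recorded for the request
database(cpaf)  # the database that was searched
```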


Extract a vector of GIs:
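A minimal sketch, assuming the `uid()` accessor from reutils; the search is repeated so the block is self-contained, and `ids` is an illustrative name:

```r
library(reutils)

cpaf <- esearch("Chlamydiaceae[orgn] and PMP[gene]", "nucleotide")

# Pull the UIDs out of the esearch object.
ids <- uid(cpaf)
head(ids)
```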


Get query key and web environment:
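A minimal sketch, assuming the `querykey()` and `webenv()` accessors from reutils. These are only meaningful for a search run with usehistory = TRUE, so the history-server search is repeated here:

```r
library(reutils)

pmid2 <- esearch("Chlamydia psittaci[titl] and 2020[pdat]", "pubmed",
                 usehistory = TRUE)

querykey(pmid2)  # query key identifying this result set
webenv(pmid2)    # Web Environment string on the history server
```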


Extract the content of an E-utility request as XML:

content(cpaf, "xml")

Or extract parts of the XML data using the reference class method #xmlValue() and an XPath expression:
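A minimal sketch, using the `$xmlValue()` method with XPath expressions for two elements of the ESearch XML response (`Id` and `Count`):

```r
library(reutils)

cpaf <- esearch("Chlamydiaceae[orgn] and PMP[gene]", "nucleotide")

# Text content of every <Id> element in the response.
cpaf$xmlValue("//Id")

# Total number of hits reported in the <Count> element.
cpaf$xmlValue("//Count")
```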


esummary: Retrieving summaries from primary IDs

esummary retrieves document summaries (docsums) from a list of primary IDs. Let's find out what the first entry for PMP is about:

esum <- esummary(cpaf[1])

We can also parse docsums into a tibble:

esum <- esummary(cpaf[1:4])
content(esum, "parsed")

efetch: Downloading full records from Entrez

First, we search the protein database for sequences of the chlamydial protease activity factor, CPAF:

cpaf <- esearch("Chlamydia[orgn] and CPAF", "protein")

Let's fetch the FASTA record for the first protein. To do that, we have to set rettype = "fasta" and retmode = "text":

cpaff <- efetch(cpaf[1], db = "protein", rettype = "fasta", retmode = "text")

Now we can write the sequence to a fasta file by first extracting the data from the efetch object using content():

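write(content(cpaff), file = "~/cpaf.fna")

# Alternatively, fetch all hits as TSeq XML and pull out fields with XPath.
cpafx <- efetch(cpaf, db = "protein", rettype = "fasta", retmode = "xml")
aa <- cpafx$xmlValue("//TSeq_sequence")     # amino acid sequences
defline <- cpafx$xmlValue("//TSeq_defline") # FASTA definition lines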

einfo: Information about the Entrez databases

You can use einfo to obtain a list of all database names accessible through the Entrez utilities:
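A minimal sketch: calling `einfo()` without arguments returns the list of available Entrez databases. The name `dbs` is illustrative.

```r
library(reutils)

# With no arguments, einfo() lists the available Entrez databases.
dbs <- einfo()
dbs
```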


For each of these databases, we can use einfo again to obtain more information:
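A minimal sketch, using the taxonomy database as an arbitrary example; any database name returned by `einfo()` could be substituted:

```r
library(reutils)

# Passing a database name returns details such as its search fields and links.
tax_info <- einfo("taxonomy")
tax_info
```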


