knitr::opts_chunk$set(echo = TRUE)

Thanks for making this package! It's a huge help. After a lot of successful use, I'm seeing some very strange behavior: when I fetch nucleotide data for a single ID, I get many records back. Here's an example:

# get record 
raw <- rentrez::entrez_fetch(db = 'nuccore', id = '403062956',
                             rettype = 'native', retmode = 'xml',
                             parsed = TRUE)

# transform into a named vector (XML structure preserved in the names...
# there's probably a much better way to do this)
rawList <- unlist(XML::xmlToList(raw))
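
A possibly cleaner way to pull out the same values is an XPath query on the parsed document instead of flattening it. Here's a minimal sketch on a tiny made-up XML snippet. The `toyXML` string and the "sp. A" name are hypothetical, not real NCBI output; the only thing taken from the real record is the `Org-ref_taxname` element name. I'd expect the same `xpathSApply` call to work on `raw`, but I haven't checked that here.

```r
# toy stand-in for the parsed record (hypothetical content, not NCBI output)
toyXML <- '<Bioseq-set>
  <Seq-entry><Org-ref><Org-ref_taxname>Drosophila murphyi</Org-ref_taxname></Org-ref></Seq-entry>
  <Seq-entry><Org-ref><Org-ref_taxname>Drosophila sp. A</Org-ref_taxname></Org-ref></Seq-entry>
  <Seq-entry><Org-ref><Org-ref_taxname>Drosophila sp. A</Org-ref_taxname></Org-ref></Seq-entry>
</Bioseq-set>'
toyDoc <- XML::xmlParse(toyXML, asText = TRUE)

# grab the text of every Org-ref_taxname node, wherever it is nested
toyTax <- XML::xpathSApply(toyDoc, '//Org-ref_taxname', XML::xmlValue)
length(toyTax)
```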

This ID should return a result for the species Drosophila murphyi. Instead, the query returns many records, most of which are not for Drosophila murphyi:

# find all the times a species name is mentioned
allTax <- rawList[grep('Org-ref_taxname', names(rawList))]
names(allTax) <- NULL

# it is mentioned a lot of times
length(allTax)

# most are not for the expected species
sum(allTax == 'Drosophila murphyi')
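
To see which names are actually coming back, tabulating them helps. A small sketch with a made-up vector standing in for `allTax` (the "sp. A" and "sp. B" names and all the counts are hypothetical, just for illustration):

```r
# made-up stand-in for allTax (hypothetical values)
toyTax <- c('Drosophila murphyi', 'Drosophila sp. A', 'Drosophila sp. A',
            'Drosophila sp. B')

# count how often each name appears, most frequent first
taxCounts <- sort(table(toyTax), decreasing = TRUE)
taxCounts

# number of distinct names
length(unique(toyTax))
```

Running the same two calls on the real `allTax` would show whether the extra records belong to a handful of related taxa or to something unrelated.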

If I ask for the FASTA file instead, it behaves as expected and gives me one record for the right species:

rawFASTA <- rentrez::entrez_fetch(db = 'nuccore', id = '403062956',
                                  rettype = 'fasta')
cat(rawFASTA)
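
Rather than eyeballing the printed output, the number of FASTA records can be checked by counting header lines, since each record starts with a line beginning with `>`. A minimal sketch on a made-up string standing in for `rawFASTA` (the accession and sequence in `toyFASTA` are hypothetical):

```r
# made-up FASTA text standing in for rawFASTA (hypothetical record)
toyFASTA <- '>XX000000.1 hypothetical record\nACGTACGT\nACGT\n'

# each record starts with a header line beginning with '>'
fastaLines <- strsplit(toyFASTA, '\n', fixed = TRUE)[[1]]
nRecords <- sum(grepl('^>', fastaLines))
nRecords
```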


ajrominger/abbaGapR documentation built on Dec. 18, 2021, 11:29 p.m.