prata Fri Sep 14 02:42:34 2018
taxdumpr is an R package that brings together a series of methods for manipulating NCBI's taxonomy data. It works locally, on the taxdump files downloaded from NCBI. It was developed with the S4 class system by Felipe Prata Lima (http://lbi.usp.br/membros/) and João Carlos Setubal (http://www.iq.usp.br/setubal/).
The taxdump files can be downloaded from NCBI's FTP site at ftp://ftp.ncbi.nih.gov/pub/taxonomy/. You can download them with wget:

```bash
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
```

Then uncompress the archive with tar:

```bash
tar zxvf taxdump.tar.gz
```
This way you should obtain the following files:
```bash
ls ~/taxdump/
## citations.dmp
## delnodes.dmp
## division.dmp
## gc.prt
## gencode.dmp
## merged.dmp
## names.dmp
## nodes.dmp
## readme.txt
## taxdump.tar.gz
```
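The .dmp files are plain text. According to the readme.txt shipped in the archive, each record sits on one line terminated by "\t|", and fields are separated by "\t|\t". The sketch below shows how such lines can be split into a data frame in base R. parse_dmp_lines is a hypothetical helper, not part of taxdumpr, and the two names.dmp-style records are toy input built from the Corynebacterium example used later in this document.

```r
# Hypothetical helper (not part of taxdumpr): split lines in the taxdump .dmp
# layout into a data frame with the given column names.
parse_dmp_lines <- function(lines, col_names) {
  # Remove the "\t|" record terminator at the end of each line
  lines <- sub("\t\\|$", "", lines)
  # Split each record on the "\t|\t" field delimiter (literal match, no regex)
  fields <- strsplit(lines, "\t|\t", fixed = TRUE)
  # Stack the records into a character matrix, then convert to a data frame
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- col_names
  df
}

# Toy records in the names.dmp layout: tax_id | name_txt | unique name | name class
names_lines <- c(
  "1716\t|\tCorynebacterium\t|\t\t|\tscientific name\t|",
  "1716\t|\tCaseobacter\t|\t\t|\tsynonym\t|"
)
names_df <- parse_dmp_lines(names_lines,
                            c("tax_id", "name_txt", "unique_name", "name_class"))
print(names_df)
```

All columns come back as character; convert tax_id with as.integer() if you need numbers. On the real files the same helper could be called as parse_dmp_lines(readLines("~/taxdump/merged.dmp"), c("old_tax_id", "new_tax_id")), although reading the much larger names.dmp this way may be slow.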
In this document, we assume that you downloaded and uncompressed these files into a taxdump folder in your home directory (~/taxdump/).

## Install
Install the package using devtools:
```r
devtools::install_github("felipepratalima/taxdumpr")
```

## Instantiate the Taxdumpr base class
Load the package:
```r
require(taxdumpr)
## Loading required package: taxdumpr
```
The package's methods are organized around the Taxdumpr object, which is instantiated with the Taxdumpr constructor. The constructor requires:

1. nodesDmpLocation: the path to the nodes.dmp file from the taxdump files.
2. namesDmpLocation: the path to names.dmp.
3. mergedDmpLocation: the path to merged.dmp.
```r
taxdumpr <- Taxdumpr(nodesDmpLocation = "~/taxdump/nodes.dmp",
                     namesDmpLocation = "~/taxdump/names.dmp",
                     mergedDmpLocation = "~/taxdump/merged.dmp")
```
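Since taxdumpr is built on S4, the object returned by the constructor is an instance of a class, and in S4 designs the functions that take it are typically generics with methods defined for that class. The following is a minimal, hypothetical sketch of that pattern in base R, not the package's actual code: ToyTaxdump, its nodes slot and the generic toyRanksByIds are illustrative names, and the toy nodes table holds only the ranks reported for IDs 1716 and 1727 later in this document.

```r
library(methods)

# Hypothetical S4 class holding a parsed nodes table (illustration only)
setClass("ToyTaxdump", slots = c(nodes = "data.frame"))

# Constructor function, in the spirit of Taxdumpr()
ToyTaxdump <- function(nodes) new("ToyTaxdump", nodes = nodes)

# Generic plus a method for ToyTaxdump: look up the rank of each ID
setGeneric("toyRanksByIds", function(object, ids) standardGeneric("toyRanksByIds"))
setMethod("toyRanksByIds", "ToyTaxdump", function(object, ids) {
  object@nodes$rank[match(ids, object@nodes$tax_id)]
})

toy <- ToyTaxdump(data.frame(tax_id = c(1716, 1727),
                             rank = c("genus", "species"),
                             stringsAsFactors = FALSE))
toyRanksByIds(toy, c(1727, 1716))
```

Keeping the parsed tables inside one object means they are read from disk once, at construction time, and every later call only does in-memory lookups.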
getTaxonomyIdsByNames returns the taxonomy ID for a given name:

```r
getTaxonomyIdsByNames(taxdumpr, "Corynebacterium")
## [1] 1716
getTaxonomyIdsByNames(taxdumpr, "Corynebacterium variabile")
## [1] 1727
```
It works with synonyms too:
```r
getTaxonomyIdsByNames(taxdumpr, "Caseobacter")
## [1] 1716
getTaxonomyIdsByNames(taxdumpr, "Caseobacter polymorphus")
## [1] 1727
```
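Synonym lookups are possible because names.dmp lists every name of a taxon (its scientific name, synonyms and other name classes) against the same taxonomy ID. A minimal base-R sketch of such a lookup, assuming the names table has already been parsed into a data frame; lookup_ids_by_names is a hypothetical helper, and the toy rows mirror the outputs above (the name classes are illustrative).

```r
# Toy excerpt of names.dmp (IDs and names mirror the outputs above)
names_df <- data.frame(
  tax_id     = c(1716, 1716, 1727, 1727),
  name_txt   = c("Corynebacterium", "Caseobacter",
                 "Corynebacterium variabile", "Caseobacter polymorphus"),
  name_class = c("scientific name", "synonym", "scientific name", "synonym"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the ID of the first row whose name matches,
# whatever its name class; unknown names give NA
lookup_ids_by_names <- function(names_df, query) {
  names_df$tax_id[match(query, names_df$name_txt)]
}

lookup_ids_by_names(names_df, c("Caseobacter", "Caseobacter polymorphus", "Unknown"))
```

Because match() keeps only the first hit, a name shared by several taxa would resolve to just one of them in this sketch.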
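Some taxonomy IDs have been merged by NCBI into other taxa (these are listed in merged.dmp). getUpdatedIds replaces such IDs with the IDs they were merged into; for example, 319938 now points to 288004:

```r
getUpdatedIds(taxdumpr, 319938)
## [1] 288004
```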
Note that IDs that have not been merged are preserved:

```r
getUpdatedIds(taxdumpr, 288004)
## [1] 288004
```

It also accepts a vector of IDs:

```r
getUpdatedIds(taxdumpr, c(319938, 1716, 1727, 288004))
## [1] 288004 1716 1727 288004
```

We recommend always calling getUpdatedIds on your IDs before working with the other Taxdumpr methods.
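Conceptually, updating IDs is a lookup in merged.dmp, which pairs each old taxonomy ID with the ID it was merged into. A minimal base-R sketch; update_ids is a hypothetical helper, the one-row merged table reuses the 319938 to 288004 pair shown above, and it assumes each old ID maps directly to a current ID, so a single lookup suffices.

```r
# Toy merged.dmp excerpt: old ID and the ID it was merged into
merged_df <- data.frame(old_id = 319938, new_id = 288004)

# Hypothetical helper: swap merged IDs for their current IDs, keep all others.
# Assumes a single lookup is enough (old IDs point directly to current IDs).
update_ids <- function(ids, merged_df) {
  idx <- match(ids, merged_df$old_id)
  hit <- !is.na(idx)
  ids[hit] <- merged_df$new_id[idx[hit]]
  ids
}

update_ids(c(319938, 1716, 1727, 288004), merged_df)
```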
getScientificNamesByIds returns the scientific name for a taxonomy ID:

```r
getScientificNamesByIds(taxdumpr, 1716)
## [1] "Corynebacterium"
getScientificNamesByIds(taxdumpr, 1727)
## [1] "Corynebacterium variabile"
```

getScientificNamesByNames does the same starting from a name:

```r
getScientificNamesByNames(taxdumpr, "Corynebacterium")
## [1] "Corynebacterium"
getScientificNamesByNames(taxdumpr, "Corynebacterium variabile")
## [1] "Corynebacterium variabile"
```

Synonyms are resolved to the scientific name of the same taxon:

```r
getScientificNamesByNames(taxdumpr, "Caseobacter")
## [1] "Corynebacterium"
getScientificNamesByNames(taxdumpr, "Caseobacter polymorphus")
## [1] "Corynebacterium variabile"
```
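Resolving a synonym to a scientific name can be thought of as two lookups: any name to its taxonomy ID, then that ID to the row whose name class is "scientific name". A minimal sketch of that idea in base R, with a hypothetical helper and a toy names table mirroring the outputs above; it is not taxdumpr's implementation.

```r
# Toy excerpt of names.dmp (IDs and names mirror the outputs above)
names_df <- data.frame(
  tax_id     = c(1716, 1716, 1727, 1727),
  name_txt   = c("Corynebacterium", "Caseobacter",
                 "Corynebacterium variabile", "Caseobacter polymorphus"),
  name_class = c("scientific name", "synonym", "scientific name", "synonym"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map any name (scientific or synonym) to the
# scientific name of the same taxon
scientific_names_by_names <- function(names_df, query) {
  ids <- names_df$tax_id[match(query, names_df$name_txt)]
  sci <- names_df[names_df$name_class == "scientific name", ]
  sci$name_txt[match(ids, sci$tax_id)]
}

scientific_names_by_names(names_df, c("Caseobacter", "Caseobacter polymorphus"))
```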
getTaxonomyRanksByIds returns the rank of each taxon:

```r
getTaxonomyRanksByIds(taxdumpr, 1716)
## [1] "genus"
getTaxonomyRanksByIds(taxdumpr, 1727)
## [1] "species"
```
getParentTaxonomyIdsByIds and getParentTaxonomyIdsByNames return the taxonomy ID of a taxon's parent:

```r
getParentTaxonomyIdsByIds(taxdumpr, 1716)
## [1] 1653
getParentTaxonomyIdsByIds(taxdumpr, 1727)
## [1] 1716
getParentTaxonomyIdsByNames(taxdumpr, "Corynebacterium")
## [1] 1653
getParentTaxonomyIdsByNames(taxdumpr, "Corynebacterium variabile")
## [1] 1716
```

getParentScientificNamesByIds and getParentScientificNamesByNames return the parent's scientific name instead:

```r
getParentScientificNamesByIds(taxdumpr, 1716)
## [1] "Corynebacteriaceae"
getParentScientificNamesByIds(taxdumpr, 1727)
## [1] "Corynebacterium"
getParentScientificNamesByNames(taxdumpr, "Corynebacterium")
## [1] "Corynebacteriaceae"
getParentScientificNamesByNames(taxdumpr, "Corynebacterium variabile")
## [1] "Corynebacterium"
```
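Parent lookups rely on nodes.dmp, where each row stores a taxonomy ID, the ID of its parent and its rank. A minimal base-R sketch of getting parent IDs and parent scientific names from such a table; the helpers parent_ids and parent_scientific_names and the toy tables are hypothetical, with values taken from the outputs above (the parent of 1653 is left out, so it is NA here).

```r
# Toy nodes.dmp excerpt; integer IDs so as.character() never uses
# scientific notation when the IDs are used as lookup keys below
nodes_df <- data.frame(
  tax_id    = c(1653L, 1716L, 1727L),
  parent_id = c(NA, 1653L, 1716L),   # parent of 1653 omitted in this excerpt
  rank      = c("family", "genus", "species"),
  stringsAsFactors = FALSE
)
# Toy scientific-name table keyed by tax_id
sci_names <- c("1653" = "Corynebacteriaceae",
               "1716" = "Corynebacterium",
               "1727" = "Corynebacterium variabile")

# Hypothetical helper: parent ID of each input ID
parent_ids <- function(nodes_df, ids) {
  nodes_df$parent_id[match(ids, nodes_df$tax_id)]
}
# Hypothetical helper: scientific name of each input ID's parent
parent_scientific_names <- function(nodes_df, sci_names, ids) {
  unname(sci_names[as.character(parent_ids(nodes_df, ids))])
}

parent_ids(nodes_df, c(1716L, 1727L))
parent_scientific_names(nodes_df, sci_names, c(1716L, 1727L))
```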
Some taxa sit at non-standard ranks, such as strains or unranked groups like "FCB group". The getStandard* methods map such a taxon to its nearest ancestor at a standard rank (superkingdom, phylum, class, order, family, genus or species); taxa already at a standard rank are returned unchanged:

```r
## Non-standard
getStandardTaxonomyIdsByIds(taxdumpr, 290318)
## [1] 1094
getStandardTaxonomyIdsByIds(taxdumpr, 274493)
## [1] 191412
getStandardTaxonomyIdsByIds(taxdumpr, 1783270)
## [1] 2
## Standard
getStandardTaxonomyIdsByIds(taxdumpr, 1094)
## [1] 1094
```

The same lookup works from names:

```r
## Non-standard
getStandardTaxonomyIdsByNames(taxdumpr, "Chlorobium phaeovibrioides DSM 265")
## [1] 1094
getStandardTaxonomyIdsByNames(taxdumpr, "Chlorobium/Pelodictyon group")
## [1] 191412
getStandardTaxonomyIdsByNames(taxdumpr, "FCB group")
## [1] 2
## Standard
getStandardTaxonomyIdsByNames(taxdumpr, "Chlorobium phaeovibrioides")
## [1] 1094
```

getStandardScientificNamesByNames returns the scientific name of that standard taxon:

```r
## Non-standard
getStandardScientificNamesByNames(taxdumpr, "Chlorobium phaeovibrioides DSM 265")
## [1] "Chlorobium phaeovibrioides"
getStandardScientificNamesByNames(taxdumpr, "Chlorobium/Pelodictyon group")
## [1] "Chlorobiaceae"
getStandardScientificNamesByNames(taxdumpr, "FCB group")
## [1] "Bacteria"
## Standard
getStandardScientificNamesByNames(taxdumpr, "Chlorobium phaeovibrioides")
## [1] "Chlorobium phaeovibrioides"
```
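One way to obtain results like these is to walk up the parent chain in nodes.dmp until a node with a standard rank is reached. The sketch below does this in base R; standard_id is a hypothetical helper, and the toy nodes table is a truncated excerpt of the Chlorobium phaeovibrioides DSM 265 lineage reported by getLineageIdsByIds below, with "no rank" assumed for the non-standard nodes and Bacteria (2) made its own parent so the walk stops there.

```r
# Standard ranks, as in the columns of the lineage data frames in this document
standard_ranks <- c("superkingdom", "phylum", "class", "order",
                    "family", "genus", "species")

# Toy nodes excerpt (truncated at Bacteria, which is its own parent here)
nodes_df <- data.frame(
  tax_id    = c(2L, 1783270L, 68336L, 1090L, 191410L, 191411L,
                191412L, 274493L, 1091L, 1094L, 290318L),
  parent_id = c(2L, 2L, 1783270L, 68336L, 1090L, 191410L,
                191411L, 191412L, 274493L, 1091L, 1094L),
  rank      = c("superkingdom", "no rank", "no rank", "phylum", "class",
                "order", "family", "no rank", "genus", "species", "no rank"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: climb from id until a node with a standard rank is found
standard_id <- function(nodes_df, id, ranks = standard_ranks) {
  repeat {
    row <- match(id, nodes_df$tax_id)
    if (is.na(row)) return(NA_integer_)            # unknown ID
    if (nodes_df$rank[row] %in% ranks) return(id)  # standard rank reached
    parent <- nodes_df$parent_id[row]
    if (parent == id) return(NA_integer_)          # top reached, none found
    id <- parent
  }
}

sapply(c(290318L, 274493L, 1783270L, 1094L),
       function(i) standard_id(nodes_df, i))
```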
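getLineageIdsByIds returns the lineage of a taxon as a data frame, with one row per ancestor, ordered from the superkingdom down to the taxon itself:

```r
getLineageIdsByIds(taxdumpr, 290318)
##    taxonomyId lineageId
## 1      290318         2
## 2      290318   1783270
## 3      290318     68336
## 4      290318      1090
## 5      290318    191410
## 6      290318    191411
## 7      290318    191412
## 8      290318    274493
## 9      290318      1091
## 10     290318      1094
## 11     290318    290318
```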
getStandardLineageIdsByIds keeps only the ancestors at standard ranks:

```r
getStandardLineageIdsByIds(taxdumpr, 290318)
##    taxonomyId lineageId
## 1      290318         2
## 4      290318      1090
## 5      290318    191410
## 6      290318    191411
## 7      290318    191412
## 9      290318      1091
## 10     290318      1094
```
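Both outputs can be reproduced, in principle, by collecting the parent chain and then filtering it by rank. A minimal base-R sketch with a hypothetical lineage_ids helper and a truncated toy nodes table (Bacteria made its own parent; "no rank" assumed for the non-standard nodes); unlike taxdumpr, it returns plain vectors rather than data frames.

```r
standard_ranks <- c("superkingdom", "phylum", "class", "order",
                    "family", "genus", "species")

# Toy nodes excerpt (truncated at Bacteria, which is its own parent here)
nodes_df <- data.frame(
  tax_id    = c(2L, 1783270L, 68336L, 1090L, 191410L, 191411L,
                191412L, 274493L, 1091L, 1094L, 290318L),
  parent_id = c(2L, 2L, 1783270L, 68336L, 1090L, 191410L,
                191411L, 191412L, 274493L, 1091L, 1094L),
  rank      = c("superkingdom", "no rank", "no rank", "phylum", "class",
                "order", "family", "no rank", "genus", "species", "no rank"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs from the top of the chain down to id itself
lineage_ids <- function(nodes_df, id) {
  path <- id
  repeat {
    parent <- nodes_df$parent_id[match(id, nodes_df$tax_id)]
    if (is.na(parent) || parent == id) break  # unknown ID or top reached
    path <- c(parent, path)                   # prepend so the top comes first
    id <- parent
  }
  path
}

full <- lineage_ids(nodes_df, 290318L)
# Keep only the IDs whose rank is one of the standard ranks
standard <- full[nodes_df$rank[match(full, nodes_df$tax_id)] %in% standard_ranks]
full
standard
```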
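getStandardLineageIdsByIdsAsDataFrame returns the standard lineage as a single row, with one column per standard rank:

```r
getStandardLineageIdsByIdsAsDataFrame(taxdumpr, 1094)
##   taxonomyId superkingdomId phylumId classId orderId familyId genusId
## 1       1094              2     1090  191410  191411   191412    1091
##   speciesId
## 1      1094
```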
getStandardLineageIdsAndScientificNamesByIdsAsDataFrame adds the scientific name of each rank:

```r
getStandardLineageIdsAndScientificNamesByIdsAsDataFrame(taxdumpr, 1094)
##   taxonomyId superkingdomId phylumId classId orderId familyId genusId
## 1       1094              2     1090  191410  191411   191412    1091
##   speciesId               taxonomyName superkingdomName phylumName
## 1      1094 Chlorobium phaeovibrioides         Bacteria   Chlorobi
##   className    orderName    familyName  genusName
## 1 Chlorobia Chlorobiales Chlorobiaceae Chlorobium
##                  speciesName
## 1 Chlorobium phaeovibrioides
```
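The AsDataFrame variants reshape the standard lineage into one row with a column per rank. A minimal base-R sketch of that reshaping, using a hypothetical to_wide_row helper and the standard lineage of 1094 shown above as toy input; in this sketch, ranks absent from a lineage become NA.

```r
standard_ranks <- c("superkingdom", "phylum", "class", "order",
                    "family", "genus", "species")

# Standard lineage of 1094 with the rank of each ancestor (toy input)
lineage <- data.frame(
  lineageId = c(2L, 1090L, 191410L, 191411L, 191412L, 1091L, 1094L),
  rank      = standard_ranks,
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row, one "<rank>Id" column per standard rank
to_wide_row <- function(taxonomy_id, lineage, ranks) {
  ids <- lineage$lineageId[match(ranks, lineage$rank)]  # NA if rank absent
  values <- c(taxonomy_id, ids)
  names(values) <- c("taxonomyId", paste0(ranks, "Id"))
  as.data.frame(as.list(values))
}

to_wide_row(1094L, lineage, standard_ranks)
```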