taxonomizrSwitch: Switch from data.table to SQLite

taxonomizrSwitch    R Documentation

Switch from data.table to SQLite

Description

In version 0.5.0, taxonomizr switched from data.table to SQLite for name and node lookups. See below for more details.

Details

Version 0.5.0 changed name and node lookups from data.table to SQLite. This was necessary to increase performance (a 10-100x speedup for getTaxonomy) and to create a simpler interface (a single SQLite database contains all necessary data). Unfortunately, this switch requires a few breaking changes:

  • getTaxonomy changes from getTaxonomy(ids,namesDT,nodesDT) to getTaxonomy(ids,sqlFile)

  • getId changes from getId(taxa,namesDT) to getId(taxa,sqlFile)

  • read.names is deprecated; use read.names.sql instead. For example, instead of calling names<-read.names('names.dmp') in every session, simply call read.names.sql('names.dmp','accessionTaxa.sql') once (or use the convenient prepareDatabase).

  • read.nodes is deprecated; use read.nodes.sql instead. For example, instead of calling nodes<-read.nodes('nodes.dmp') in every session, simply call read.nodes.sql('nodes.dmp','accessionTaxa.sql') once (or use the convenient prepareDatabase).
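
As a rough illustration of the new workflow listed above, the following sketch builds a small SQLite database once and then passes its file name to getId and getTaxonomy. The three-taxon names and nodes excerpts are toy data written to temporary files purely so the example is self-contained; they are an assumption for illustration, and real use would point at the full names.dmp and nodes.dmp from NCBI (or rely on prepareDatabase).

```r
library(taxonomizr)

# Toy excerpts in the NCBI names.dmp / nodes.dmp layout (illustrative only,
# not a complete taxonomy). Fields are separated by "\t|\t".
namesText <- c(
  "1\t|\troot\t|\t\t|\tscientific name\t|",
  "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|",
  "131567\t|\tcellular organisms\t|\t\t|\tscientific name\t|"
)
nodesText <- c(
  "1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|",
  "2\t|\t131567\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|",
  "131567\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t1\t|\t1\t|\t1\t|\t0\t|\t1\t|\t1\t|\t0\t|\t\t|"
)

# Write the toy dumps to temporary files (stand-ins for names.dmp and nodes.dmp).
namesFile <- tempfile(fileext = ".dmp")
nodesFile <- tempfile(fileext = ".dmp")
writeLines(namesText, namesFile)
writeLines(nodesText, nodesFile)

# One-time setup: load names and nodes into a single SQLite file.
sqlFile <- tempfile(fileext = ".sql")
read.names.sql(namesFile, sqlFile)
read.nodes.sql(nodesFile, sqlFile)

# Lookups now take the database file instead of data.table objects.
bacteriaId <- getId("Bacteria", sqlFile)
taxonomy <- getTaxonomy(2, sqlFile)
print(bacteriaId)
print(taxonomy)
```

Because the database lives on disk, later sessions can skip the two read.*.sql calls and pass the same file name straight to getId and getTaxonomy.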

I've tried to ease this transition in two ways. First, getTaxonomy and getId are overloaded so that they still function (with a warning) if passed data.table names and nodes arguments. Second, a simpler prepareDatabase function completes all setup steps, which should let most users avoid direct calls to read.names and read.nodes.

I plan to eventually remove the data.table functionality to avoid a split codebase, so please use the new SQLite format in all new code.

See Also

getTaxonomy, read.names.sql, read.nodes.sql, prepareDatabase, getId


taxonomizr documentation built on Feb. 16, 2023, 6:25 p.m.