create_db: create_db
In schyen/OCMSExplorer: OCMSlooksy

R Documentation

Description

Generates an SQLite database from 16S rRNA gene counts produced by 16S rRNA gene sequence processing pipelines. Depending on the pipeline used, the count and taxonomy tables may have to be reformatted to meet the create_db specifications. The function configures the SQLite database so that it is compatible as input to the OCMSlooksy RShiny app.

Usage

create_db(
  counts,
  taxonomy,
  overwrite = FALSE,
  outdir,
  db_name = "OCMSlooksy.db",
  fromfile = TRUE
)

Arguments

`counts`	file or dataframe of counts with samples in columns, features in rows. First column should contain feature identities under the heading of `featureID`. Feature IDs must be unique.
`taxonomy`	file or dataframe of taxonomy classifications corresponding to the counts table. Must contain the columns `featureID`, `sequence`, `Kingdom`, `Phylum`, `Class`, `Order`, `Family`, `Genus`, `Species`, `Taxon`. `Taxon` contains taxonomic classifications at all taxonomic levels, separated by a semicolon. Unclassified taxa should be represented with `NA`.
`overwrite`	logical. Default FALSE. Whether database tables may overwrite existing tables of the same name.
`outdir`	output directory.
`db_name`	string. Name of the database file. Default 'OCMSlooksy.db'.
`fromfile`	logical. Default TRUE. Read the count and taxonomy tables from file (tab-delimited or comma-delimited).
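
As a rough illustration of the file-based route, the sketch below writes two small tables to tab-delimited files and passes the file paths to create_db with fromfile = TRUE. The feature IDs, sample names, file names, and use of a temporary directory are hypothetical, and the call is guarded so that it only runs when the OCMSlooksy package is installed.

```r
# Hypothetical example data (IDs, sample names and counts are made up).
counts <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sample_A  = c(10, 0),
  sample_B  = c(3, 7),
  stringsAsFactors = FALSE
)
taxonomy <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sequence  = "",                 # sequence not known, left empty
  Kingdom   = "Bacteria",
  Phylum    = "Firmicutes",
  Class     = "Clostridia",
  Order     = "Clostridiales",
  Family    = "Ruminococcaceae",
  Genus     = "Faecalibacterium",
  Species   = NA_character_,      # unclassified level
  Taxon     = "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__NA",
  stringsAsFactors = FALSE
)

# Write both tables as tab-delimited files in a temporary directory.
out_dir       <- tempdir()
counts_file   <- file.path(out_dir, "counts.tsv")
taxonomy_file <- file.path(out_dir, "taxonomy.tsv")
write.table(counts, counts_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(taxonomy, taxonomy_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Only attempt the call when OCMSlooksy is available.
if (requireNamespace("OCMSlooksy", quietly = TRUE)) {
  OCMSlooksy::create_db(
    counts    = counts_file,
    taxonomy  = taxonomy_file,
    overwrite = FALSE,
    outdir    = out_dir,
    db_name   = "OCMSlooksy.db",
    fromfile  = TRUE
  )
}
```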

Details

This function requires two tables: a count table, which contains feature (ASV or OTU) counts for each sample, and a taxonomy table, which contains the taxonomy classification associated with the features found in the count table. 16S rRNA gene sequence processing pipelines vary in the format of their outputs, but most provide a tab-delimited or comma-delimited spreadsheet containing this information. You will need to reformat it (or create a new file) so that the count and taxonomy tables conform to the following requirements:

count table:

samples in columns and features (ASVs or OTUs) in rows
first column should contain feature identities under the heading featureID (column name is case sensitive)
feature IDs must be unique
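
The count table requirements above can be checked before calling create_db. The following is a minimal sketch in base R; the data and the check variable names are hypothetical, and these checks are not part of the package.

```r
# Hypothetical count table: three features (rows) across two samples (columns).
counts <- data.frame(
  featureID = c("ASV1", "ASV2", "ASV3"),
  sample_A  = c(120, 0, 35),
  sample_B  = c(98, 12, 0),
  stringsAsFactors = FALSE
)

# First column must be named exactly "featureID" (case-sensitive).
first_col_ok <- identical(colnames(counts)[1], "featureID")

# Feature IDs must be unique; anyDuplicated() returns 0 when there are none.
ids_unique <- anyDuplicated(counts$featureID) == 0

# Every remaining column is a sample and should hold numeric counts.
samples_numeric <- all(vapply(counts[-1], is.numeric, logical(1)))

stopifnot(first_col_ok, ids_unique, samples_numeric)
```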

taxonomy table:

contains taxonomy classifications that correspond to features in count table
first column contains feature IDs under the heading featureID
Must contain the columns featureID, sequence, Kingdom, Phylum, Class, Order, Family, Genus, Species, Taxon. Column names are case-sensitive.
if sequence for a feature is not known, you can leave the column empty
if the species for a feature is not known or is not classified, set species to NA. The same rule applies to the other taxonomy levels.
Taxon contains classifications at all taxonomic levels, separated by a semicolon. Taxonomy names can be prepended with the initial of the taxonomy level, but this is not necessary (e.g. k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__NA).
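
One way to assemble a taxonomy table that meets these requirements is sketched below. The helper build_taxon() is hypothetical (it is not an OCMSlooksy function), and the sketch assumes that unclassified levels are stored as R NA values, which paste0() renders as the text NA inside the Taxon string.

```r
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical taxonomy table; ASV2 is unclassified below Family.
taxonomy <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sequence  = c("", ""),          # sequence not known, left empty
  Kingdom   = c("Bacteria", "Bacteria"),
  Phylum    = c("Firmicutes", "Firmicutes"),
  Class     = c("Clostridia", "Clostridia"),
  Order     = c("Clostridiales", "Clostridiales"),
  Family    = c("Ruminococcaceae", "Ruminococcaceae"),
  Genus     = c("Faecalibacterium", NA),
  Species   = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical helper: prefix each level with its initial (k__, p__, ...)
# and join all levels with semicolons, one string per feature.
build_taxon <- function(tax, ranks) {
  parts <- lapply(ranks, function(rank) {
    paste0(tolower(substr(rank, 1, 1)), "__", tax[[rank]])
  })
  do.call(paste, c(parts, sep = ";"))
}

taxonomy$Taxon <- build_taxon(taxonomy, ranks)

# Confirm that all required columns are present (names are case-sensitive).
required <- c("featureID", "sequence", ranks, "Taxon")
stopifnot(all(required %in% colnames(taxonomy)))
```

The level prefixes are optional according to the requirements above, so the helper could equally join the bare names.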

The count and taxonomy tables can be supplied as .tsv or .csv files by setting fromfile = TRUE (default). Alternatively, if count and taxonomy tables are already in R, set fromfile = FALSE and supply the dataframes to the counts and taxonomy arguments, respectively.
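
For tables that are already in R, a call with fromfile = FALSE might look like the sketch below. The data are hypothetical, the consistency check is an extra precaution rather than something the package is known to require, and the call is guarded so that it only runs when OCMSlooksy is installed.

```r
# Hypothetical in-memory tables (IDs, sample names and counts are made up).
counts <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sample_A  = c(10, 0),
  sample_B  = c(3, 7),
  stringsAsFactors = FALSE
)
taxonomy <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sequence  = "",
  Kingdom   = "Bacteria",
  Phylum    = "Firmicutes",
  Class     = "Clostridia",
  Order     = "Clostridiales",
  Family    = "Ruminococcaceae",
  Genus     = "Faecalibacterium",
  Species   = NA_character_,
  Taxon     = "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__NA",
  stringsAsFactors = FALSE
)

# Every feature in the count table should have a taxonomy entry.
features_matched <- all(counts$featureID %in% taxonomy$featureID)
stopifnot(features_matched)

# Supply the dataframes directly instead of file paths.
if (requireNamespace("OCMSlooksy", quietly = TRUE)) {
  OCMSlooksy::create_db(
    counts    = counts,
    taxonomy  = taxonomy,
    overwrite = FALSE,
    outdir    = tempdir(),
    db_name   = "OCMSlooksy.db",
    fromfile  = FALSE
  )
}
```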