create_db: create_db

View source: R/create_db.R

create_db    R Documentation

create_db

Description

Generates an SQLite database from 16S rRNA gene counts produced by 16S rRNA gene sequence processing pipelines. Depending on the pipeline used, the count and taxonomy tables may have to be reformatted to meet the create_db specifications. The function configures the SQLite database so that it is compatible as input to the OCMSlooksy RShiny app.

Usage

create_db(
  counts,
  taxonomy,
  overwrite = FALSE,
  outdir,
  db_name = "OCMSlooksy.db",
  fromfile = TRUE
)

Arguments

counts

File or data frame of counts, with samples in columns and features in rows. The first column should contain feature IDs under the heading featureID. Feature IDs must be unique.

taxonomy

File or data frame of taxonomy classifications corresponding to the counts table. Must contain the columns featureID, sequence, Kingdom, Phylum, Class, Order, Family, Genus, Species, Taxon. Taxon contains the classifications at all taxonomic levels, separated by semicolons. Unclassified taxa should be represented with NA.

overwrite

Logical. Default FALSE. If TRUE, database tables are allowed to overwrite existing tables of the same name.

outdir

output directory.

db_name

String. Name of the database file. Default "OCMSlooksy.db".

fromfile

Logical. Default TRUE. If TRUE, the count and taxonomy tables are read from file (tab-delimited or comma-delimited).

Details

This function requires two tables: a count table, which contains the count of each feature (ASV or OTU) in each sample, and a taxonomy table, which contains the taxonomic classification of the features found in the count table. 16S rRNA gene sequence processing pipelines vary in the format of their outputs, but most provide a tab-delimited or comma-delimited spreadsheet containing this information. You will need to reformat these outputs (or create new files) so that the count and taxonomy tables conform to the following requirements:

count table:

  • samples in columns and features (ASVs or OTUs) in rows

  • first column should contain feature identities under the heading featureID (column name is case sensitive)

  • feature IDs must be unique

taxonomy table:

  • contains taxonomy classifications that correspond to features in count table

  • first column contains feature IDs under the heading featureID

  • Must contain the columns featureID, sequence, Kingdom, Phylum, Class, Order, Family, Genus, Species, Taxon. Column names are case-sensitive.

  • if sequence for a feature is not known, you can leave the column empty

  • if the species for a feature is not known or is not classified, set species to NA. The same rule applies to the other taxonomy levels.

  • Taxon contains classifications at all taxonomic levels, separated by semicolons. Taxonomy names can be prefixed with the initial of the taxonomy level, but this is not necessary (e.g. k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__NA).
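The requirements above can be illustrated with a pair of minimal tables built in base R. This is only a sketch: the feature IDs, sample names and counts are made up, and check_tables is a hypothetical helper written for this example, not part of the package. The check that every counted feature appears in the taxonomy table is an assumption about what "correspond" means here.

```r
# Made-up example data: two features (rows) counted in two samples (columns).
counts <- data.frame(
  featureID = c("ASV1", "ASV2"),   # first column, heading is case-sensitive
  sampleA   = c(10L, 0L),
  sampleB   = c(3L, 25L),
  stringsAsFactors = FALSE
)

ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

taxonomy <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sequence  = c("", ""),           # sequence unknown, so left empty
  Kingdom   = c("Bacteria", "Bacteria"),
  Phylum    = c("Firmicutes", "Bacteroidetes"),
  Class     = c("Clostridia", "Bacteroidia"),
  Order     = c("Clostridiales", "Bacteroidales"),
  Family    = c("Ruminococcaceae", "Bacteroidaceae"),
  Genus     = c("Faecalibacterium", "Bacteroides"),
  Species   = c(NA_character_, NA_character_),  # unclassified, so NA
  stringsAsFactors = FALSE
)

# Build Taxon by joining all levels with semicolons, using the optional
# level-initial prefixes (k__, p__, ...).
prefixes <- c("k__", "p__", "c__", "o__", "f__", "g__", "s__")
taxonomy$Taxon <- apply(taxonomy[, ranks], 1, function(x) {
  paste0(prefixes, ifelse(is.na(x), "NA", x), collapse = ";")
})

# Hypothetical helper: stops with an error if a requirement is not met.
check_tables <- function(counts, taxonomy) {
  required <- c("featureID", "sequence", ranks, "Taxon")
  stopifnot(
    names(counts)[1] == "featureID",
    !anyDuplicated(counts$featureID),
    names(taxonomy)[1] == "featureID",
    all(required %in% names(taxonomy)),
    # Assumption: every feature in the count table has a taxonomy row.
    all(counts$featureID %in% taxonomy$featureID)
  )
  invisible(TRUE)
}

check_tables(counts, taxonomy)
```

Running such a check before building the database can surface a misspelled or wrongly capitalised column name early.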

The count and taxonomy tables can be supplied as .tsv or .csv files by setting fromfile = TRUE (default). Alternatively, if count and taxonomy tables are already in R, set fromfile = FALSE and supply the dataframes to the counts and taxonomy arguments, respectively.
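The two input modes might be used as sketched below. The tables, file names and database names are made up for illustration, and passing file paths to counts and taxonomy is assumed from the argument descriptions. The create_db calls are guarded so that the snippet also runs when the package providing create_db is not loaded.

```r
# Made-up one-feature tables that follow the required layout.
counts <- data.frame(
  featureID = "ASV1", sampleA = 10L, sampleB = 3L,
  stringsAsFactors = FALSE
)
taxonomy <- data.frame(
  featureID = "ASV1", sequence = "",
  Kingdom = "Bacteria", Phylum = "Firmicutes", Class = "Clostridia",
  Order = "Clostridiales", Family = "Ruminococcaceae",
  Genus = "Faecalibacterium", Species = NA_character_,
  Taxon = paste0(
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;",
    "f__Ruminococcaceae;g__Faecalibacterium;s__NA"
  ),
  stringsAsFactors = FALSE
)

# Write tab-delimited copies for the file-based route.
outdir <- tempdir()
counts_file   <- file.path(outdir, "counts.tsv")
taxonomy_file <- file.path(outdir, "taxonomy.tsv")
write.table(counts, counts_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(taxonomy, taxonomy_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Only call create_db if it is available in the current session.
if (exists("create_db", mode = "function")) {
  # Route 1: read the tables from files (fromfile = TRUE is the default).
  create_db(
    counts = counts_file, taxonomy = taxonomy_file,
    overwrite = TRUE, outdir = outdir,
    db_name = "from_files.db", fromfile = TRUE
  )
  # Route 2: supply data frames that are already in R.
  create_db(
    counts = counts, taxonomy = taxonomy,
    overwrite = TRUE, outdir = outdir,
    db_name = "from_dataframes.db", fromfile = FALSE
  )
}
```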

Value

An SQLite database file, named as specified by db_name, containing the tables merged_abundance_id (counts) and merged_taxonomy (taxonomy).
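One way to confirm the result is to list the tables in the database file. The sketch below assumes the DBI and RSQLite packages are installed. list_db_tables is a hypothetical helper written for this example, and the path is a placeholder for wherever outdir and db_name point.

```r
# Hypothetical helper: returns the table names in an SQLite file, or NULL
# when the file does not exist or DBI/RSQLite are not installed.
list_db_tables <- function(db_path) {
  if (!file.exists(db_path) ||
      !requireNamespace("DBI", quietly = TRUE) ||
      !requireNamespace("RSQLite", quietly = TRUE)) {
    return(NULL)
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con))   # always close the connection
  DBI::dbListTables(con)
}

# Placeholder path: replace with file.path(outdir, db_name) from your call.
db_path <- file.path(tempdir(), "OCMSlooksy.db")
tables <- list_db_tables(db_path)

# After a successful create_db run, the table names should include
# merged_abundance_id and merged_taxonomy.
print(tables)
```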


schyen/OCMSExplorer documentation built on Feb. 15, 2023, 4:39 p.m.