create_db | R Documentation |
Generates an SQLite database from 16S rRNA gene counts produced by 16S rRNA gene sequence processing pipelines. Depending on the pipeline used, the count and taxonomy tables may have to be formatted according to the create_db
function specifications. This function configures the SQLite database so that it is compatible as an input to the OCMSlooksy
RShiny app.
create_db( counts, taxonomy, overwrite = FALSE, outdir, db_name = "OCMSlooksy.db", fromfile = TRUE )
counts |
file or dataframe of counts with samples in columns, features in rows.
First column should contain feature identities under the heading
featureID |
taxonomy |
file or dataframe of taxonomy classifications corresponding to the
counts table. Must contain the columns featureID, sequence, Kingdom, Phylum, Class, Order, Family, Genus, Species, Taxon |
overwrite |
logical. Default FALSE. Whether database tables may overwrite existing tables of the same name. |
outdir |
output directory. |
db_name |
string. Name of database file. Default 'OCMSlooksy.db' |
fromfile |
logical. Default TRUE. Read the count and taxonomy tables from file (tab-delimited or comma-delimited) |
This function requires two tables: a count table, which contains the count of each feature (ASV or OTU) in each sample, and a taxonomy table, which contains the taxonomy classification associated with the features found in the count table. 16S rRNA gene sequence processing pipelines vary in the format of their outputs, but most provide a tab-delimited or comma-delimited spreadsheet containing this information. You will need to re-format the output (or create a new file) so that the count and taxonomy tables conform to the following requirements:
count table:
samples in columns and features (ASVs or OTUs) in rows
first column should contain feature identities under the heading featureID (column name is case-sensitive)
feature IDs must be unique
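
The count table requirements above can be checked before building the database. The following is a minimal sketch in base R; the toy data and the helper check_count_table are hypothetical and not part of OCMSlooksy.

```r
# Toy count table: features in rows, samples in columns.
# Sample names and counts are made up for illustration.
counts <- data.frame(
  featureID = c("ASV1", "ASV2", "ASV3"),
  sampleA   = c(10L, 0L, 5L),
  sampleB   = c(3L, 7L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of OCMSlooksy): returns a character vector
# describing any problems; an empty vector means both checks passed.
check_count_table <- function(df) {
  problems <- character(0)
  if (!identical(names(df)[1], "featureID")) {
    # The comparison is exact, so "FeatureID" or "featureid" is rejected.
    problems <- c(problems, "first column must be named 'featureID'")
  } else if (anyDuplicated(df$featureID) > 0) {
    problems <- c(problems, "feature IDs must be unique")
  }
  problems
}

check_count_table(counts)
```

An empty result indicates that the first column is named featureID and that no feature ID is repeated.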
taxonomy table:
contains taxonomy classifications that correspond to features in count table
first column contains feature IDs under the heading featureID
must contain the columns featureID, sequence, Kingdom, Phylum, Class, Order, Family, Genus, Species, Taxon. Column names are case-sensitive.
if the sequence for a feature is not known, you can leave the column empty
if the species for a feature is not known or is not classified, set Species to NA. The same rule applies to the other taxonomy levels.
Taxon contains classifications at all taxonomic levels, separated by semicolons. Taxonomy names can be prepended with the initial of the taxonomy level, but this is not necessary
(e.g. "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__NA")
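
If a pipeline reports each taxonomy level in its own column, the Taxon column can be assembled from them. The sketch below is one possible approach in base R; the helper build_taxon and the toy row are hypothetical, and the prefix style (lower-case initial followed by two underscores) follows the example string above.

```r
# Toy taxonomy table with one feature whose species is unclassified (NA).
taxonomy <- data.frame(
  featureID = "ASV1",
  sequence  = "",  # sequence unknown, so the column is left empty
  Kingdom   = "Bacteria",
  Phylum    = "Firmicutes",
  Class     = "Clostridia",
  Order     = "Clostridiales",
  Family    = "Ruminococcaceae",
  Genus     = "Faecalibacterium",
  Species   = NA_character_,
  stringsAsFactors = FALSE
)

tax_levels <- c("Kingdom", "Phylum", "Class", "Order",
                "Family", "Genus", "Species")

# Hypothetical helper (not part of OCMSlooksy): prefixes each level with its
# lower-case initial and "__", then joins the levels with semicolons.
# paste0() turns a missing value into the text "NA", giving e.g. "s__NA".
build_taxon <- function(df, levels = tax_levels) {
  prefixes <- paste0(tolower(substr(levels, 1, 1)), "__")
  parts <- Map(function(prefix, level) paste0(prefix, df[[level]]),
               prefixes, levels)
  do.call(paste, c(unname(parts), sep = ";"))
}

taxonomy$Taxon <- build_taxon(taxonomy)
taxonomy$Taxon
```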
The count and taxonomy tables can be supplied as .tsv or .csv files by setting fromfile = TRUE (default). Alternatively, if the count and taxonomy tables are already in R, set fromfile = FALSE and supply the dataframes to the counts and taxonomy arguments, respectively.
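
As an illustration of the fromfile = FALSE route, the sketch below passes two small dataframes directly to create_db. The toy data are made up, the output goes to a temporary directory, and the call assumes that create_db is exported by the OCMSlooksy package; it is skipped when that package is not installed.

```r
# Toy count table: featureID first, then one column per sample.
counts <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sampleA   = c(10L, 0L),
  sampleB   = c(3L, 7L),
  stringsAsFactors = FALSE
)

# Toy taxonomy table with the required columns; unknown species are NA.
taxonomy <- data.frame(
  featureID = c("ASV1", "ASV2"),
  sequence  = c("", ""),
  Kingdom   = c("Bacteria", "Bacteria"),
  Phylum    = c("Firmicutes", "Bacteroidetes"),
  Class     = c("Clostridia", "Bacteroidia"),
  Order     = c("Clostridiales", "Bacteroidales"),
  Family    = c("Ruminococcaceae", "Bacteroidaceae"),
  Genus     = c("Faecalibacterium", "Bacteroides"),
  Species   = c(NA_character_, NA_character_),
  Taxon     = c(
    "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__NA",
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__NA"
  ),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # placeholder output directory

# Assumption: create_db() is exported by the OCMSlooksy package.
if (requireNamespace("OCMSlooksy", quietly = TRUE)) {
  OCMSlooksy::create_db(
    counts    = counts,
    taxonomy  = taxonomy,
    overwrite = TRUE,
    outdir    = out_dir,
    db_name   = "OCMSlooksy.db",
    fromfile  = FALSE
  )
}
```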
An SQLite database file, named as specified by db_name, containing the tables merged_abundance_id (counts) and
merged_taxonomy (taxonomy).
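
To confirm what was written, the database file can be opened with the DBI and RSQLite packages. This is a sketch only: the helper list_db_tables is hypothetical, the path is a placeholder for the outdir and db_name passed to create_db, and the inspection is skipped if the file or either package is missing.

```r
# Hypothetical helper (not part of OCMSlooksy): lists the tables in an
# SQLite file and closes the connection when the function exits.
list_db_tables <- function(db_path) {
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbListTables(con)
}

# Placeholder path: substitute the outdir and db_name given to create_db().
db_path <- file.path(tempdir(), "OCMSlooksy.db")

if (file.exists(db_path) &&
    requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  print(list_db_tables(db_path))
}
```

The table names printed can then be compared with those named in the description of the return value.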