DB2Seqs: Export Database Sequences to a FASTA or FASTQ File
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences


Exports a database containing sequences to a FASTA or FASTQ formatted file of sequence records.

DB2Seqs(file,
         dbFile,
         tblName = "Seqs",
         identifier = "",
         type = "BStringSet",
         limit = -1,
         replaceChar = NA,
         nameBy = "description",
         orderBy = "row_names",
         removeGaps = "none",
         append = FALSE,
         width = 80,
         compress = FALSE,
         chunkSize = 1e5,
         sep = "::",
         clause = "",
         verbose = TRUE)

`file`	Character string giving the location where the file should be written.
`dbFile`	A SQLite connection object or a character string specifying the path to the database file.
`tblName`	Character string specifying the table from which to extract the data.
`identifier`	Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected.
`type`	The type of `XStringSet` (sequences) to export to a FASTA formatted file or `QualityScaledXStringSet` to export to a FASTQ formatted file. This should be (an unambiguous abbreviation of) one of `"DNAStringSet"`, `"RNAStringSet"`, `"AAStringSet"`, `"BStringSet"`, `"QualityScaledDNAStringSet"`, `"QualityScaledRNAStringSet"`, `"QualityScaledAAStringSet"`, or `"QualityScaledBStringSet"`. (See details section below.)
`limit`	Maximum number of sequence records to export. The default (`-1`) does not limit the number of records.
`replaceChar`	Optional character used to replace any characters of the sequence that are not present in the `XStringSet`'s alphabet. Not applicable if `type=="BStringSet"`. The default (`NA`) results in an error if an incompatible character exists. (See details section below.)
`nameBy`	Character string giving the column name(s) for identifying each sequence record. If more than one column name is provided, the information in each column is concatenated, separated by `sep`, in the order specified.
`orderBy`	Character string giving the column name for sorting the results. Defaults to the order of entries in the database. Optionally can be followed by `" ASC"` or `" DESC"` to specify ascending (the default) or descending order.
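`removeGaps`	Determines how gaps ("-" or "." characters) are removed from the sequences. This should be (an unambiguous abbreviation of) one of `"none"` (keep all gaps), `"all"` (remove every gap character), or `"common"` (remove only gap positions shared by all sequences).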
`append`	Logical indicating whether to append the output to the existing `file`.
`width`	Integer specifying the maximum number of characters per line of sequence. Not applicable when exporting to a FASTQ formatted file.
`compress`	Logical specifying whether to compress the output file using gzip compression.
`chunkSize`	Number of sequences to write to the `file` at a time. Cannot be less than the total number of sequences if `removeGaps` is `"common"`.
`sep`	Character string providing the separator between fields in each sequence's name; by default a pair of colons (“::”).
`clause`	An optional character string to append to the query as part of a “where clause”.
`verbose`	Logical indicating whether to display status.
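
To make the `nameBy`/`sep`, `removeGaps`, and `width` arguments concrete, here is a minimal base R sketch of the transformations they describe. It is not DECIPHER's implementation: the helpers `make_names()`, `remove_gaps()`, and `wrap_fasta()` are hypothetical, sequences are assumed to be plain character strings, and `removeGaps = "common"` assumes aligned sequences of equal width.

```r
# Hypothetical helpers illustrating DB2Seqs arguments; not DECIPHER's internals.

# nameBy + sep: paste the chosen columns of each row together, in order.
make_names <- function(df, nameBy, sep = "::") {
  do.call(paste, c(unname(as.list(df[nameBy])), sep = sep))
}

# removeGaps: "none" keeps gaps, "all" drops every "-" or ".",
# "common" drops only positions that are a gap in every sequence.
remove_gaps <- function(seqs, removeGaps = c("none", "all", "common")) {
  removeGaps <- match.arg(removeGaps)
  if (removeGaps == "none") return(seqs)
  if (removeGaps == "all") return(gsub("[.-]", "", seqs))
  chars <- do.call(rbind, strsplit(seqs, ""))  # one row per aligned sequence
  keep <- !apply(chars == "-" | chars == ".", 2, all)
  apply(chars[, keep, drop = FALSE], 1, paste, collapse = "")
}

# width: emit a FASTA record with at most `width` sequence characters per line.
wrap_fasta <- function(names, seqs, width = 80) {
  records <- mapply(function(nm, s) {
    starts <- seq(1, max(nchar(s), 1), by = width)
    lines <- substring(s, starts, pmin(starts + width - 1, nchar(s)))
    paste(c(paste0(">", nm), lines), collapse = "\n")
  }, names, seqs, USE.NAMES = FALSE)
  paste(records, collapse = "\n")
}

db <- data.frame(identifier = c("Bac", "Bac"), description = c("seqA", "seqB"))
seqs <- c("AC-GT.A", "AC-TTGA")
make_names(db, c("identifier", "description"))  # "Bac::seqA" "Bac::seqB"
remove_gaps(seqs, "common")                      # "ACGT.A" "ACTTGA"
cat(wrap_fasta("s1", "ACGTACGTAC", width = 4), sep = "\n")
```

Note that `"common"` has to see every sequence at once to know which positions are shared gaps, which is why `chunkSize` cannot be smaller than the number of sequences in that mode.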

Sequences are exported to either a FASTA or a FASTQ file, as determined by type. If type is an XStringSet, sequences are exported in FASTA format. If type is a QualityScaledXStringSet, the quality information is interpreted as PhredQuality scores before export in FASTQ format.
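
A FASTQ record stores one quality character per base. The sketch below shows that encoding in base R, assuming integer Phred scores and the standard Phred+33 ASCII offset used by Biostrings' `PhredQuality`; `fastq_record()` is a hypothetical helper, not part of DECIPHER.

```r
# Hypothetical helper: build one FASTQ record from a name, a sequence,
# and integer Phred quality scores (assumed Phred+33 ASCII encoding).
fastq_record <- function(name, seq, phred) {
  stopifnot(nchar(seq) == length(phred), all(phred >= 0 & phred <= 93))
  quals <- intToUtf8(phred + 33)  # 40 -> "I", 30 -> "?", 20 -> "5", 10 -> "+"
  paste0("@", name, "\n", seq, "\n+\n", quals)
}

rec <- fastq_record("read1", "ACGT", c(40, 30, 20, 10))
cat(rec, sep = "\n")
```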

If type is "BStringSet" (the default), sequences are exported to the FASTA file exactly as they were when imported. If type is "DNAStringSet", all U's are converted to T's before export, and vice versa if type is "RNAStringSet". Any remaining characters not in the XStringSet's alphabet are converted to replaceChar, or removed if replaceChar is "". Note that if replaceChar is NA (the default), an error is raised when an unexpected character is found.
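
The conversion rules above can be sketched in base R as follows. This is an illustration, not DECIPHER's code: `convert_seqs()` is hypothetical, upper-casing the input is an assumption, and the alphabets are assumed to be the IUPAC codes plus the "-", "+", and "." characters found in Biostrings' `DNA_ALPHABET` and `RNA_ALPHABET`.

```r
# Hypothetical helper mirroring the documented DNA/RNA export rules.
convert_seqs <- function(seqs, type = c("DNA", "RNA"), replaceChar = NA) {
  type <- match.arg(type)
  seqs <- toupper(seqs)  # assumption: case is normalized first
  if (type == "DNA") {
    seqs <- chartr("U", "T", seqs)         # U -> T for DNAStringSet
    invalid <- "[^ACGTMRWSYKVHDBN+.-]"
  } else {
    seqs <- chartr("T", "U", seqs)         # T -> U for RNAStringSet
    invalid <- "[^ACGUMRWSYKVHDBN+.-]"
  }
  if (is.na(replaceChar)) {
    # default behaviour: refuse to export characters outside the alphabet
    if (any(grepl(invalid, seqs)))
      stop("sequences contain characters outside the alphabet")
    return(seqs)
  }
  gsub(invalid, replaceChar, seqs)  # replaceChar = "" removes them
}

convert_seqs("ACGU")                      # "ACGT"
convert_seqs("AC?T", replaceChar = "N")   # "ACNT"
convert_seqs("AC?T", replaceChar = "")    # "ACT"
```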

Writes a FASTA or FASTQ formatted file containing the sequence records in the database.

Returns the number of sequence records written to the file.

Erik Wright eswright@pitt.edu

ES Wright (2016) "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R". The R Journal, 8(1), 352-359.

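library(DECIPHER)
# example database of 175 bacterial sequences shipped with DECIPHER
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
tf <- tempfile()
DB2Seqs(tf, db, limit=10) # writes the first 10 records; returns the number written
file.show(tf) # press 'q' to exit
unlink(tf) # delete the temporary file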