Seqs2DB: Add Sequences from Text File to Database
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences


Adds sequences to a database.

Seqs2DB(seqs,
        type,
        dbFile,
        identifier,
        tblName = "Seqs",
        chunkSize = 1e7,
        replaceTbl = FALSE,
        fields = c(accession = "ACCESSION", rank = "ORGANISM"),
        processors = 1,
        verbose = TRUE,
        ...)

`seqs`	A connection object or a character string specifying the file path to the file containing the sequences, an `XStringSet` object if `type` is `XStringSet`, or a `QualityScaledXStringSet` object if `type` is `QualityScaledXStringSet`. Files compressed with `gzip`, `bzip2`, `xz`, or `lzma` compression are automatically detected and decompressed during import. Full URL paths (e.g., "http://" or "ftp://") to uncompressed text files or `gzip` compressed text files can also be used.
`type`	The type of the sequences (`seqs`) being imported. This should be (an unambiguous abbreviation of) one of `"FASTA"`, `"FASTQ"`, `"GenBank"`, `"XStringSet"`, or `"QualityScaledXStringSet"`.
`dbFile`	A SQLite connection object or a character string specifying the path to the database file. If the `dbFile` does not exist then a new database is created at this location.
`identifier`	Character string specifying the `"id"` to give the imported sequences in the database.
`tblName`	Character string specifying the table in which to add the sequences.
`chunkSize`	Number of characters to read at a time.
`replaceTbl`	Logical indicating whether to overwrite the entire table in the database. If `FALSE` (the default) then the sequences are appended to any already existing in `tblName`. If `TRUE` the entire table is dropped, removing any existing sequences before any new sequences are added.
`fields`	Named character vector providing the fields to import from a `"GenBank"` formatted file as text columns in the database (not applicable for other values of `type`). The default is to import the `"ACCESSION"` field as a column named `"accession"` and the `"ORGANISM"` field as a column named `"rank"`. Other uppercase fields, such as `"LOCUS"` or `"VERSION"`, can be specified in a similar manner. Note that the `"DEFINITION"` field is automatically imported as a column named `"description"` in the database.
`processors`	The number of processors to use, or `NULL` to automatically detect and use all available processors.
`verbose`	Logical indicating whether to display each query as it is sent to the database.
`...`	Further arguments to be passed directly to `Codec` for compressing sequence information.
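
As a rough sketch of how these arguments fit together, the following imports a small FASTA file into a database file that does not exist yet. The toy sequences, the temporary file names, and the identifier "toy" are invented for illustration and are not part of the package.

```r
library(DECIPHER)  # also attaches Biostrings, which provides DNAStringSet

# Hypothetical toy sequences, invented for this sketch
toy <- DNAStringSet(c(seqA = "ACGTACGTAC", seqB = "TTGACCGTAA"))

# Write them to a temporary FASTA file so that `seqs` can be a file path
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(toy, fasta_path)

# `dbFile` is a path to a file that does not exist yet,
# so a new database is created at that location
db_path <- tempfile(fileext = ".sqlite")
n_seqs <- Seqs2DB(seqs = fasta_path,
                  type = "FASTA",
                  dbFile = db_path,
                  identifier = "toy",
                  verbose = FALSE)

n_seqs  # total number of sequences now in the default table "Seqs"
```

Passing a file path for `dbFile` avoids managing a connection by hand; a connection object, as in the package's own example, is the alternative when several calls share one database.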

Sequences are imported into the database in chunks, with the amount read at a time set by `chunkSize`. The sequences can then be identified by searching the database for the `identifier` provided. Sequences are added to the database verbatim, so no sequence information is lost when the sequences are exported from the database. The sequence (record) names are recorded in a column named “description” in the database.
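
To illustrate retrieval by identifier, here is a minimal sketch that imports two batches under different identifiers and then asks `SearchDB` for only one of them. The sequences and the identifiers "groupA" and "groupB" are hypothetical. The connection is opened with explicitly namespaced `DBI` and `RSQLite` calls, which assumes both packages are installed.

```r
library(DECIPHER)

# Temporary in-memory database (assumes DBI and RSQLite are installed)
dbConn <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy sequences, invented for this sketch
group_a <- DNAStringSet(c(a1 = "ACGTACGTAC", a2 = "TTGACCGTAA"))
group_b <- DNAStringSet(c(b1 = "GGCATTACGA", b2 = "CATGCATGCA"))

# Both batches go into the same table, tagged with different identifiers
Seqs2DB(group_a, "XStringSet", dbConn, identifier = "groupA", verbose = FALSE)
Seqs2DB(group_b, "XStringSet", dbConn, identifier = "groupB", verbose = FALSE)

# Retrieve only the sequences tagged "groupA", named by the "description" column
only_a <- SearchDB(dbConn,
                   identifier = "groupA",
                   nameBy = "description",
                   verbose = FALSE)
names(only_a)

DBI::dbDisconnect(dbConn)
```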

The total number of sequences in the database table is returned after import.

If `replaceTbl` is `TRUE` then any sequences already in the table are overwritten, which is equivalent to dropping the entire table.
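
The difference between appending and replacing can be seen in the returned totals. This is a small sketch with invented sequences and identifiers; it assumes `DBI` and `RSQLite` are installed and relies on the documented return value (the total number of sequences in the table).

```r
library(DECIPHER)

# Temporary in-memory database (assumes DBI and RSQLite are installed)
dbConn <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy sequences, invented for this sketch
first  <- DNAStringSet(c(a1 = "ACGTACGTAC", a2 = "TTGACCGTAA"))
second <- DNAStringSet(c(b1 = "GGCATTACGA", b2 = "CATGCATGCA", b3 = "AACCGGTTAC"))

# Initial import into the default table "Seqs"
n_first <- Seqs2DB(first, "XStringSet", dbConn, "first", verbose = FALSE)

# replaceTbl = FALSE (the default): the new sequences are appended
n_appended <- Seqs2DB(second, "XStringSet", dbConn, "second", verbose = FALSE)

# replaceTbl = TRUE: the table is dropped first, so earlier sequences are lost
n_replaced <- Seqs2DB(first, "XStringSet", dbConn, "first",
                      replaceTbl = TRUE, verbose = FALSE)

c(first = n_first, appended = n_appended, replaced = n_replaced)

DBI::dbDisconnect(dbConn)
```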

Erik Wright eswright@pitt.edu

ES Wright (2016) "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R". The R Journal, 8(1), 352-359.

BrowseDB, SearchDB, DB2Seqs

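library(DECIPHER)
# path to the GenBank file bundled with DECIPHER
gen <- system.file("extdata", "Bacteria_175seqs.gen", package="DECIPHER")
dbConn <- dbConnect(SQLite(), ":memory:")  # temporary in-memory database
Seqs2DB(gen, "GenBank", dbConn, "Bacteria")  # import under the identifier "Bacteria"
BrowseDB(dbConn)  # view the table contents in a web browser
dna <- SearchDB(dbConn, nameBy="description")  # retrieve sequences, named by description
dbDisconnect(dbConn)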

Reading GenBank file chunk 1

175 total sequences in table Seqs.
Time difference of 0.26 secs

Search Expression:
select description, _Seqs.sequence from Seqs join _Seqs on Seqs.row_names =
_Seqs.row_names where _Seqs.row_names in (select row_names from Seqs)

DNAStringSet of length: 175
Time difference of 0.02 secs

DECIPHER documentation built on Nov. 8, 2020, 8:30 p.m.
