Description Usage Arguments Details Value Author(s) References See Also Examples
Merges the two separate sequence alignments in a database. The aligned sequences must have separate identifiers in the same table or be located in different database tables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
dbFile |
A SQLite connection object or a character string specifying the path to the database file. |
tblName |
Character string specifying the table(s) where the sequences are located. If two |
identifier |
Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected. If two |
type |
The type of |
add2tbl |
Character string specifying the table name in which to add the aligned sequences. |
batchSize |
Integer specifying the number of sequences to process at a time. |
perfectMatch |
Numeric giving the reward for aligning two matching nucleotides in the alignment. Only used when |
misMatch |
Numeric giving the cost for aligning two mismatched nucleotides in the alignment. Only used when |
gapOpening |
Numeric giving the cost for opening a gap in the alignment. |
gapExtension |
Numeric giving the cost for extending an open gap in the alignment. |
gapPower |
Numeric specifying the exponent to use in the gap cost function. |
terminalGap |
Numeric giving the cost for allowing leading and trailing gaps ("-" or "." characters) in the alignment. Either two numbers, the first for leading gaps and the second for trailing gaps, or a single number for both. |
normPower |
Numeric giving the exponent that controls the degree of normalization applied to scores by column occupancy. If two numerics are provided, the first controls the normalization power of terminal gaps, while the second controls that of internal gaps. A |
substitutionMatrix |
Either a substitution matrix representing the substitution scores for an alignment or the name of the amino acid substitution matrix to use in alignment. The latter may be one of the following: “BLOSUM45”, “BLOSUM50”, “BLOSUM62”, “BLOSUM80”, “BLOSUM100”, “PAM30”, “PAM40”, “PAM70”, “PAM120”, “PAM250”, or “MIQS”. The default (NULL) will use the |
processors |
The number of processors to use, or |
verbose |
Logical indicating whether to display progress. |
... |
Further arguments to be passed directly to |
Sometimes it is useful to align two large sets of sequences, where each set of sequences is already aligned but the two sets are not aligned to each other. AlignDB
first builds a profile of each sequence set in increments of batchSize
so that the entire sequence set is not required to fit in memory. Next the two profiles are aligned using dynamic programming. Finally, the new alignment is applied to all the sequences as they are incrementally added to the add2tbl
.
Two identifier
s or tblName
s must be provided, indicating the two sets of sequences to align. The sequences corresponding to the first identifier
and tblName
will be aligned to those of the second identifier
or tblName
. The aligned sequences are added to add2tbl
under a new identifier formed from the concatenation of the two identifier
s or tblName
s. (See examples section below.)
Returns the number of newly aligned sequences added to the database.
Erik Wright eswright@pitt.edu
Wright, E. S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics, 16, 322. http://doi.org/10.1186/s12859-015-0749-z
Wright, E. S. (2020). RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA 2020, 26, 531-540.
AlignProfiles
, AlignSeqs
, AlignTranslation
, PFASUM
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | gen <- system.file("extdata", "Bacteria_175seqs.gen", package="DECIPHER")
fas <- system.file("extdata", "Bacteria_175seqs.fas", package="DECIPHER")
# Align two tables and place result into a third
dbConn <- dbConnect(SQLite(), ":memory:")
Seqs2DB(gen, "GenBank", dbConn, "Seqs1", tblName="Set1")
Seqs2DB(fas, "FASTA", dbConn, "Seqs2", tblName="Set2")
AlignDB(dbConn, tblName=c("Set1", "Set2"), add2tbl="AlignedSets")
l <- IdLengths(dbConn, "AlignedSets", add2tbl=TRUE)
BrowseDB(dbConn, tblName="AlignedSets") # all sequences have the same width
dbDisconnect(dbConn)
# Align two identifiers and place the result in the same table
dbConn <- dbConnect(SQLite(), ":memory:")
Seqs2DB(gen, "GenBank", dbConn, "Seqs1")
Seqs2DB(fas, "FASTA", dbConn, "Seqs2")
AlignDB(dbConn, identifier=c("Seqs1", "Seqs2"))
l <- IdLengths(dbConn, add2tbl=TRUE)
BrowseDB(dbConn) # note the sequences with a new identifier
dbDisconnect(dbConn)
|
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: XVector
Attaching package: ‘Biostrings’
The following object is masked from ‘package:base’:
strsplit
Loading required package: RSQLite
Reading GenBank file chunk 1
175 total sequences in table Set1.
Time difference of 0.27 secs
Reading FASTA file chunk 1
175 total sequences in table Set2.
Time difference of 0.07 secs
================================================================================
Added 350 aligned sequences to table AlignedSets with identifier 'Set1_Set2'.
Lengths counted for 350 sequences.
Added to AlignedSets: "bases", "nonbases", and "width".
Time difference of 0.03 secs
Reading GenBank file chunk 1
175 total sequences in table Seqs.
Time difference of 0.08 secs
Reading FASTA file chunk 1
Added 175 new sequences to table Seqs.
350 total sequences in table Seqs.
Time difference of 0.05 secs
================================================================================
Added 350 aligned sequences to table Seqs with identifier 'Seqs1_Seqs2'.
Lengths counted for 700 sequences.
Added to Seqs: "bases", "nonbases", and "width".
Time difference of 0.05 secs
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.