AlignSeqs: Align a Set of Unaligned Sequences
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences


Performs profile-to-profile alignment of multiple unaligned sequences following a guide tree.

AlignSeqs(myXStringSet,
         guideTree = NULL,
         iterations = 2,
         refinements = 1,
         gapOpening = c(-18, -16),
         gapExtension = c(-2, -1),
         useStructures = TRUE,
         structures = NULL,
         FUN = AdjustAlignment,
         levels = c(0.9, 0.7, 0.7, 0.4, 10, 5, 5, 2),
         alphabet = AA_REDUCED[[1]],
         processors = 1,
         verbose = TRUE,
         ...)

`myXStringSet`	An `AAStringSet`, `DNAStringSet`, or `RNAStringSet` object of unaligned sequences.
`guideTree`	Either `NULL` or a `dendrogram` giving the ordered tree structure in which to align profiles. If `NULL` then a guide tree will be automatically constructed based on the order of shared k-mers.
`iterations`	Number of iteration steps to perform. During each iteration step the guide tree is regenerated based on the alignment and the sequences are realigned.
`refinements`	Number of refinement steps to perform. During each refinement step groups of sequences are realigned to the rest of the sequences, and the better of the two alignments (before and after realignment) is kept.
`gapOpening`	Single numeric giving the cost for opening a gap in the alignment, or two numbers giving the minimum and maximum costs. In the latter case the cost will be varied depending upon whether the groups of sequences being aligned are nearly identical or maximally distant.
`gapExtension`	Single numeric giving the cost for extending an open gap in the alignment, or two numbers giving the minimum and maximum costs. In the latter case the cost will be varied depending upon whether the groups of sequences being aligned are nearly identical or maximally distant.
`useStructures`	Logical indicating whether to use secondary structure predictions during alignment. If `TRUE` (the default), secondary structure probabilities will be automatically calculated for amino acid and RNA sequences if they are not provided (i.e., when `structures` is `NULL`).
`structures`	Either a list of secondary structure probabilities matching the `structureMatrix`, such as that output by `PredictHEC` or `PredictDBN`, or `NULL` to generate the structures automatically. Only applicable if `myXStringSet` is an `AAStringSet` or `RNAStringSet`.
`FUN`	A function to be applied after each profile-to-profile alignment. (See details section below.)
`levels`	Numeric with eight elements specifying the levels at which to trigger events. (See details section below.)
`alphabet`	Character vector of amino acid groupings used to reduce the 20 standard amino acids into smaller groups. Alphabet reduction helps to find more distant homologies between sequences. A non-reduced amino acid alphabet can be used by setting `alphabet` equal to `AA_STANDARD`. Only applicable if `myXStringSet` is an `AAStringSet`.
`processors`	The number of processors to use, or `NULL` to automatically detect and use all available processors.
`verbose`	Logical indicating whether to display progress.
`...`	Further arguments to be passed directly to `AlignProfiles`, including `perfectMatch`, `misMatch`, `gapPower`, `terminalGap`, `restrict`, `anchor`, `normPower`, `substitutionMatrix`, and `structureMatrix`.
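
As an illustration of the two forms accepted by `gapOpening` and `gapExtension`, here is a minimal sketch. The toy sequences are invented for this example, and the fixed costs of -20 and -3 are arbitrary choices rather than recommended values.

```r
library(DECIPHER)

# Toy sequences invented for this sketch (not from the package data).
seqs <- DNAStringSet(c(
  s1 = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s2 = "ATGGCTAGCTGGATCCGATCGTTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s3 = "ATGGCTAGCTAGGATCCGACGATTACGCTAGCAGGCTAACGGTTCAGGATCGA",
  s4 = "ATGCTAGCTAGGATCCGATCGATTACGCTGCTAGGCTAACGGTCAGGCATCGA"
))

# Two numbers: minimum and maximum costs (the defaults shown in Usage).
ranged <- AlignSeqs(seqs,
                    gapOpening = c(-18, -16),
                    gapExtension = c(-2, -1),
                    verbose = FALSE)

# Single number: one fixed cost (values here are arbitrary).
fixed <- AlignSeqs(seqs,
                   gapOpening = -20,
                   gapExtension = -3,
                   verbose = FALSE)
```

Both calls return an aligned set with one sequence per input, so the results can be compared side by side with `BrowseSeqs`.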

The profile-to-profile method aligns a sequence set by merging profiles along a guide tree until all the input sequences are aligned. This process has three main steps: (1) If guideTree=NULL, an initial single-linkage guide tree is constructed based on a distance matrix of shared k-mers. Alternatively, a dendrogram can be provided as the initial guideTree. (2) If iterations is greater than zero, then a UPGMA guide tree is built based on the initial alignment and the sequences are re-aligned along this tree. This process is repeated iterations times or until convergence. (3) If refinements is greater than zero, then subsets of the alignment are re-aligned to the remainder of the alignment. This process generates two alignments, the better of which is chosen based on its sum-of-pairs score. This refinement process is repeated refinements times, or until convergence.
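
The three steps can be switched on and off through iterations and refinements. The following is a minimal sketch on invented toy sequences. For sequences this similar, the levels thresholds may cause the later steps to be skipped even when they are requested, so the two results may be identical.

```r
library(DECIPHER)

# Toy sequences invented for this sketch (not from the package data).
seqs <- DNAStringSet(c(
  s1 = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s2 = "ATGGCTAGCTGGATCCGATCGTTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s3 = "ATGGCTAGCTAGGATCCGACGATTACGCTAGCAGGCTAACGGTTCAGGATCGA",
  s4 = "ATGCTAGCTAGGATCCGATCGATTACGCTGCTAGGCTAACGGTCAGGCATCGA"
))

# Step 1 only: align once along the initial k-mer guide tree.
initialOnly <- AlignSeqs(seqs,
                         guideTree = NULL,
                         iterations = 0,
                         refinements = 0,
                         verbose = FALSE)

# Steps 1 to 3 with the default numbers of iterations and refinements.
full <- AlignSeqs(seqs,
                  iterations = 2,
                  refinements = 1,
                  verbose = FALSE)

# Every sequence in an alignment has the same width.
unique(width(initialOnly))
unique(width(full))
```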

The purpose of levels is to speed up the alignment process by not running time-consuming processes when they are unlikely to change the outcome. The first four levels control when refinements occur and the function FUN is run on the alignment. The default levels specify that these events should happen when above 0.9 (AA; levels[1]) or 0.7 (DNA/RNA; levels[3]) average dissimilarity on the initial tree, when above 0.7 (AA; levels[2]) or 0.4 (DNA/RNA; levels[4]) average dissimilarity on the iterative tree(s), and after every tenth (levels[5]) improvement made during refinement. The sixth element of levels (levels[6]) prevents FUN from being applied at any point to fewer than 5 sequences.
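
To make the positions in levels easier to follow, the sketch below tabulates the default vector with descriptive labels and then raises the sixth element. The labels in `meaning` are hypothetical annotations written for this example and are not used by DECIPHER; the toy sequences are likewise invented.

```r
library(DECIPHER)

# Toy sequences invented for this sketch (not from the package data).
seqs <- DNAStringSet(c(
  s1 = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s2 = "ATGGCTAGCTGGATCCGATCGTTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s3 = "ATGGCTAGCTAGGATCCGACGATTACGCTAGCAGGCTAACGGTTCAGGATCGA",
  s4 = "ATGCTAGCTAGGATCCGATCGATTACGCTGCTAGGCTAACGGTCAGGCATCGA"
))

# Default levels copied from the Usage section.
lv <- c(0.9, 0.7, 0.7, 0.4, 10, 5, 5, 2)

# Hypothetical labels summarizing the Details text for each element.
meaning <- c("AA dissimilarity, initial tree",
             "AA dissimilarity, iterative trees",
             "DNA/RNA dissimilarity, initial tree",
             "DNA/RNA dissimilarity, iterative trees",
             "improvements between events during refinement",
             "minimum number of sequences for FUN",
             "RNA structures: initial tree length",
             "RNA structures: subsequent tree length")
print(data.frame(element = seq_along(lv), value = lv, meaning = meaning))

# Require at least 10 sequences before FUN is applied.
lv[6] <- 10
aligned <- AlignSeqs(seqs, levels = lv, verbose = FALSE)
```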

The FUN function is always applied just before returning the alignment so long as there are at least levels[6] sequences. The default FUN is AdjustAlignment, but FUN can be any function that takes in an XStringSet as its first argument, as well as weights, processors, and substitutionMatrix as optional arguments. For example, the default FUN could be altered to not perform any changes by setting it equal to function(x, ...) return(x), where x is an XStringSet.
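
As a sketch of replacing FUN, the call below passes the pass-through function described above so that no post-alignment adjustment is made. The five toy sequences are invented for this example; five are used so that the set meets the default levels[6] threshold.

```r
library(DECIPHER)

# Toy sequences invented for this sketch (not from the package data).
seqs <- DNAStringSet(c(
  s1 = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s2 = "ATGGCTAGCTGGATCCGATCGTTACGCTAGCTAGGCTAACGGTTCAGGCATCGA",
  s3 = "ATGGCTAGCTAGGATCCGACGATTACGCTAGCAGGCTAACGGTTCAGGATCGA",
  s4 = "ATGCTAGCTAGGATCCGATCGATTACGCTGCTAGGCTAACGGTCAGGCATCGA",
  s5 = "ATGGCTAGCTAGGTCCGATCGATTACGCTAGCTAGGCTACGGTTCAGGCATCGA"
))

# A FUN that returns its input unchanged; the optional arguments
# (weights, processors, substitutionMatrix) are absorbed by `...`.
noAdjust <- function(x, ...) return(x)

unadjusted <- AlignSeqs(seqs, FUN = noAdjust, verbose = FALSE)
```

Comparing `unadjusted` with the result of the default call in `BrowseSeqs` is one way to see what AdjustAlignment changes, if anything, for a given input.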

Secondary structures are automatically computed for amino acid and RNA sequences unless structures are provided or useStructures is FALSE. The default structureMatrix is used unless an alternative is provided. For RNA sequences, secondary structures are only computed when the total length of the initial guide tree is at least 5 (levels[7]) or the length of subsequent trees is at least 2 (levels[8]).
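
The sketch below shows the two ways of overriding the automatic behavior for amino acid sequences: supplying structures explicitly, or turning them off. It assumes that `PredictHEC` with `type="probabilities"` returns one matrix of probabilities per sequence in the form expected by the default structureMatrix. The protein sequences are toy variants invented for this example.

```r
library(DECIPHER)

# Toy protein sequences invented for this sketch.
aa <- AAStringSet(c(
  p1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV",
  p2 = "MKTAYIAKQRQISFVKSHFLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV",
  p3 = "MKSAYIAKQRNISFVKSHFSRQLEDRLGLIEVQAPILSRVGDGSGAEKAVQVRVKALPDAQFEVV",
  p4 = "MKTAYLAKQRQISFVKAHFSRQLEERLGMIEVQAPILTRVGDGTQDNLSGAEKAVQVKVKALPEVV"
))

# Predict secondary structure probabilities, one list element per sequence.
hec <- PredictHEC(aa, type = "probabilities")

# Supply the precomputed structures instead of having them generated.
withStructures <- AlignSeqs(aa, structures = hec, verbose = FALSE)

# Ignore secondary structure entirely.
withoutStructures <- AlignSeqs(aa, useStructures = FALSE, verbose = FALSE)
```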

An XStringSet of aligned sequences.

Erik Wright eswright@pitt.edu

Wright, E. S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics, 16, 322. http://doi.org/10.1186/s12859-015-0749-z

Wright, E. S. (2020). RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA, 26, 531-540.

AdjustAlignment, AlignDB, AlignProfiles, AlignSynteny, AlignTranslation, IdClusters, ReadDendrogram, StaggerAlignment

library(DECIPHER)

# load the example bacterial sequences shipped with the package
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, remove="all") # remove="all" strips any existing gaps
alignedDNA <- AlignSeqs(dna)
BrowseSeqs(alignedDNA, highlight=1)

# use secondary structure with RNA sequences
alignedRNA <- AlignSeqs(RNAStringSet(dna))
BrowseSeqs(alignedRNA, highlight=1)

DECIPHER documentation built on Nov. 8, 2020, 8:30 p.m.
