gblocks | R Documentation |
Provides a wrapper to Gblocks, a computer program written in ANSI C that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences. Gblocks selects conserved blocks from a multiple alignment according to a set of features of the alignment positions.
gblocks(
x,
b1 = 0.5,
b2 = b1,
b3 = ncol(x),
b4 = 2,
b5 = "a",
target = "alignment",
exec
)
x |
A matrix of DNA sequences of classes |
b1 |
A real number, the minimum number of sequences for a conserved position given as a fraction. Values between 0.5 and 1.0 are allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 0.5. |
b2 |
A real number, the minimum number of sequences for a flank
position given as a fraction. Values must be equal or larger than
|
b3 |
An integer, the maximum number of contiguous nonconserved positions; any integer is allowed. Larger values will increase the number of selected positions, i.e. are less conservative. Defaults to the number of positions in the alignment. |
b4 |
An integer, the minimum length of a block; any integer equal to or greater than 2 is allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 2. |
b5 |
A character string indicating the treatment of gap
positions. Three choices are possible. 1. |
target |
A vector of mode |
exec |
A character string indicating the path to the GBLOCKS executable. |
Explanation of the routine taken from the Online Documentation: First, the degree of conservation of every position of the multiple alignment is evaluated and classified as nonconserved, conserved, or highly conserved. All stretches of contiguous nonconserved positions longer than a certain value (b3) are rejected. In such stretches, alignments are normally ambiguous and, even when in some cases a unique alignment could be given, multiple hidden substitutions make them inadequate for phylogenetic analysis. In the remaining blocks, flanks are examined and positions are removed until blocks are surrounded by highly conserved positions at both flanks. This way, selected blocks are anchored by positions that can be aligned with high confidence. Then, all gap positions (which can be defined in three different ways, see b5) are removed. Furthermore, nonconserved positions adjacent to a gap position are also eliminated until a conserved position is reached, because regions adjacent to a gap are the most difficult to align. Finally, small blocks (falling below the limit of b4) remaining after gap cleaning are also removed.
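The first two steps of the routine (classifying positions, then rejecting long nonconserved stretches) can be illustrated with a minimal base R sketch. This is not the Gblocks implementation: the toy alignment, the helper names classify_positions and reject_long_runs, the rule that any column containing a gap counts as nonconserved, and the value b2 = 0.85 are all assumptions made for illustration.

```r
# Toy alignment (hypothetical data): rows are sequences, columns are positions.
aln <- rbind(
  s1 = c("a", "c", "g", "t", "a", "-"),
  s2 = c("a", "c", "g", "a", "c", "t"),
  s3 = c("a", "c", "t", "c", "g", "t"),
  s4 = c("a", "t", "t", "g", "t", "t")
)

# Classify each column by the fraction of sequences that share the most
# common residue. Assumption: a column with any gap is nonconserved.
classify_positions <- function(aln, b1 = 0.5, b2 = b1) {
  apply(aln, 2, function(col) {
    if (any(col == "-")) return("nonconserved")
    top <- max(table(col)) / length(col)
    if (top >= b2) {
      "highly conserved"
    } else if (top >= b1) {
      "conserved"
    } else {
      "nonconserved"
    }
  })
}

# Mark every position that lies in a run of contiguous nonconserved
# positions longer than b3.
reject_long_runs <- function(classes, b3) {
  runs <- rle(classes == "nonconserved")
  too_long <- runs$values & runs$lengths > b3
  rep(too_long, runs$lengths)
}

classes <- classify_positions(aln, b1 = 0.5, b2 = 0.85)
rejected <- reject_long_runs(classes, b3 = 1)
```

With these toy values the last three columns form a nonconserved run of length 3, which exceeds b3 = 1, so all three are flagged. The flank trimming, gap cleaning, and minimum block length (b4) steps are left out of the sketch.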
A matrix of class "DNAbin".
gblocks was last updated and tested to work with Gblocks 0.91b. If you have problems getting the function to work with a newer version of Gblocks, please contact the package maintainer.
Castresana, J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552.
Talavera, G., and J. Castresana. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.
Gblocks website: https://www.biologiaevolutiva.org/jcastresana/Gblocks.html
mafft and prank for multiple sequence alignment; aliscore for another alignment masking algorithm.
data(ips.28S)
## Not run: gblocks(ips.28S)
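A slightly fuller call might look like the sketch below, which passes the executable path and stricter settings explicitly. The path in gblocks_path is a placeholder and the values b1 = 0.6 and b4 = 5 are arbitrary illustrations, not recommendations. The call is guarded so that it only runs when the ips package and a Gblocks executable are actually present.

```r
# Hypothetical location of the Gblocks executable; adjust to your system.
gblocks_path <- "/path/to/Gblocks"

if (requireNamespace("ips", quietly = TRUE) && file.exists(gblocks_path)) {
  data(ips.28S, package = "ips")
  # Stricter than the defaults: more sequences must agree at a conserved
  # position (b1; b2 follows b1 by default) and blocks must span at least
  # 5 positions (b4).
  trimmed <- ips::gblocks(ips.28S, b1 = 0.6, b4 = 5, exec = gblocks_path)
  # Compare alignment dimensions before and after masking.
  print(dim(ips.28S))
  print(dim(trimmed))
}
```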