gblocks: Masking of Sequence Alignments with GBLOCKS

gblocks R Documentation

Masking of Sequence Alignments with GBLOCKS

Description

Provides a wrapper to Gblocks, a computer program written in ANSI C that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences. Gblocks selects conserved blocks from a multiple alignment according to a set of features of the alignment positions.

Usage

gblocks(
  x,
  b1 = 0.5,
  b2 = b1,
  b3 = ncol(x),
  b4 = 2,
  b5 = "a",
  target = "alignment",
  exec
)

Arguments

x

A matrix of DNA sequences of class DNAbin.

b1

A real number, the minimum number of sequences for a conserved position, given as a fraction. Values between 0.5 and 1.0 are allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 0.5.

b2

A real number, the minimum number of sequences for a flank position, given as a fraction. Values must be equal to or larger than b1. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to the value of b1 (0.5 unless b1 is changed).

b3

An integer, the maximum number of contiguous nonconserved positions; any integer is allowed. Larger values will increase the number of selected positions, i.e. are less conservative. Defaults to the number of positions in the alignment.

b4

An integer, the minimum length of a block; any integer equal to or larger than 2 is allowed. Larger values will decrease the number of selected positions, i.e. are more conservative. Defaults to 2.

b5

A character string indicating the treatment of gap positions. Three choices are possible:

1. "n": No gap positions are allowed in the final alignment. All positions with a single gap or more are treated as a gap position for the block selection procedure, and they and the adjacent nonconserved positions are eliminated.

2. "h": Only positions where 50% or more of the sequences have a gap are treated as a gap position. Thus, positions with a gap in less than 50% of the sequences can be selected in the final alignment if they are within an appropriate block.

3. "a": All gap positions can be selected. Positions with gaps are not treated differently from other positions (default).

target

A vector of mode "character" giving the output format: "alignment" will return the alignment with only the selected positions, "index" will return the indices of the selected positions, and "score" will provide a score for every position in the original alignment (0 for excluded, 1 for included).

exec

A character string indicating the path to the GBLOCKS executable.
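
The constraints stated for the arguments above can be checked before Gblocks is called. The following is a minimal sketch, not code from the package: check_gblocks_args is a hypothetical helper name, and the upper bound of 1 on b2 is an assumption that follows from b2 being a fraction.

```r
# Hypothetical helper (not part of ips): checks the documented constraints
# on b1 to b5 and on target, and returns the validated values as a list.
check_gblocks_args <- function(b1 = 0.5, b2 = b1, b3, b4 = 2, b5 = "a",
                               target = "alignment") {
  stopifnot(
    # b1: a fraction between 0.5 and 1.0
    is.numeric(b1), length(b1) == 1, b1 >= 0.5, b1 <= 1,
    # b2: equal to or larger than b1 (upper bound of 1 is an assumption)
    is.numeric(b2), length(b2) == 1, b2 >= b1, b2 <= 1,
    # b3: any integer
    is.numeric(b3), length(b3) == 1, b3 == round(b3),
    # b4: an integer equal to or larger than 2
    is.numeric(b4), length(b4) == 1, b4 == round(b4), b4 >= 2
  )
  # b5 and target must be one of the documented choices
  b5 <- match.arg(b5, c("a", "h", "n"))
  target <- match.arg(target, c("alignment", "index", "score"))
  list(b1 = b1, b2 = b2, b3 = b3, b4 = b4, b5 = b5, target = target)
}

# b2 follows b1 when it is not given explicitly
args_ok <- check_gblocks_args(b1 = 0.6, b3 = 100)
```

Here b3 has no default because, in gblocks, it depends on the number of columns of x.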

Details

Explanation of the routine taken from the Online Documentation: First, the degree of conservation of every position of the multiple alignment is evaluated and classified as nonconserved, conserved, or highly conserved. All stretches of contiguous nonconserved positions bigger than a certain value (b3) are rejected. In such stretches, alignments are normally ambiguous and, even when in some cases a unique alignment could be given, multiple hidden substitutions make them inadequate for phylogenetic analysis. In the remaining blocks, flanks are examined and positions are removed until blocks are surrounded by highly conserved positions at both flanks. This way, selected blocks are anchored by positions that can be aligned with high confidence. Then, all gap positions (which can be defined in three different ways, see b5) are removed. Furthermore, nonconserved positions adjacent to a gap position are also eliminated until a conserved position is reached, because regions adjacent to a gap are the most difficult to align. Finally, small blocks (falling below the limit of b4) remaining after gap cleaning are also removed.
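
To make the roles of b3, b4 and b5 concrete, the following toy sketch applies three of the steps described above to a small character matrix. It is a deliberately simplified illustration and not the Gblocks implementation: select_positions is a hypothetical name, the conservation status of each column is supplied by hand rather than computed, and flank trimming and the removal of nonconserved positions next to gaps are left out.

```r
# Simplified, hypothetical sketch of three filtering steps; NOT the Gblocks code.
# conserved: logical vector, TRUE where a column is taken to be conserved
# aln:       character matrix (rows = sequences, columns = positions), "-" = gap
select_positions <- function(conserved, aln, b3, b4 = 2, b5 = "a") {
  keep <- rep(TRUE, length(conserved))

  # Step 1: reject stretches of contiguous nonconserved positions longer than b3
  runs <- rle(conserved)
  too_long <- !runs$values & runs$lengths > b3
  keep[rep(too_long, runs$lengths)] <- FALSE

  # Step 2: remove gap positions, defined according to b5
  gap_frac <- colMeans(aln == "-")
  is_gap <- switch(b5,
    n = gap_frac > 0,            # a single gap makes a gap position
    h = gap_frac >= 0.5,         # gaps in 50% or more of the sequences
    a = rep(FALSE, ncol(aln))    # gaps are not treated differently
  )
  keep[is_gap] <- FALSE

  # Step 3: drop remaining blocks shorter than b4
  blocks <- rle(keep)
  too_short <- blocks$values & blocks$lengths < b4
  keep[rep(too_short, blocks$lengths)] <- FALSE

  which(keep)
}

# Made-up toy alignment: 4 sequences, 8 positions
aln <- rbind(
  c("A", "C", "-", "G", "T", "A", "C", "G"),
  c("A", "C", "-", "G", "T", "-", "C", "G"),
  c("A", "C", "T", "G", "T", "A", "C", "G"),
  c("A", "C", "-", "G", "A", "A", "C", "G")
)
# Conservation status assigned by hand for the illustration
conserved <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

sel_a <- select_positions(conserved, aln, b3 = 8, b5 = "a")
sel_h <- select_positions(conserved, aln, b3 = 8, b5 = "h")
sel_n <- select_positions(conserved, aln, b3 = 8, b5 = "n")
```

In this toy case "a" keeps every position, "h" drops only the column where three of four sequences have a gap, and "n" drops both columns that contain any gap.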

Value

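A matrix of class "DNAbin" if target = "alignment" (the default). For target = "index" or target = "score", the indices of the selected positions or the per-position scores described under target.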
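
The three output formats described under target can be related to each other as in the sketch below. The score values and the small character matrix are made up for illustration, and the relationship shown is inferred from the argument description rather than taken from the package source.

```r
# Made-up 0/1 scores, in the form described for target = "score"
score <- c(1, 1, 0, 1, 1, 0, 0, 1)

# The selected positions (as for target = "index") are the columns scored 1
index <- which(score == 1)

# Toy stand-in for an alignment: 2 sequences, 8 positions
aln <- matrix(c("a", "c", "g", "t"), nrow = 2, ncol = 8)

# Keeping only the selected columns gives the masked alignment
# (as for target = "alignment"); drop = FALSE preserves the matrix shape
masked <- aln[, index, drop = FALSE]
```

With a real alignment, the same column subsetting would presumably be applied to the DNAbin matrix x.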

Note

gblocks was last updated and tested to work with Gblocks 0.91b. If you have problems getting the function to work with a newer version of Gblocks, please contact the package maintainer.

References

Castresana, J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552.

Talavera, G., and J. Castresana. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.

Gblocks website: https://www.biologiaevolutiva.org/jcastresana/Gblocks.html

See Also

mafft and prank for multiple sequence alignment; aliscore for another alignment masking algorithm.

Examples

data(ips.28S)
## Not run: gblocks(ips.28S)  
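
A slightly fuller call might look like the following sketch. The executable path is a placeholder that has to be adapted to the local installation, and the calls only run if both the ips package and the file at that path are present.

```r
# Placeholder path (assumption): replace with the location of your Gblocks executable
gblocks_exec <- "/path/to/Gblocks_0.91b/Gblocks"

if (requireNamespace("ips", quietly = TRUE) && file.exists(gblocks_exec)) {
  library(ips)
  data(ips.28S)

  # Strictest gap treatment: no gap positions in the final alignment
  strict <- gblocks(ips.28S, b5 = "n", exec = gblocks_exec)

  # Indices of the selected positions instead of the masked alignment
  idx <- gblocks(ips.28S, target = "index", exec = gblocks_exec)
}
```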

ips documentation built on May 29, 2024, 4:39 a.m.