msaTrim | R Documentation |
Trimming a multiple sequence alignment by discarding columns with too many gaps.
msaTrim(msa, gap.end = 0.5, gap.mid = 0.9)
msa | A fasta object containing a multiple alignment.
gap.end | Fraction of gaps tolerated at the ends of the alignment (0-1).
gap.mid | Fraction of gaps tolerated inside the alignment (0-1).
A multiple alignment is trimmed by removing columns with too many indels (gap symbols). Any
column whose fraction of gaps is larger than gap.mid is discarded. For this reason, gap.mid
should always be fairly close to 1.0; otherwise too many columns may be discarded, destroying the alignment.
Due to the heuristics of multiple alignment methods, both ends of the alignment tend to be uncertain, and most
of the trimming should be done at the ends. Starting at each end, columns are discarded as long as their fraction of gaps
surpasses gap.end. Typically gap.end can be much smaller than gap.mid, but if it is
set too low you risk discarding all columns!
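The trimming rule described above can be illustrated with a minimal base-R sketch. This is not the microseq implementation: the helper name trim_sketch, its gap.chars argument, and the choice to treat both "-" and "." as gap symbols are assumptions made for illustration, and the function works on a plain character vector of equal-length aligned sequences rather than on a fasta object.

```r
# Hypothetical helper (not part of microseq): trims a character vector of
# equal-length aligned sequences using the rule described in the text.
trim_sketch <- function(seqs, gap.end = 0.5, gap.mid = 0.9,
                        gap.chars = c("-", ".")) {  # gap symbols are an assumption
  # One row per sequence, one column per alignment position
  mat <- do.call(rbind, strsplit(seqs, split = ""))
  # Fraction of gap symbols in each column
  is.gap <- matrix(mat %in% gap.chars, nrow = nrow(mat))
  gap.frac <- colMeans(is.gap)
  # From each end, columns are dropped while their gap fraction exceeds gap.end
  ok <- which(gap.frac <= gap.end)
  if (length(ok) == 0) return(rep("", length(seqs)))  # every column discarded
  pos <- seq_along(gap.frac)
  keep <- pos >= min(ok) & pos <= max(ok)
  # Remaining columns are dropped when their gap fraction exceeds gap.mid
  keep <- keep & gap.frac <= gap.mid
  vapply(seq_len(nrow(mat)),
         function(i) paste(mat[i, keep], collapse = ""),
         character(1))
}

# Toy alignment with gap-rich ends
seqs <- c("--ACGT-A--",
          "-AACGTTA--",
          "--AC-TTAC-")
print(trim_sketch(seqs))                 # "ACGT-A" "ACGTTA" "AC-TTA"
print(trim_sketch(seqs, gap.mid = 0.3))  # "ACTA" "ACTA" "ACTA"
```

With the defaults, only the two gap-rich columns at each end are removed. Lowering gap.mid to 0.3 also removes the two interior columns where one of three sequences has a gap, which shows why gap.mid is normally kept close to 1.0.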
The trimmed alignment is returned as a fasta object.
Lars Snipen.
muscle, msalign.
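library(microseq)
library(stringr)  # provides str_length()
# Read the example alignment shipped with microseq
msa.file <- file.path(path.package("microseq"), "extdata", "small.msa")
msa <- readFasta(msa.file)
print(str_length(msa$Sequence))          # sequence lengths before trimming
msa.trimmed <- msaTrim(msa)
print(str_length(msa.trimmed$Sequence))  # sequence lengths after trimming
msa.mat <- msa2mat(msa) # for use with ape::as.DNAbin(msa.mat)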