The msa function provides a unified interface to the three multiple sequence alignment algorithms in this package: ‘ClustalW’, ‘ClustalOmega’, and ‘MUSCLE’.
inputSeqs |
input sequences; this argument can be a character vector,
an object of class |
method |
specifies the multiple sequence alignment algorithm to be used;
currently, |
cluster |
parameter related to sequence clustering; its
interpretation and default value depend on the method;
see |
gapOpening |
gap opening penalty; the defaults are
specific to the algorithm (see |
gapExtension |
gap extension penalty; the defaults are
specific to the algorithm (see |
maxiters |
maximum number of iterations; its
interpretation and default value depend on the method;
see |
substitutionMatrix |
substitution matrix for scoring matches and
mismatches; format and defaults depend on the algorithm;
see |
type |
type of the input sequences |
order |
how the sequences should be ordered in the output object;
if |
verbose |
if |
help |
if |
... |
all other parameters are passed on to the multiple
sequence alignment algorithm, i.e. to one of the functions
|
msa is a simple wrapper function that unifies the interfaces of the three functions msaClustalW, msaClustalOmega, and msaMuscle. Which function is called is controlled by the method argument.
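As a rough illustration of this dispatch, the following sketch runs ClustalOmega once through msa() and once by calling msaClustalOmega() directly. It assumes that the msa package is installed and uses the example FASTA file shipped with it; the variable names viaWrapper and direct are chosen for illustration only.

```r
library(msa)

## example protein sequences shipped with the package
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## ClustalOmega via the unified interface
viaWrapper <- msa(mySeqs, method="ClustalOmega")

## the same algorithm, called directly
direct <- msaClustalOmega(mySeqs)

## both calls return objects of the same class
class(viaWrapper)
class(direct)
```

Calling the specific function directly is mainly useful when the help page of that function is the reference for algorithm-specific parameters.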
Note that the multiple sequence alignment algorithms may reorder the input sequences in order to group similar sequences together (see also the description of argument order above). So, if the input order should be preserved or recovered later, we strongly recommend always assigning unique names to the input sequences.
As noted in the description of the inputSeqs argument above, all functions, msa(), msaClustalW, msaClustalOmega, and msaMuscle, also allow for reading directly from FASTA files. This is mainly for memory efficiency when the sequence data set is very large. Otherwise, we encourage users to first read the sequences into the R workspace. If sequences are read directly from a FASTA file, the order of the output sequences is completely under the control of the respective algorithm, and there is no way to check whether the sequences in the FASTA file are named uniquely. Preserving the input order also works for sequence data read from a FASTA file, but only for ClustalW and ClustalOmega; MUSCLE does not support this (see also argument order above and msaMuscle).
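The recommendation about unique names can be sketched as follows. This is only an illustration under the assumption that the msa package is installed: it checks the names of the example sequences, aligns them with both settings of order, and recovers the input order afterwards by name. The variable names alnAligned, alnInput, and recovered are hypothetical.

```r
library(msa)

filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## make sure that every sequence has a unique name before aligning
stopifnot(!is.null(names(mySeqs)), anyDuplicated(names(mySeqs)) == 0)

## default: the algorithm may reorder the sequences
alnAligned <- msa(mySeqs)

## order="input": keep the order of the input sequences
alnInput <- msa(mySeqs, order="input")

## sequence names in the order in which they appear in each result
names(unmasked(alnAligned))
names(unmasked(alnInput))

## recover the input order later by indexing with the unique names
recovered <- unmasked(alnAligned)[names(mySeqs)]
names(recovered)
```

Indexing by name only works reliably because the names are unique, which is why the check comes first.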
Depending on the type of sequences for which it was called, msa returns a MsaAAMultipleAlignment, MsaDNAMultipleAlignment, or MsaRNAMultipleAlignment object. If called with help=TRUE, msa returns an invisible NULL.
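As a small sketch of how the class of the result follows the type of the input, the code below aligns a few short DNA sequences and inspects the class of the returned object. The sequences and their names are made up for illustration, and the msa package is assumed to be installed.

```r
library(msa)

## three short, made-up DNA sequences with unique names
dnaSeqs <- DNAStringSet(c(seqA="ACGTACGTACGTTAGC",
                          seqB="ACGTTCGTACGTTAGC",
                          seqC="ACGTACGACGTTAGC"))

## align with the default method (ClustalW)
dnaAln <- msa(dnaSeqs)

## for DNA input, a MsaDNAMultipleAlignment object is expected
class(dnaAln)
is(dnaAln, "MsaDNAMultipleAlignment")
```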
Enrico Bonatesta and Christoph Horejs-Kainrath <msa@bioinf.jku.at>
http://www.bioinf.jku.at/software/msa
U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
http://www.clustal.org/download/clustalw_help.txt
http://www.clustal.org/omega/README
http://www.drive5.com/muscle/muscle.html
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-4680. DOI: 10.1093/nar/22.22.4673.
Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Soeding, J., Thompson, J. D., and Higgins, D. G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7:539. DOI: 10.1038/msb.2011.75.
Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797. DOI: 10.1093/nar/gkh340.
Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. DOI: 10.1186/1471-2105-5-113.
msaClustalW, msaClustalOmega, msaMuscle, msaPrettyPrint, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData
## load the package (this also attaches Biostrings)
library(msa)
## read sequences
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)
## call unified interface msa() for default method (ClustalW) and
## default parameters
msa(mySeqs)
## call ClustalOmega through unified interface
msa(mySeqs, method="ClustalOmega")
## call MUSCLE through unified interface with some custom parameters
msa(mySeqs, method="Muscle", gapOpening=12, gapExtension=3, maxiters=16,
    cluster="upgmamax", SUEFF=0.4, brenner=FALSE,
    order="input", verbose=FALSE)
use default substitution matrix
CLUSTAL 2.1
Call:
msa(mySeqs)
MsaAAMultipleAlignment with 9 rows and 456 columns
aln names
[1] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[2] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[3] MSTAVLENPGLGRKLSDFGQETSYIE...QLKILADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[4] MSALVLESRALGRKLSDFGQETSYIE...QLKILADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[5] --------------------------...LNAGDRQGWADTEDV---------- PH4H_Chromobacter...
[6] --------------------------...LNAGTREGWADTADI---------- PH4H_Ralstonia_so...
[7] --------------------------...LTRGT-QAYATAGGRLAGAAAG--- PH4H_Caulobacter_...
[8] --------------------------...------------------------- PH4H_Pseudomonas_...
[9] --------------------------...------------------------- PH4H_Rhizobium_loti
Con --------------------------...??????????????IL??A???--- Consensus
using Gonnet
ClustalOmega 1.2.0
Call:
msa(mySeqs, method = "ClustalOmega")
MsaAAMultipleAlignment with 9 rows and 467 columns
aln names
[1] MSALVLESRALGRKLSDFGQETSYIE...LKI-LADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[2] MSTAVLENPGLGRKLSDFGQETSYIE...LKI-LADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[3] MAAVVLENGVLSRKLSDFGQETSYIE...LKI-LADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[4] MAAVVLENGVLSRKLSDFGQETSYIE...LKI-LADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[5] --------------------------...------------------------- PH4H_Pseudomonas_...
[6] --------------------------...------------------------- PH4H_Rhizobium_loti
[7] --------------------------...YATAGGRLAGAAAG----------- PH4H_Caulobacter_...
[8] --------------------------...GWADTEDV----------------- PH4H_Chromobacter...
[9] --------------------------...GWADTADI----------------- PH4H_Ralstonia_so...
Con --------------------------...???-?AD???????----------- Consensus
MUSCLE 3.8.31
Call:
msa(mySeqs, method = "Muscle", gapOpening = 12, gapExtension = 3, maxiters = 16, cluster = "upgmamax", SUEFF = 0.4, brenner = FALSE, order = "input", verbose = FALSE)
MsaAAMultipleAlignment with 9 rows and 456 columns
aln names
[1] MSTAVLENPGLGRKLSDFGQETSYIE...QLKILADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[2] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[3] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[4] --------------------------...NAGDRQGWADTEDV----------- PH4H_Chromobacter...
[5] --------------------------...------------------------- PH4H_Pseudomonas_...
[6] MSALVLESRALGRKLSDFGQETSYIE...QLKILADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[7] --------------------------...NAGTREGWADTADI----------- PH4H_Ralstonia_so...
[8] --------------------------...TRGTQAYATAGGRLAGAAAG----- PH4H_Caulobacter_...
[9] --------------------------...------------------------- PH4H_Rhizobium_loti
Con --------------------------...?????A?????E??????A?----- Consensus