msaMuscle: Multiple Sequence Alignment with MUSCLE

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/msaMuscle.R

Description

This function calls the multiple sequence alignment algorithm MUSCLE.

Usage

1
2
3
4
5
    msaMuscle(inputSeqs, cluster="default", gapOpening="default",
              gapExtension="default", maxiters="default",
              substitutionMatrix="default",
              type="default", order=c("aligned", "input"),
              verbose=FALSE, help=FALSE, ...)

Arguments

inputSeqs

input sequences; see msa. In the original MUSCLE implementation, this parameter is called -in.

cluster

The clustering method which should be used. Possible values are "upgma", "upgmamax", "upgmamin", "upgmb", and "neighborjoining".

gapOpening

gap opening penalty; the default is 400 for DNA sequences and 420 for RNA sequences. The default for amino acid sequences depends on the profile score settings: for the setting le=TRUE, the default is 2.9, for sp=TRUE, the default is 1,439, and for sv=TRUE, the default is 300. Note that these defaults may not be suitable if custom substitution matrices are being used. In such a case, a sensible choice of gap penalties that fits well to the substitution matrix must be made.

gapExtension

gap extension penalty; the default is 0.

maxiters

maximum number of iterations; the default is 16. In the original MUSCLE implementation, it is also possible to set maxiters to 0 which leads to an (out of memory) error. Therefore, maxiters=0 is not allowed in msaMuscle.

substitutionMatrix

substitution matrix for scoring matches and mismatches; can be a real matrix or a file name If the file interface is used, matrices have to be in NCBI-format. The original MUSCLE implementation also accepts matrices in WU_BLAST (AB_BLAST) format, but, due to copyright restrictions, this format is not supported by msaMuscle.

type

type of the input sequences inputSeqs; see msa.

order

how the sequences should be ordered in the output object (see msa for more details); the original MUSCLE implementation does not allow for preserving the order of input sequences. The msaMuscle function realizes this functionality by reverse matching of sequence names. Therefore, the sequences need to have unique names. If the sequences do not have names or if the names are not unique, the msaMuscle function assignes generic unique names "Seq1"-Seqn to the sequences and issues a warning. The choice "input" is not available at all for sequence data that is read directly from a FASTA file.

verbose

if TRUE, the algorithm displays detailed information and progress messages.

help

if TRUE, information about algorithm-specific parameters is displayed. In this case, no multiple sequence alignment is performed and the function quits after displaying the additional help information.

...

further parameters specific to MUSCLE; An overview of parameters that are available in this interface is shown when calling msaMuscle with help=TRUE. For more details, see also the documentation of MUSCLE.

Details

This is a function providing the MUSCLE multiple alignment algorithm as an R function. It can be used for various types of sequence data (see inputSeqs argument above). Parameters that are common to all multiple sequences alignments provided by the msa package are explicitly provided by the function and named in the same for all algorithms. Most other parameters that are specific to MUSCLE can be passed to MUSCLE via additional arguments (see argument help above).

For a note on the order of output sequences and direct reading from FASTA files, see msa.

Value

Depending on the type of sequences for which it was called, msaMuscle returns a MsaAAMultipleAlignment, MsaDNAMultipleAlignment, or MsaRNAMultipleAlignment object. If called with help=TRUE, msaMuscle returns an invisible NULL.

Author(s)

Enrico Bonatesta and Christoph Horejs-Kainrath <msa@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/msa

U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.

http://www.drive5.com/muscle/muscle.html

Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797. DOI: 10.1093/nar/gkh340.

Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. DOI: 10.1186/1471-2105-5-113.

See Also

msa, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
## read sequences
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## call msaMuscle with default values
msaMuscle(mySeqs)

## call msaMuscle with custom parameters
msaMuscle(mySeqs, gapOpening=12, gapExtension=3, maxiters=16,
          cluster="upgmamax", SUEFF=0.4, brenner=FALSE,
          order="input", verbose=FALSE)

## call msaMuscle with a custom substitution matrix
data(PAM120)
msaMuscle(mySeqs, substitutionMatrix=PAM120)

Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit


*** WARNING *** *Warning* Cannot open /proc/meminfo errno=13 Permission denied
MUSCLE 3.8.31   

Call:
   msaMuscle(mySeqs)

MsaAAMultipleAlignment with 9 rows and 460 columns
    aln                                                    names
[1] MAAVVLENGVLSRKLSDF-GQETSYI...QLKILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[2] MAAVVLENGVLSRKLSDF-GQETSYI...QLKILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[3] MSTAVLENPGLGRKLSDF-GQETSYI...QLKILADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[4] MSALVLESRALGRKLSDF-GQETSYI...QLKILADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[5] -------------------MKTTQYV...------------------------- PH4H_Pseudomonas_...
[6] -------------------MSVAEYA...------------------------- PH4H_Rhizobium_loti
[7] --------------------MSGDGL...VLTRGTQAYATAGGRLAGAAAG--- PH4H_Caulobacter_...
[8] -----------MNDRADF---VVPDI...VLNAGDRQGWADTEDV--------- PH4H_Chromobacter...
[9] MAIATPTSAAPTPAPAGFTGTLTDKL...VLNAGTREGWADTADI--------- PH4H_Ralstonia_so...
Con M???????????????DF-G??T?YI...?L?????????????L??A???--- Consensus 
MUSCLE 3.8.31   

Call:
   msaMuscle(mySeqs, gapOpening = 12, gapExtension = 3, maxiters = 16,     cluster = "upgmamax", SUEFF = 0.4, brenner = FALSE, order = "input",     verbose = FALSE)

MsaAAMultipleAlignment with 9 rows and 456 columns
    aln                                                    names
[1] MSTAVLENPGLGRKLSDFGQETSYIE...QLKILADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[2] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[3] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[4] --------------------------...NAGDRQGWADTEDV----------- PH4H_Chromobacter...
[5] --------------------------...------------------------- PH4H_Pseudomonas_...
[6] MSALVLESRALGRKLSDFGQETSYIE...QLKILADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[7] --------------------------...NAGTREGWADTADI----------- PH4H_Ralstonia_so...
[8] --------------------------...TRGTQAYATAGGRLAGAAAG----- PH4H_Caulobacter_...
[9] --------------------------...------------------------- PH4H_Rhizobium_loti
Con --------------------------...?????A?????E??????A?----- Consensus 
MUSCLE 3.8.31   

Call:
   msaMuscle(mySeqs, substitutionMatrix = PAM120)

MsaAAMultipleAlignment with 9 rows and 453 columns
    aln                                                    names
[1] --------------------------...DIMALVHEAMRLGLHAPLFPPKQAA PH4H_Pseudomonas_...
[2] --------------------------...GDAVLTRGTQAYATAGGRLAGAAAG PH4H_Caulobacter_...
[3] MNDRADFVVPDITTRKNVGLSHDAND...------------------------- PH4H_Chromobacter...
[4] MAIATPTSAAPTPAPAGFTGTLTDKL...------------------------- PH4H_Ralstonia_so...
[5] -MSALVLESRALGRKLSDFGQETSYI...QLKILADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[6] MSTAVLENPGLGRKLSDFGQETSYIE...QLKILADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[7] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[8] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[9] MSVAEYARDCAAQGLRGDYSVCRADF...------------------------- PH4H_Rhizobium_loti
Con M?????????????????????????...????L????????????????K??- Consensus 

msa documentation built on Nov. 8, 2020, 5:41 p.m.