msaClustalW: Multiple Sequence Alignment with ClustalW



Description

This function calls the multiple sequence alignment algorithm ClustalW.

Usage

    msaClustalW(inputSeqs, cluster="default", gapOpening="default", 
                gapExtension="default", maxiters="default", 
                substitutionMatrix="default", type="default", 
                order=c("aligned", "input"), verbose=FALSE,
                help=FALSE, ...)

Arguments

inputSeqs

input sequences; see msa. In the original ClustalW implementation, this parameter is called infile.

cluster

The clustering method which should be used. Possible values are "nj" (default) and "upgma". In the original ClustalW implementation, this parameter is called clustering. Please note that cluster="upgma" leads to an unidentified error on Windows with R 4.0.x that even crashes the entire R session.

gapOpening

gap opening penalty; the default value for nucleotide sequences is 15.0, the default value for amino acid sequences is 10.0.

gapExtension

gap extension penalty; the default value for nucleotide sequences is 6.66, the default value for amino acid sequences is 0.2.

maxiters

maximum number of iterations; the default value is 16. In the original ClustalW implementation, this parameter is called numiters.

substitutionMatrix

substitution matrix for scoring matches and mismatches; can be a real matrix, a file name, or the name of a built-in substitution matrix. In the latter case, the choices "blosum", "pam", "gonnet", and "id" are supported for amino acid sequences, and the choices "iub" and "clustalw" are supported for nucleotide sequences. For backwards compatibility, the parameter dnamatrix can be used instead; its valid choices are also "iub" and "clustalw". In the original ClustalW implementation, this parameter is called matrix.

type

type of the input sequences inputSeqs; see msa.

order

how the sequences should be ordered in the output object (see msa); in the original ClustalW implementation, this parameter is called outorder.

verbose

if TRUE, the algorithm displays detailed information and progress messages.

help

if TRUE, information about algorithm-specific parameters is displayed. In this case, no multiple sequence alignment is performed and the function quits after displaying the additional help information.

...

further parameters specific to ClustalW; an overview of the parameters available in this interface is shown when calling msaClustalW with help=TRUE. For more details, see also the documentation of ClustalW.
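
As a rough illustration of the arguments above, the following sketch combines a built-in substitution matrix with custom gap penalties and input ordering. It assumes that the msa package and its bundled example file exampleAA.fasta are installed; the penalty values are arbitrary choices for illustration, not recommendations.

```r
library(msa)

## load the example protein sequences shipped with the msa package
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## use the built-in Gonnet matrix, custom gap penalties (arbitrary values),
## and keep the sequences in their input order
aln <- msaClustalW(mySeqs, substitutionMatrix="gonnet",
                   gapOpening=12, gapExtension=0.3,
                   order="input")

## the alignment contains one row per input sequence
nrow(aln) == length(mySeqs)
```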

Details

This function provides the ClustalW multiple alignment algorithm as an R function. It can be used for various types of sequence data (see the inputSeqs argument above). Parameters that are common to all multiple sequence alignment algorithms provided by the msa package are explicitly provided by the function and named in the same way for all algorithms. Most other parameters that are specific to ClustalW can be passed to ClustalW via additional arguments (see the help argument above).

For a note on the order of output sequences and direct reading from FASTA files, see msa.
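
The direct reading from FASTA files mentioned above might look roughly as follows. This sketch assumes that msaClustalW, like msa, accepts the path of a FASTA file as inputSeqs; type is set explicitly as a precaution, because no sequence class is available to infer it from. See msa for the authoritative description.

```r
library(msa)

## path of the example FASTA file shipped with the msa package
filepath <- system.file("examples", "exampleAA.fasta", package="msa")

## pass the file name instead of an AAStringSet object;
## the sequence type is stated explicitly
alnFromFile <- msaClustalW(filepath, type="protein")
alnFromFile
```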

Value

Depending on the type of sequences for which it was called, msaClustalW returns a MsaAAMultipleAlignment, MsaDNAMultipleAlignment, or MsaRNAMultipleAlignment object. If called with help=TRUE, msaClustalW returns an invisible NULL.
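
The return values described above can be checked with a short sketch like the one below. It assumes the example file exampleAA.fasta bundled with the msa package; the variable names aln and res are purely illustrative.

```r
library(msa)

filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## amino acid input yields an MsaAAMultipleAlignment object
aln <- msaClustalW(mySeqs)
is(aln, "MsaAAMultipleAlignment")

## number of aligned sequences and number of alignment columns
dim(aln)

## with help=TRUE, only the help text is printed and NULL is returned invisibly
res <- msaClustalW(mySeqs, help=TRUE)
is.null(res)
```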

Author(s)

Enrico Bonatesta and Christoph Horejs-Kainrath <msa@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/msa

U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.

http://www.clustal.org/download/clustalw_help.txt

Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-4680. DOI: 10.1093/nar/22.22.4673.

See Also

msa, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData

Examples

## load the msa package (this also attaches Biostrings)
library(msa)

## read sequences
filepath <- system.file("examples", "exampleAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## call msaClustalW with default values
msaClustalW(mySeqs)

## call msaClustalW with custom parameters
msaClustalW(mySeqs, gapOpening=1, gapExtension=1, maxiters=16,
            kimura=FALSE, order="input", maxdiv=23)

Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

use default substitution matrix
CLUSTAL 2.1  

Call:
   msaClustalW(mySeqs)

MsaAAMultipleAlignment with 9 rows and 456 columns
    aln                                                    names
[1] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
[2] MAAVVLENGVLSRKLSDFGQETSYIE...QLKILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
[3] MSTAVLENPGLGRKLSDFGQETSYIE...QLKILADSINSEIGILCSALQKIK- PH4H_Homo_sapiens
[4] MSALVLESRALGRKLSDFGQETSYIE...QLKILADSISSEVEILCSALQKLK- PH4H_Bos_taurus
[5] --------------------------...LNAGDRQGWADTEDV---------- PH4H_Chromobacter...
[6] --------------------------...LNAGTREGWADTADI---------- PH4H_Ralstonia_so...
[7] --------------------------...LTRGT-QAYATAGGRLAGAAAG--- PH4H_Caulobacter_...
[8] --------------------------...------------------------- PH4H_Pseudomonas_...
[9] --------------------------...------------------------- PH4H_Rhizobium_loti
Con --------------------------...??????????????IL??A???--- Consensus 
use default substitution matrix
CLUSTAL 2.1  

Call:
   msaClustalW(mySeqs, gapOpening = 1, gapExtension = 1, maxiters = 16,     kimura = FALSE, order = "input", maxdiv = 23)

MsaAAMultipleAlignment with 9 rows and 466 columns
    aln                                                    names
[1] MSTAVLENPGLGRKLSDFGQETSYIE...LKILADSIN-SEIGILCSALQKIK- PH4H_Homo_sapiens
[2] MAAVVLENGVLSRKLSDFGQETSYIE...LKILADSIN-SEVGILCNALQKIKS PH4H_Rattus_norve...
[3] MAAVVLENGVLSRKLSDFGQETSYIE...LKILADSIN-SEVGILCHALQKIKS PH4H_Mus_musculus
[4] --------------------------...LVLNAGDRQ----G--WADTEDV-- PH4H_Chromobacter...
[5] --------------------------...--LFP-PKQ--------A--A---- PH4H_Pseudomonas_...
[6] MSALVLESRALGRKLSDFGQETSYIE...LKILADSIS-SEVEILCSALQKLK- PH4H_Bos_taurus
[7] --------------------------...AVLNAGTRE----G--WADTADI-- PH4H_Ralstonia_so...
[8] --------------------------...AVLTRGTQAYATAGGRLAGAAAG-- PH4H_Caulobacter_...
[9] --------------------------...----A-TV----------------- PH4H_Rhizobium_loti
Con --------------------------...L???A????-???G?????????-- Consensus 

msa documentation built on Nov. 8, 2020, 5:41 p.m.