msaConsensusSequence-methods: Computation of Consensus Sequence from Multiple Alignment
In msa: Multiple Sequence Alignment


This method computes a consensus sequence from a multiple alignment or a previously computed consensus matrix. Two ways of computing the consensus sequence are currently available.

## S4 method for signature 'matrix'
msaConsensusSequence(x, type=c("Biostrings", "upperlower"),
    thresh=c(80, 20), ignoreGaps=FALSE, ...)
## S4 method for signature 'MultipleAlignment'
msaConsensusSequence(x, ...)

`x`	an object of class `MultipleAlignment` (which includes objects of classes `MsaAAMultipleAlignment`, `MsaDNAMultipleAlignment`, and `MsaRNAMultipleAlignment`) or a previously computed consensus matrix (see details below).
`type`	a character string specifying how to compute the consensus sequence. Currently, types `"Biostrings"` and `"upperlower"` are available (see details below).
`thresh`	a decreasing two-element numeric vector of numbers between 0 and 100 corresponding to the two conservation thresholds. Only relevant for `type="upperlower"` (see details below), otherwise ignored.
`ignoreGaps`	a logical (default: `FALSE`) indicating whether gaps should be ignored when computing the consensus sequence. Only relevant for `type="upperlower"` (see details below), otherwise ignored.
`...`	when the method is called for a `MultipleAlignment` object, the consensus matrix is computed and passed on, together with all further arguments, to the `msaConsensusSequence` method with signature `matrix`. The method with signature `matrix` forwards additional arguments to the `consensusString` method from the Biostrings package if `type="Biostrings"`.

The method takes a MultipleAlignment object or a previously computed consensus matrix and computes a consensus sequence. For type="Biostrings", the method consensusString from the Biostrings package is used to compute the consensus sequence. For type="upperlower", two thresholds (argument thresh, see above) are used to compute the consensus sequence:

If the relative frequency of the most frequent letter at a given position is at least as large as the first threshold (default: 80%), then this most frequent letter is used for the consensus sequence at this position as it is.
If the relative frequency of the most frequent letter at a given position is smaller than the first threshold, but at least as large as the second threshold (default: 20%), then this most frequent letter is used for the consensus sequence at this position, but converted to lower case.
If the relative frequency of the most frequent letter in a column is even smaller than the second threshold, then a dot is used for the consensus sequence at this position.
If ignoreGaps=FALSE (which is the default), gaps are treated like all other letters, except that no lowercase conversion takes place when the relative frequency of the gap lies between the two thresholds. If ignoreGaps=TRUE, gaps are ignored completely, and the consensus sequence is computed from the non-gap letters only.
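
The threshold rules above can be illustrated with a short base R sketch. It is not the code used inside the msa package: the helper name upperLowerSketch and the toy matrix are hypothetical, ties are resolved by taking the first letter in row order, and a column without any counted letters is mapped to a dot. Both of these choices are assumptions made only for this illustration.

```r
## Hypothetical helper, NOT the msa implementation: applies the two-threshold
## rule to a count matrix (rows = letters, columns = alignment positions).
upperLowerSketch <- function(conMat, thresh=c(80, 20), ignoreGaps=FALSE)
{
    ## with ignoreGaps=TRUE, only the non-gap letters are counted
    if (ignoreGaps)
        conMat <- conMat[rownames(conMat) != "-", , drop=FALSE]

    perColumn <- apply(conMat, 2, function(counts)
    {
        total <- sum(counts)

        ## assumption: a column without any counted letters yields a dot
        if (total == 0)
            return(".")

        best <- which.max(counts)   # ties: first letter in row order wins
        relFreq <- 100 * counts[best] / total
        letter <- rownames(conMat)[best]

        if (relFreq >= thresh[1])
            letter
        else if (relFreq >= thresh[2])
            tolower(letter)         # tolower() leaves the gap symbol "-" as is
        else
            "."
    })

    paste(perColumn, collapse="")
}

## toy count matrix (made-up numbers): 10 sequences, 4 positions
toyMat <- matrix(c(9, 1, 0, 0, 0,
                   1, 5, 2, 2, 0,
                   3, 2, 2, 2, 1,
                   4, 0, 0, 0, 6), nrow=5,
                 dimnames=list(c("A", "C", "G", "T", "-"), NULL))

upperLowerSketch(toyMat)                                       # "Aca-"
upperLowerSketch(toyMat, thresh=c(80, 40))                     # "Ac.-"
upperLowerSketch(toyMat, thresh=c(80, 40), ignoreGaps=TRUE)    # "Ac.A"
```

In the last position, the gap has a relative frequency of 60%, which lies between the two thresholds, so it is kept as "-" without any case conversion. With ignoreGaps=TRUE, the same position is decided by the four non-gap letters alone.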

If the consensus matrix of a multiple alignment of nucleotide sequences contains rows labeled ‘+’ and/or ‘.’, these rows are ignored.
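
As a rough illustration of what ignoring these rows amounts to, the following base R sketch removes them from a toy count matrix. The matrix and all variable names are made up for this example, and the code is not taken from the msa package.

```r
## toy count matrix (made-up numbers): 4 sequences, 3 positions
toyDnaMat <- matrix(c(3, 1, 0, 0, 0, 0,
                      0, 4, 0, 0, 0, 0,
                      0, 0, 2, 1, 1, 0), nrow=6,
                    dimnames=list(c("A", "C", "G", "T", "+", "."), NULL))

## drop the rows labeled "+" and "." before any frequencies are computed
keepRows <- !(rownames(toyDnaMat) %in% c("+", "."))
cleanedMat <- toyDnaMat[keepRows, , drop=FALSE]

rownames(cleanedMat)   # "A" "C" "G" "T"
colSums(cleanedMat)    # 4 4 3
```

Relative frequencies computed from cleanedMat no longer include the count in the "+" row, which is why the third column sums to 3 instead of 4.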

The function returns a character string with the consensus sequence.

Ulrich Bodenhofer <msa@bioinf.jku.at>

http://www.bioinf.jku.at/software/msa

U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.

msa, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData, MultipleAlignment, consensusString

## load the package (this also attaches Biostrings)
library(msa)

## read sequences
filepath <- system.file("examples", "HemoglobinAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## perform multiple alignment
myAlignment <- msa(mySeqs)

## regular consensus sequence using consensusString() method from the
## 'Biostrings' package
msaConsensusSequence(myAlignment)

## use the other method
msaConsensusSequence(myAlignment, type="upperlower")

## use the other method with custom parameters
msaConsensusSequence(myAlignment, type="upperlower", thresh=c(50, 20),
                     ignoreGaps=TRUE)

## compute a consensus matrix first
conMat <- consensusMatrix(myAlignment)
msaConsensusSequence(conMat)

use default substitution matrix
[1] "-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"
[1] "-vLsaadKtnvkaawgkvgghageygaEaLeRmflsfPtTKTYFphf-dlshgSaqvkghGkkvadAlt.AvahlddlpgalsaLSdLHAhkLrVDPvNFklLshcllVtla.hhpadftPavhaslDKFlasvstvLtskYR"
[1] ".TKRn.CIsMTI..VfItFfGffDWFfD.KDQLEkrENSSISWENGE.CKRGFR.PTIfGFIIT.C.KSd.TfGkCCkNf.KR.KRCKG.GIKQTCNTMEIKKRGAKKTSK.rGgN.cESNdTG.RKCIEK.rTRSTKSRIWQ"
[1] "-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"