msaConsensusSequence-methods: Computation of Consensus Sequence from Multiple Alignment


Description

This method computes a consensus sequence from a multiple alignment or from a previously computed consensus matrix. Two ways of computing the consensus sequence are currently available.

Usage

## S4 method for signature 'matrix'
msaConsensusSequence(x, type=c("Biostrings", "upperlower"),
    thresh=c(80, 20), ignoreGaps=FALSE, ...)
## S4 method for signature 'MultipleAlignment'
msaConsensusSequence(x, ...)

Arguments

x

an object of class MultipleAlignment (which includes objects of classes MsaAAMultipleAlignment, MsaDNAMultipleAlignment, and MsaRNAMultipleAlignment) or a previously computed consensus matrix (see details below).

type

a character string specifying how to compute the consensus sequence. Currently, types "Biostrings" and "upperlower" are available (see details below).

thresh

a decreasing two-element numeric vector of numbers between 0 and 100 corresponding to the two conservation thresholds. Only relevant for type="upperlower" (see details below), otherwise ignored.

ignoreGaps

a logical (default: FALSE) indicating whether gaps should be ignored when computing the consensus sequence; with the default, gaps are taken into account. Only relevant for type="upperlower" (see details below), otherwise ignored.

...

when the method is called for a MultipleAlignment object, the consensus matrix is computed and passed on, together with all further arguments, to the msaConsensusSequence method with signature matrix. If type="Biostrings", the method with signature matrix forwards additional arguments to the consensusString method from the Biostrings package.
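To make the notion of a "previously computed consensus matrix" more concrete, the following base-R sketch builds one by hand from a toy alignment. It assumes the layout that consensusMatrix from the Biostrings package produces: one row per letter, one column per alignment position, and counts as entries. The helper toyConsensusMatrix and the toy sequences are hypothetical and not part of msa.

```r
## hypothetical helper (not part of msa): count letters per alignment column
toyConsensusMatrix <- function(seqs) {
    ## aligned sequences must all have the same width
    stopifnot(length(unique(nchar(seqs))) == 1)
    chars <- do.call(rbind, strsplit(seqs, ""))   # one row per sequence
    alphabet <- sort(unique(as.vector(chars)))
    mat <- matrix(0, nrow = length(alphabet), ncol = ncol(chars),
                  dimnames = list(alphabet, NULL))
    ## fill one row per letter with its count in every column
    for (a in alphabet)
        mat[a, ] <- colSums(chars == a)
    mat
}

## toy alignment of three sequences with four positions (made-up data)
toySeqs <- c("ACG-", "ACGT", "ATGT")
conMat <- toyConsensusMatrix(toySeqs)
conMat
```

Every column of such a matrix sums to the number of aligned sequences, which is what allows relative frequencies to be derived from it later.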

Details

The method takes a MultipleAlignment object or a previously computed consensus matrix and computes a consensus sequence. For type="Biostrings", the method consensusString from the Biostrings package is used to compute the consensus sequence. For type="upperlower", two thresholds (argument thresh, see above) are used to compute the consensus sequence.

If the consensus matrix of a multiple alignment of nucleotide sequences contains rows labeled ‘+’ and/or ‘.’, these rows are ignored.
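The thresholding for type="upperlower" can be pictured with a small base-R sketch. It rests on assumptions that are not spelled out on this page: the most frequent letter of a column is written in uppercase if its relative frequency (in percent) reaches the first threshold, in lowercase if it only reaches the second threshold, and as a dot otherwise; with ignoreGaps=TRUE, the gap row is dropped before frequencies are computed. The function toyUpperLower and the toy matrix are hypothetical, and this is not the package's implementation.

```r
## hypothetical helper (not part of msa); the rule below is an assumption
toyUpperLower <- function(conMat, thresh = c(80, 20), ignoreGaps = FALSE) {
    stopifnot(length(thresh) == 2, thresh[1] >= thresh[2])
    ## rows labeled '+' and '.' are ignored, as stated in the text
    conMat <- conMat[!(rownames(conMat) %in% c("+", ".")), , drop = FALSE]
    if (ignoreGaps)
        conMat <- conMat[rownames(conMat) != "-", , drop = FALSE]
    out <- character(ncol(conMat))
    for (j in seq_len(ncol(conMat))) {
        counts <- conMat[, j]
        total <- sum(counts)
        if (total == 0) {
            out[j] <- "."
            next
        }
        top <- which.max(counts)          # on ties, the first maximum wins
        pct <- 100 * counts[top] / total  # relative frequency in percent
        sym <- rownames(conMat)[top]
        if (pct >= thresh[1]) {
            out[j] <- toupper(sym)
        } else if (pct >= thresh[2]) {
            out[j] <- tolower(sym)        # a gap '-' stays '-'
        } else {
            out[j] <- "."
        }
    }
    paste(out, collapse = "")
}

## toy consensus matrix for 10 sequences and 4 positions (made-up counts);
## each group of five numbers is one column
toyMat <- matrix(c(0, 10, 0, 0, 0,
                   0,  0, 6, 0, 4,
                   1,  3, 2, 2, 2,
                   7,  0, 0, 0, 3),
                 nrow = 5,
                 dimnames = list(c("-", "A", "C", "G", "T"), NULL))

resDefault <- toyUpperLower(toyMat)
resStrict  <- toyUpperLower(toyMat, thresh = c(80, 40))
resNoGaps  <- toyUpperLower(toyMat, ignoreGaps = TRUE)
```

Under these assumptions, raising the second threshold turns weakly conserved positions into dots, and ignoring gaps lets the residual letters of a gap-rich column determine the consensus.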

Value

The function returns a character string with the consensus sequence.

Author(s)

Ulrich Bodenhofer <msa@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/msa

U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.

See Also

msa, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData, MultipleAlignment, consensusString

Examples

## load the package (this also attaches Biostrings)
library(msa)

## read sequences
filepath <- system.file("examples", "HemoglobinAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## perform multiple alignment
myAlignment <- msa(mySeqs)

## regular consensus sequence using consensusString() method from the
## 'Biostrings' package
msaConsensusSequence(myAlignment)

## use the "upperlower" consensus type with default thresholds
msaConsensusSequence(myAlignment, type="upperlower")

## use the "upperlower" consensus type with custom thresholds,
## ignoring gaps
msaConsensusSequence(myAlignment, type="upperlower", thresh=c(50, 20),
                     ignoreGaps=TRUE)

## compute a consensus matrix first
conMat <- consensusMatrix(myAlignment)
msaConsensusSequence(conMat)

Example output

use default substitution matrix
[1] "-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"
[1] "-vLsaadKtnvkaawgkvgghageygaEaLeRmflsfPtTKTYFphf-dlshgSaqvkghGkkvadAlt.AvahlddlpgalsaLSdLHAhkLrVDPvNFklLshcllVtla.hhpadftPavhaslDKFlasvstvLtskYR"
[1] ".TKRn.CIsMTI..VfItFfGffDWFfD.KDQLEkrENSSISWENGE.CKRGFR.PTIfGFIIT.C.KSd.TfGkCCkNf.KR.KRCKG.GIKQTCNTMEIKKRGAKKTSK.rGgN.cESNdTG.RKCIEK.rTRSTKSRIWQ"
[1] "-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"

msa documentation built on Nov. 8, 2020, 5:41 p.m.