This method computes a consensus sequence from a multiple alignment or a previously computed consensus matrix. Currently, two different ways of these computations are available.
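an object of class MultipleAlignment or a previously computed consensus matrix.
type: a character string specifying how to compute the consensus sequence. Currently, types "Biostrings" and "upperlower" are available (see the details below).
thresh: a decreasing two-element numeric vector of numbers between 0 and 100 corresponding to the two conservation thresholds. Only relevant for type="upperlower".
ignoreGaps: a logical (default: FALSE) indicating whether gaps are ignored when computing the consensus sequence (see the details below).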
when the method is called for a
The method takes a MultipleAlignment object or a previously computed consensus matrix and computes a consensus sequence. If type="Biostrings", the method consensusString from the Biostrings package is used to compute the consensus sequence. For type="upperlower", two thresholds (argument thresh, see above) are used to compute the consensus sequence:
If the relative frequency of the most frequent letter at a given position is at least as large as the first threshold (default: 80%), this letter is used unchanged at this position of the consensus sequence.
If the relative frequency of the most frequent letter at a given position is smaller than the first threshold but at least as large as the second threshold (default: 20%), this letter is used at this position of the consensus sequence, but converted to lower case.
If the relative frequency of the most frequent letter at a given position is smaller than the second threshold, a dot is used at this position of the consensus sequence.
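The three rules can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the msa implementation: the helpers upperlower_column() and upperlower_consensus() are hypothetical, the input is assumed to be a count matrix with one row per letter (row names are the letters) and one column per alignment position, and ties are assumed to go to the first row.

```r
# Minimal sketch of the "upperlower" rule (hypothetical helpers, not the
# msa implementation). Assumption: a column is a named vector of letter
# counts, e.g. c(A = 9, C = 1); ties go to the first letter (which.max).
upperlower_column <- function(counts, thresh = c(80, 20)) {
  stopifnot(length(thresh) == 2, thresh[1] >= thresh[2])
  total <- sum(counts)
  if (total == 0) return(".")            # empty column: nothing to report
  best <- names(counts)[which.max(counts)]
  relFreq <- 100 * max(counts) / total   # relative frequency in percent
  if (relFreq >= thresh[1]) {
    best                                 # at least first threshold: as is
  } else if (relFreq >= thresh[2]) {
    tolower(best)                        # between thresholds: lower case
  } else {
    "."                                  # below second threshold: dot
  }
}

# Apply the rule to every column of a count matrix whose row names are
# the letters and whose columns are alignment positions.
upperlower_consensus <- function(conMat, thresh = c(80, 20)) {
  cols <- apply(conMat, 2, upperlower_column, thresh = thresh)
  paste(cols, collapse = "")
}

conMat <- matrix(c(9, 1, 0,    # position 1: A = 90%
                   5, 3, 2,    # position 2: A = 50%
                   0, 0, 10),  # position 3: G = 100%
                 nrow = 3,
                 dimnames = list(c("A", "C", "G"), NULL))
upperlower_consensus(conMat)  # "AaG"
```

Because tolower("-") is still "-", a gap that is the most frequent letter with a frequency between the two thresholds stays unchanged in this sketch, which matches the behavior described for ignoreGaps=FALSE.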
If ignoreGaps=FALSE (the default), gaps are treated like all other letters, except that no lowercase conversion takes place when a gap is the most frequent letter and its relative frequency lies between the two thresholds. If ignoreGaps=TRUE, gaps are ignored completely, and the consensus sequence is computed from the non-gap letters only.
If the consensus matrix of a multiple alignment of nucleotide sequences contains rows labeled ‘+’ and/or ‘.’, these rows are ignored.
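How ignoreGaps and the '+'/'.' rows affect the relative frequencies can be sketched in base R. This is an illustration only, assuming a count matrix with one row per letter and the gap row labeled "-"; prepare_counts() and relative_freqs() are hypothetical helpers, not part of msa.

```r
# Hypothetical helpers, not part of msa: they only illustrate how the gap row
# and the '+'/'.' rows affect the relative frequencies per position.
prepare_counts <- function(conMat, ignoreGaps = FALSE) {
  # Rows labeled '+' or '.' (nucleotide consensus matrices) are dropped.
  conMat <- conMat[!(rownames(conMat) %in% c("+", ".")), , drop = FALSE]
  if (ignoreGaps) {
    # Gaps ignored completely: drop the gap row "-" (assumed label).
    conMat <- conMat[rownames(conMat) != "-", , drop = FALSE]
  }
  conMat
}

# Relative frequency (in percent) of each letter per column;
# a column without any remaining letters yields NaN.
relative_freqs <- function(conMat) {
  sweep(conMat, 2, colSums(conMat), "/") * 100
}

conMat <- matrix(c(4, 1, 0, 0, 5, 0, 0,   # position 1: 5 gaps out of 10
                   0, 0, 6, 2, 2, 0, 0),  # position 2: 2 gaps out of 10
                 nrow = 7,
                 dimnames = list(c("A", "C", "G", "T", "-", "+", "."), NULL))

relative_freqs(prepare_counts(conMat))["A", 1]                     # 40
relative_freqs(prepare_counts(conMat, ignoreGaps = TRUE))["A", 1]  # 80
```

With these counts, position 1 is dominated by gaps (50%) when gaps are kept, whereas dropping the gap row raises A to 80%, i.e. exactly to the default first threshold.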
The function returns a character string with the consensus sequence.
Ulrich Bodenhofer
U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
library(msa)

## read sequences
filepath <- system.file("examples", "HemoglobinAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## perform multiple alignment
myAlignment <- msa(mySeqs)

## regular consensus sequence using consensusString() method from the
## 'Biostrings' package
msaConsensusSequence(myAlignment)

## use the "upperlower" method
msaConsensusSequence(myAlignment, type="upperlower")

## use the "upperlower" method with custom parameters
msaConsensusSequence(myAlignment, type="upperlower", thresh=c(50, 20),
                     ignoreGaps=TRUE)

## compute a consensus matrix first
conMat <- consensusMatrix(myAlignment)
msaConsensusSequence(conMat)
use default substitution matrix
"-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"
"-vLsaadKtnvkaawgkvgghageygaEaLeRmflsfPtTKTYFphf-dlshgSaqvkghGkkvadAlt.AvahlddlpgalsaLSdLHAhkLrVDPvNFklLshcllVtla.hhpadftPavhaslDKFlasvstvLtskYR"
".TKRn.CIsMTI..VfItFfGffDWFfD.KDQLEkrENSSISWENGE.CKRGFR.PTIfGFIIT.C.KSd.TfGkCCkNf.KR.KRCKG.GIKQTCNTMEIKKRGAKKTSK.rGgN.cESNdTG.RKCIEK.rTRSTKSRIWQ"
"-VLS?ADK?NVKA?WGK?GGHA?EYGAEALERMF?SFPTTKTYFPHF-DLSHGSAQVKGHGKKVADALT?AV?H?DDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA?H?PA?FTPAVHASLDKFLA?VSTVLTSKYR"