consmat2seq: Get the consensus sequence from a consensus matrix

View source: R/consmat2seq.R

Description

This function is a simplified version of the consensusString function that:

  1. only allows the bases A, T, G and C, plus a single ambiguity letter (N by default)

  2. does not use the IUPAC ambiguity codes in the input or the output

  3. gives priority to inserting a gap when the proportion of sequences with a gap at that position is above threshold

  4. uses only the non-gap sequences to evaluate the proportion of each letter at a given position
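
The four rules above can be sketched in base R as follows. This is a minimal illustration, not the NanoBAC source: the name consensus_sketch is hypothetical, the handling of empty columns and of ties between bases is an assumption, and so is the use of >= rather than > for the threshold (a reading suggested by the comments in the Examples section, where a 5-out-of-10 gap already wins).

```r
# Hypothetical helper for illustration only; this is NOT the NanoBAC source code.
consensus_sketch <- function(x, ambiguityLetter = "N", threshold = 0.5) {
  bases <- c("A", "T", "G", "C")
  per_column <- apply(x, 2, function(counts) {
    total <- sum(counts)
    gaps <- counts[["-"]]
    # Assumption: an empty column is reported as ambiguous.
    if (total == 0) return(ambiguityLetter)
    # Rule 3: the gap takes priority when its share of all sequences passes the threshold.
    if (gaps / total >= threshold) return("-")
    # Rule 4: base proportions are computed over the non-gap sequences only.
    shares <- counts[bases] / (total - gaps)
    winners <- bases[shares >= threshold]
    # Rules 1 and 2: one clear base, otherwise the single ambiguity letter.
    if (length(winners) == 1L) winners else ambiguityLetter
  })
  paste(per_column, collapse = "")
}

# Count matrix with rows A, T, G, C, N, "-" and one column per alignment position.
m <- matrix(c(10L, 0L, 0L, 0L, 0L, 0L,
              0L, 10L, 0L, 0L, 0L, 0L,
              4L, 2L, 0L, 0L, 0L, 4L,
              5L, 0L, 0L, 0L, 0L, 5L),
            nrow = 6,
            dimnames = list(c("A", "T", "G", "C", "N", "-"), NULL))
consensus_sketch(m)  # "ATA-"
```

In the third column the gap share (4/10) is below the threshold, so A is scored against the 6 non-gap sequences (4/6) and selected. In the fourth column the gap share (5/10) reaches the threshold and the gap is chosen before any base is considered.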

Note that the function will not work if any of the bases A, T, G or C, or the gap "-", is completely absent from the alignment. Surprisingly, the consensusString function does not give identical results when applied to a DNAalignment object and to a frequency matrix produced by consensusMatrix with as.prob=TRUE. With the default values, this function gives the same result as consensusString(DNAalignment-object, ambiguityMap="N", threshold=0.5).

Usage

consmat2seq(x, ambiguityLetter = "N", threshold = 0.5)

Arguments

x

A consensus matrix (i.e. a matrix of integer counts), typically obtained with the consensusMatrix function.

ambiguityLetter

Single letter used when a position is ambiguous. (Default is "N")

threshold

Proportion, between 0 and 1, above which a base (or a gap) is selected as the consensus. (Default is 0.5)

Value

A character string containing the consensus sequence.

Examples

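# Create a function to compare the consmat2seq and the consensusString functions:
ccons <- function(intmat) {
    # Convert counts to per-column proportions. sweep() divides each column by
    # its own total; intmat / colSums(intmat) would recycle the totals down the rows.
    probmat <- sweep(intmat, 2, colSums(intmat), "/")
    print(Biostrings::consensusString(probmat, "N", threshold = 0.5))
    print(NanoBAC:::consmat2seq(intmat, "N", threshold = 0.5))
}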
# Create a simple matrix:
consmat <- matrix(c(10L, 0L, 0L, 0L, 0L, 0L,
                    0L, 10L, 0L, 0L, 0L, 0L,
                    0L, 0L, 10L, 0L, 0L, 0L,
                    0L, 0L, 0L, 10L, 0L, 0L),
                    nrow = 6,
                    dimnames = list(c("A", "T", "G", "C", "N", "-")))
ccons(consmat) # same result
consmat[,3] <- c(4L, 0L, 0L, 0L, 0L, 6L)
ccons(consmat) # same
consmat[,3] <- c(4L, 1L, 0L, 0L, 0L, 5L)
ccons(consmat) # same
consmat[,3] <- c(5L, 0L, 0L, 0L, 0L, 5L)
ccons(consmat) # different (consmat2seq favors the gap)
consmat[,3] <- c(4L, 2L, 0L, 0L, 0L, 4L)
ccons(consmat) # different (consmat2seq only considers the non-gap letters)
consmat[,3] <- c(4L, 2L, 1L, 1L, 0L, 2L)
ccons(consmat) # different (consmat2seq only considers the non-gap letters)
consmat[,3] <- c(4L, 2L, 1L, 1L, 1L, 1L)
ccons(consmat) # same

pgpmartin/NanoBAC documentation built on Dec. 11, 2020, 9:51 a.m.