GetConsensusSeq: Function for obtaining consensus sequence of DNA sequence...


View source: R/GetConsensusSeq.R

Description

Uses a special nomenclature, which we call the Logolas nomenclature, to determine the consensus sequence of symbols based on the enrichment and depletion of the symbols at each position. This approach is an alternative to the getIUPAC() method used by the atSNP package.
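To make "enrichment and depletion at each position" concrete, here is a minimal base R sketch that labels each symbol by comparing its frequency with a background frequency. The helper name `flag_enrichment`, the uniform background of 0.25, and the two-fold threshold are all assumptions chosen for illustration; this page does not state the rule that GetConsensusSeq() applies, so this should not be read as its implementation.

```r
# Illustrative only: `flag_enrichment` is a hypothetical helper, and the
# uniform background and fold threshold are assumptions, not the rule
# used by GetConsensusSeq().
flag_enrichment <- function(pfm, background = 0.25, fold = 2) {
  ratio <- pfm / background                      # fold change against background
  status <- matrix("neutral", nrow(pfm), ncol(pfm), dimnames = dimnames(pfm))
  status[ratio >= fold] <- "enriched"            # much more frequent than background
  status[ratio <= 1 / fold] <- "depleted"        # much less frequent than background
  status
}

# Two positions with symbols along the rows and positions along the columns
pfm <- matrix(c(0.8, 0.1, 0.1, 0,
                0.5, 0.4, 0,   0.1),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), c("1", "2")))
status <- flag_enrichment(pfm)
print(status)
```

Under these assumed thresholds, a symbol can be flagged as depleted even when another symbol dominates the position, which is the kind of information a consensus built only from the most frequent symbols would not show.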

Usage

GetConsensusSeq(data)

Arguments

data

The input data may be a vector of aligned DNA or RNA sequences made up of the symbols A, C, G and T, or a matrix/data frame with the symbols A, C, G and T along the rows and the positions (sites) of the aligned sequences along the columns.
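The two accepted input forms are related: a vector of aligned sequences can be tabulated into a symbols-by-positions frequency matrix. The base R sketch below shows one way to do that. The helper name `seqs_to_pfm` is hypothetical and is not part of Logolas; whether GetConsensusSeq() performs this conversion internally in the same way is an assumption.

```r
# Hypothetical helper (not part of Logolas): tabulate aligned sequences
# into a matrix with symbols along the rows and positions along the columns.
seqs_to_pfm <- function(seqs, symbols = c("A", "C", "G", "T")) {
  stopifnot(length(unique(nchar(seqs))) == 1)   # aligned sequences share one length
  chars <- do.call(rbind, strsplit(seqs, ""))   # one row per sequence
  pfm <- vapply(seq_len(ncol(chars)), function(j) {
    # relative frequency of each symbol at position j
    as.numeric(table(factor(chars[, j], levels = symbols))) / nrow(chars)
  }, numeric(length(symbols)))
  rownames(pfm) <- symbols
  colnames(pfm) <- seq_len(ncol(pfm))
  pfm
}

seqs <- c("CTATTGT", "CTCTTAT", "CTATTAA", "CTATTTA", "CTATTAT",
          "CTTGAAT", "CTTAGAT", "CTATTAA", "CTATTTA", "CTATTAT")
pfm <- seqs_to_pfm(seqs)
print(round(pfm, 2))
```

Each column of the result sums to 1, so it has the same layout as the matrix input described above.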

Value

Returns the consensus sequence for the DNA/RNA sequence motif along the positions using the Logolas nomenclature (highlighting both enrichment and depletion).

Examples

# Position weight matrix: one line per position (column), filled column-wise
pwm <- matrix(c(0.8,  0.1,  0.1,  0,
                0.9,  0.1,  0,    0,
                0.9,  0.05, 0.05, 0,
                0.5,  0.4,  0,    0.1,
                0.6,  0.4,  0,    0,
                0.4,  0.4,  0.1,  0.1,
                0.5,  0,    0.2,  0.3,
                0.35, 0.35, 0.06, 0.24,
                0.4,  0.3,  0.2,  0.1,
                0.4,  0.2,  0.2,  0.2,
                0.28, 0.24, 0.24, 0.24,
                0.5,  0.16, 0.17, 0.17,
                0.6,  0.13, 0.13, 0.14,
                0.7,  0.15, 0.15, 0),
              nrow = 4, byrow = FALSE)
rownames(pwm) <- c('A', 'C', 'G', 'T')
colnames(pwm) <- 1:ncol(pwm)
GetConsensusSeq(pwm)

sequence <- c("CTATTGT", "CTCTTAT", "CTATTAA", "CTATTTA", "CTATTAT", 
              "CTTGAAT", "CTTAGAT", "CTATTAA", "CTATTTA", "CTATTAT")
GetConsensusSeq(sequence)

Logolas documentation built on April 28, 2020, 8:55 p.m.