countDnaKmers: Counting k-mers in DNA sequence.

View source: R/kMer.R

Description

Counts occurrences of DNA k-mers in a given DNA sequence. The k-mers are searched in a set of search windows defined by the start and width parameters. Each value in the start vector defines the left border of a search window, and the corresponding value in the width vector gives its size. At each position in a search window, the k-mer extending to the right of that position is identified and counted. The function is intended for counting DNA k-mers in selected regions (e.g. exons) of a DNA sequence.

Usage

countDnaKmers(dna, k, start, width)

Arguments

dna

character. Single DNA sequence (vector of length 1). dna must not contain characters other than "ATCGN". Capitalization does not matter. When an 'N' character is found, the current DNA k-mer is skipped.

k

numeric. Number of nucleotides in the counted DNA motifs (the k-mer length).

start

numeric. Vector of (1-based) start positions for reading frames (search windows). Each frame extends to the right along the DNA string.

width

numeric. Defines the size of the search window for each start position. Must have the same length as start, or length 1 (in which case the value of width is recycled).
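
The argument rules above can be sketched in base R. This is only an illustration: check_kmer_args is a hypothetical helper that is not part of seqTools, and normalizing the sequence to upper case is an assumption about how "capitalization does not matter" could be handled.

```r
# Hypothetical helper, NOT part of seqTools: applies the documented argument rules.
check_kmer_args <- function(dna, start, width) {
  # dna must be a single character string
  stopifnot(is.character(dna), length(dna) == 1)
  # Only A, T, C, G and N are allowed; case is ignored (assumed: normalize to upper case)
  dna <- toupper(dna)
  if (grepl("[^ACGTN]", dna))
    stop("dna contains characters other than ATCGN")
  # width must match start in length, or be a single value that is recycled
  if (length(width) == 1) {
    width <- rep(width, length(start))
  } else if (length(width) != length(start)) {
    stop("width must have length 1 or the same length as start")
  }
  list(dna = dna, start = start, width = width)
}

args <- check_kmer_args("ataaata", start = 1:3, width = 3)
args$width  # the single width value is recycled to one value per start position
```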

Details

K-mers are counted starting at every position in {start, ..., start + width - 1}. Because each k-mer spans k nucleotides, the last position at which a k-mer can start is nchar(dna) - k + 1. The function throws the error 'Search region exceeds string end' when start + width + k > nchar(dna) + 2 for any window, that is, when the last k-mer of a window would extend past the end of the sequence.
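
The counting rule and the bounds check can be illustrated with a short base-R sketch. It is not the seqTools implementation: count_kmers_sketch is a hypothetical name, and treating "skipped" as dropping every k-mer that contains an N is an assumption.

```r
# Illustrative base-R sketch; NOT the seqTools implementation.
count_kmers_sketch <- function(dna, k, start, width) {
  dna <- toupper(dna)
  width <- rep_len(width, length(start))
  # Bounds check as stated in the Details section
  if (any(start + width + k > nchar(dna) + 2))
    stop("Search region exceeds string end")
  # All 4^k motifs in alphabetical order (AA, AC, ..., TT for k = 2)
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), k),
                      stringsAsFactors = FALSE)
  motifs <- do.call(paste0, grid[, k:1, drop = FALSE])
  counts <- matrix(0L, nrow = length(motifs), ncol = length(start),
                   dimnames = list(motifs, as.character(start)))
  for (j in seq_along(start)) {
    # Every position in {start, ..., start + width - 1} starts one k-mer
    pos <- seq(start[j], length.out = width[j])
    kmers <- substring(dna, pos, pos + k - 1)
    # Assumption: a k-mer containing N is dropped from the count
    kmers <- kmers[!grepl("N", kmers, fixed = TRUE)]
    counts[, j] <- as.vector(table(factor(kmers, levels = motifs)))
  }
  counts
}

m <- count_kmers_sketch("ATAAATA", 2, 1:3, 3)
m[rowSums(m) > 0, ]  # show only the motifs that occur
```

For the sequence "ATAAATA" with k = 2, start = 1:3 and width = 3, the largest value of start + width + k is 8, which does not exceed nchar(dna) + 2 = 9, so no error is raised.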

Value

matrix. Each column contains the motif counts for one frame (search window). The column names are the values in the start vector. Each row represents one DNA motif, whose sequence is given as the row name.
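
As a sketch of how such a matrix can be used, the snippet below builds a small stand-in by hand instead of calling the package. Its values are the non-zero rows of the example output on this page; the variable names counts and prop are illustrative only.

```r
# Stand-in for the returned matrix: the non-zero rows of the example output
# on this page (rows = motifs, columns = start positions).
counts <- matrix(c(1L, 1L, 1L,   # window starting at 1
                   2L, 0L, 1L,   # window starting at 2
                   2L, 1L, 0L),  # window starting at 3
                 nrow = 3,
                 dimnames = list(c("AA", "AT", "TA"), c("1", "2", "3")))

counts["AA", "2"]   # count of motif AA in the window starting at position 2
colSums(counts)     # number of k-mers counted per window

# Relative motif frequencies within each window
prop <- sweep(counts, 2, colSums(counts), "/")
```

Indexing by row name and column name avoids relying on the position of a motif in the matrix.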

Author(s)

Wolfgang Kaisers

See Also

countGenomeKmers

Examples

seq <- "ATAAATA"
countDnaKmers(seq, 2, 1:3, 3)

Example output

Loading required package: zlibbioc
   1 2 3
AA 1 2 2
AC 0 0 0
AG 0 0 0
AT 1 0 1
CA 0 0 0
CC 0 0 0
CG 0 0 0
CT 0 0 0
GA 0 0 0
GC 0 0 0
GG 0 0 0
GT 0 0 0
TA 1 1 0
TC 0 0 0
TG 0 0 0
TT 0 0 0

seqTools documentation built on Nov. 8, 2020, 5:20 p.m.