countGenomeKmers: Counting k-mers in DNA sequences.

Description

Counts k-mers in DNA sequences from FASTA files. The k-mers are searched in a set of search windows, which are defined by the start and width parameters. Each value in the start vector defines the left border of a search window, and the corresponding value in the width vector gives the size of that window. From each position of the search window, a DNA k-mer is read to the right-hand side along the given DNA sequence. The function is intended to count DNA k-mers in selected regions (e.g. exons) on chromosomes while respecting strand orientation.
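
The window scan described above can be sketched in base R for a single (+)-strand window. This is a minimal illustration, not the seqTools implementation: count_window_kmers is a hypothetical helper, and dropping k-mers that run past the end of the sequence is an assumption.

```r
# Hypothetical helper, not part of seqTools: counts k-mers for one
# (+)-strand window using base R only.
count_window_kmers <- function(dna, start, width, k) {
  dna <- toupper(dna)
  # One k-mer starts at every position of the window
  pos <- seq(start, length.out = width)
  kmers <- substring(dna, pos, pos + k - 1)
  # Assumption: drop k-mers that run past the sequence end;
  # k-mers containing 'N' are skipped as described for the dna argument
  kmers <- kmers[nchar(kmers) == k & !grepl("N", kmers, fixed = TRUE)]
  table(kmers)
}

counts <- count_window_kmers("TTTTTCCCCGGGGAAAA", start = 6, width = 4, k = 2)
print(counts)
```

For the first window of the example in this documentation (start 6, width 4, k = 2), the sketch finds CC three times and CG once, which agrees with column 1 of the example output.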

Usage

countGenomeKmers(dna, seqid, start, width, strand, k)

Arguments

dna

character. Vector of DNA sequences. dna must not contain characters other than A, T, C, G and N. Capitalization does not matter. When an 'N' character is found, the current DNA k-mer is skipped.

seqid

numeric. Vector of (1-based) values describing the index of the analyzed sequences inside the fasta file.

start

numeric. Vector of (1-based) start positions for reading windows.

width

numeric. Vector of window width values.

strand

factor or numeric. The first factor level (or the numeric value 1) is interpreted as the (+)-strand. For any other value, the reverse complement sequence is counted (in left direction from the start value).
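
For strands other than (+), counting takes place on the reverse complement. A minimal base R sketch of that transformation follows. reverse_complement is a hypothetical helper, not a seqTools function, and reading the second window of the documented example as positions 13 to 17 is an inference from the example output rather than a statement about the implementation.

```r
# Hypothetical helper: reverse complement of a single DNA string
reverse_complement <- function(dna) {
  comp <- chartr("ACGTN", "TGCAN", toupper(dna))      # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # reverse the order
}

sq <- "TTTTTCCCCGGGGAAAA"
# Assumption: window 2 (start 14, width 4, k = 2) touches positions 13..17
rc <- reverse_complement(substr(sq, 13, 17))  # "GAAAA" -> "TTTTC"
kmers <- substring(rc, 1:4, 2:5)              # all 2-mers of the result
print(table(kmers))
```

The 2-mers obtained this way are TT (three times) and TC (once), the same counts that column 2 of the example output reports.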

k

numeric. Number of nucleotides in the tabled DNA motifs. Only a single value is allowed (length(k) must be 1).

Details

The function returns a matrix. Each column contains the motif counts for one frame (search window). Each row represents one DNA motif; the sequence of the motif is given as the row name.
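
A result of this shape can be inspected with ordinary matrix operations. The sketch below does not call countGenomeKmers: it rebuilds the matrix from the example output in this documentation by hand (an assumption made so that the snippet runs without seqTools) and then keeps only the motifs that were observed.

```r
# Row names: all 2-mers in the order shown in the example output
bases <- c("A", "C", "G", "T")
grid <- expand.grid(second = bases, first = bases, stringsAsFactors = FALSE)
motifs <- paste0(grid$first, grid$second)  # AA, AC, ..., TT

# Hand-built copy of the documented example output (not a function call)
m <- matrix(0L, nrow = length(motifs), ncol = 2,
            dimnames = list(motifs, c("1", "2")))
m["CC", "1"] <- 3L
m["CG", "1"] <- 1L
m["TC", "2"] <- 1L
m["TT", "2"] <- 3L

# Keep only motifs seen in at least one window
observed <- m[rowSums(m) > 0, , drop = FALSE]
print(observed)

# Total number of counted k-mers per window
print(colSums(m))
```

In this example each column sums to 4, the window width, because no k-mer was skipped.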

Value

matrix.

Author(s)

Wolfgang Kaisers

Examples

library(seqTools)

sq <- "TTTTTCCCCGGGGAAAA"
seqid <- as.integer(c(1, 1))
start <- as.integer(c(6, 14))
width <- as.integer(c(4, 4))
strand <- as.integer(c(1, 0))
k <- 2
countGenomeKmers(sq, seqid, start, width, strand, k)

Example output

Loading required package: zlibbioc
   1 2
AA 0 0
AC 0 0
AG 0 0
AT 0 0
CA 0 0
CC 3 0
CG 1 0
CT 0 0
GA 0 0
GC 0 0
GG 0 0
GT 0 0
TA 0 0
TC 0 1
TG 0 0
TT 0 3

seqTools documentation built on May 2, 2019, 4:45 p.m.