countSpliceKmers: Counting K-mers on donor (5', upstream)...

View source: R/kMer.R

Description

The function treats the given strings as DNA sequences bearing a collection of splice sites. The given lEnd and rStart positions are (1-based) coordinates of the innermost exonic nucleotides: each lies on an exon-intron boundary and has one exonic and one intronic neighbour. For each site, the function counts k-mers (as many as given by width) upstream on exonic DNA in reading direction (left -> right on the (+) strand, right -> left on the (-) strand).
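The window placement can be illustrated with a few lines of base R. This is a minimal sketch, not the package's implementation: the helper name frame_kmers and its plus argument are hypothetical, and the window rule (each k-mer ends inside the window and extends k - 1 positions to the left) is inferred from the example output on this page.

```r
# Hypothetical helper, not part of seqTools: lists the k-mers of one frame.
# Assumption (inferred from the example output): the window holds the
# rightmost nucleotide of each k-mer, with (k - 1) positions overhang to the left.
frame_kmers <- function(dna, lEnd, rStart, width, plus, k) {
  dna <- toupper(dna)
  ends <- if (plus) {
    seq(lEnd - width + 1, lEnd)        # window ends at lEnd on the (+) strand
  } else {
    seq(rStart, rStart + width - 1)    # window starts at rStart on the (-) strand
  }
  kmers <- substring(dna, ends - k + 1, ends)
  if (!plus) {
    # Reverse complement each k-mer for the (-) strand.
    kmers <- vapply(kmers, function(x) {
      chartr("ACGT", "TGCA", paste(rev(strsplit(x, "")[[1]]), collapse = ""))
    }, character(1), USE.NAMES = FALSE)
  }
  kmers
}

frame_kmers("acgtGTccccAGcccc", lEnd = 4, rStart = 10, width = 2, plus = TRUE, k = 3)
# "ACG" "CGT"
frame_kmers("TTTTTCCCCGGGGAAAA", lEnd = 9, rStart = 14, width = 4, plus = FALSE, k = 2)
# "TC" "TT" "TT" "TT"
```

The sketch does not handle 'N' characters or windows that run past the sequence ends.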

Usage

countSpliceKmers(dna, seqid, lEnd, rStart, width, strand, k)

Arguments

dna

character. Vector of DNA sequences. dna must not contain characters other than "ATCGN". Capitalization does not matter. When an 'N' character is found, the current DNA k-mer is skipped.

seqid

numeric. Vector of (1-based) indices, each selecting one of the given sequences in dna.

lEnd

numeric. Vector of (1-based) left-end positions. Each is used as the rightmost window position.

rStart

numeric. Vector of (1-based) right-start positions. Each is used as the leftmost window position (an overhang of (k-1) positions beyond the window is counted as part of the frame).

width

numeric. Vector of window width values.

strand

factor or numeric. The first factor level (or the numeric value 1) is interpreted as the (+) strand. For any other value, the reverse complement sequence is counted (in left direction from the start value). For the (+) strand, the lEnd value is used as the starting position; for the (-) strand, the rStart value is used.

k

numeric. Number of nucleotides in tabled DNA motifs. Only a single value is allowed (length(k) must be 1).
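The conventions for strand and dna described above can be checked before calling the function. The following base-R helpers are a sketch only: is_plus, valid_dna and drop_n_kmers are hypothetical names, not part of seqTools, and how the package itself reacts to other characters is not specified here.

```r
# Hypothetical helpers for illustration; none of them is part of seqTools.

# TRUE where a strand value denotes the (+) strand:
# first factor level or numeric value 1.
is_plus <- function(strand) as.integer(strand) == 1L

# TRUE if every sequence consists of A, C, G, T or N only (case-insensitive).
valid_dna <- function(dna) all(grepl("^[ACGTN]*$", toupper(dna)))

# Drop k-mers that contain an 'N', mirroring the documented skipping rule.
drop_n_kmers <- function(kmers) kmers[!grepl("N", kmers, fixed = TRUE)]

is_plus(c(1, 0, 1, 0))                               # TRUE FALSE TRUE FALSE
is_plus(factor(c("+", "-"), levels = c("+", "-")))   # TRUE FALSE
valid_dna(c("acgtGTccccAGcccc", "ACNNT"))            # TRUE
valid_dna("ACGU")                                    # FALSE
drop_n_kmers(c("ACG", "CNT", "GTA"))                 # "ACG" "GTA"
```

Setting the factor levels explicitly avoids depending on the locale-specific sort order of "+" and "-".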

Details

The function returns a matrix. Each column contains the motif-count values for one frame. Each row represents one DNA motif. The DNA sequence of each motif is given as the row name.
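The layout of the result can be mimicked in base R. This is a sketch under assumptions: all_kmers and count_frames are hypothetical helpers, not part of seqTools, and the row order (A < C < G < T, last position varying fastest) is taken from the example output on this page.

```r
# Hypothetical helpers, not part of seqTools.

# All 4^k motifs in the row order seen in the example output.
all_kmers <- function(k) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), k),
                      stringsAsFactors = FALSE)
  # expand.grid varies the first column fastest, so reverse the columns
  # to make the last motif position vary fastest.
  do.call(paste0, unname(as.list(rev(grid))))
}

# Tabulate a list of k-mer vectors (one per frame) into a motif x frame matrix.
count_frames <- function(frames, k) {
  motifs <- all_kmers(k)
  m <- vapply(frames,
              function(km) as.integer(table(factor(km, levels = motifs))),
              integer(length(motifs)))
  dimnames(m) <- list(motifs, as.character(seq_along(frames)))
  m
}

# The k-mers of the first two frames of the second example on this page.
frames <- list(c("TC", "CC", "CC", "CC"), c("TC", "TT", "TT", "TT"))
m <- count_frames(frames, k = 2)
m[c("CC", "TC", "TT"), ]
```

Rows for motifs that never occur stay in the matrix with a count of 0, so results for the same k always have the same 4^k rows.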

Value

matrix.

Author(s)

Wolfgang Kaisers

Examples

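library(seqTools)

# One splice site on the (+) strand: 3-mers in a window of width 2 at lEnd = 4
seq <- "acgtGTccccAGcccc"
countSpliceKmers(seq, seqid=1, lEnd=4, rStart=10, width=2, strand=1, k=3)

# Two sequences, each counted on the (+) strand (strand = 1)
# and on the (-) strand (strand = 0)
sq1 <- "TTTTTCCCCGGGGAAAA"
sq2 <- "TTTTTTTCCCCGGGGAAAA"
sq <- c(sq1, sq2)
seqid <- c( 1, 1, 2, 2)
lEnd  <- c( 9, 9, 11, 11)
rStart <- c(14, 14, 16, 16)
width <- c( 4, 4, 4, 4)
strand <- c( 1, 0, 1, 0)
countSpliceKmers(sq, seqid, lEnd, rStart, width, strand, k=2)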

Example output

Loading required package: zlibbioc
[count_splice_Kmers] Finished. char mismatches: 0 on (+) and 0 on (-) strand.
    1
AAA 0
AAC 0
AAG 0
AAT 0
ACA 0
ACC 0
ACG 1
ACT 0
AGA 0
AGC 0
AGG 0
AGT 0
ATA 0
ATC 0
ATG 0
ATT 0
CAA 0
CAC 0
CAG 0
CAT 0
CCA 0
CCC 0
CCG 0
CCT 0
CGA 0
CGC 0
CGG 0
CGT 1
CTA 0
CTC 0
CTG 0
CTT 0
GAA 0
GAC 0
GAG 0
GAT 0
GCA 0
GCC 0
GCG 0
GCT 0
GGA 0
GGC 0
GGG 0
GGT 0
GTA 0
GTC 0
GTG 0
GTT 0
TAA 0
TAC 0
TAG 0
TAT 0
TCA 0
TCC 0
TCG 0
TCT 0
TGA 0
TGC 0
TGG 0
TGT 0
TTA 0
TTC 0
TTG 0
TTT 0
[count_splice_Kmers] Finished. char mismatches: 0 on (+) and 0 on (-) strand.
   1 2 3 4
AA 0 0 0 0
AC 0 0 0 0
AG 0 0 0 0
AT 0 0 0 0
CA 0 0 0 0
CC 3 0 3 0
CG 0 0 0 0
CT 0 0 0 0
GA 0 0 0 0
GC 0 0 0 0
GG 0 0 0 0
GT 0 0 0 0
TA 0 0 0 0
TC 1 1 1 1
TG 0 0 0 0
TT 0 3 0 3
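
Because most rows of such a result are zero, it is often convenient to reduce the matrix before inspecting it. The following base-R sketch rebuilds the second result from the values printed above instead of calling seqTools; the variable names are illustrative only.

```r
# Rebuild the second example result from the values printed above
# (only CC, TC and TT have non-zero counts).
bases  <- c("A", "C", "G", "T")
motifs <- as.vector(t(outer(bases, bases, paste0)))   # AA, AC, AG, AT, CA, ...
res <- matrix(0L, nrow = 16, ncol = 4,
              dimnames = list(motifs, as.character(1:4)))
res["CC", c(1, 3)] <- 3L
res["TC", ] <- 1L
res["TT", c(2, 4)] <- 3L

# Keep only motifs that occur in at least one frame.
nonzero <- res[rowSums(res) > 0, , drop = FALSE]

# Relative motif frequencies per frame (each column sums to 1).
freq <- sweep(nonzero, 2, colSums(nonzero), "/")
freq
```

Using drop = FALSE keeps the result a matrix even when only a single motif remains.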

seqTools documentation built on Nov. 8, 2020, 5:20 p.m.