getKmerFreq: Calculate observed and expected k-mer frequencies

View source: R/motif_enrichment_kmers.R

getKmerFreqR Documentation

Calculate observed and expected k-mer frequencies

Description

Given a set of sequences, calculate observed and expected k-mer frequencies. Expected frequencies are based on a Markov model of order MMorder.

Usage

getKmerFreq(
  seqs,
  kmerLen = 5,
  MMorder = 1,
  pseudocount = 1,
  zoops = TRUE,
  strata = rep(1L, length(seqs)),
  p.adjust.method = "BH",
  includeRevComp = TRUE
)

Arguments

seqs

Set of sequences, either a character vector or a DNAStringSet.

kmerLen

A numeric scalar giving the k-mer length.

MMorder

A numeric scalar giving the order of the Markov model used to calculate the expected frequencies.

pseudocount

A numeric scalar - will be added to the observed counts for each k-mer to avoid zero values.

zoops

A logical scalar. If TRUE (the default), only one or zero occurences of a k-mer are considered per sequence.

strata

A factor or a numeric scalar defining the strata of sequences. A separate Markov model and expected k-mer frequencies are estimated for the set of sequences in each stratum (level in a strata factor). If strata is a scalar value, it will be interpreted as the number of strata to split the sequences into according to their CpG observed-over-expected counts using kmeans(CpGoe, centers = strata).

p.adjust.method

A character scalar selecting the p value adjustment method (used in p.adjust).

includeRevComp

A logical scalar. If TRUE (default), count k-mer occurrences in both seqs and their reverse-complement, by concatenating seqs and their reverse-complemented versions before the counting. This is useful if motifs can be expected to occur on any strand (e.g. DNA sequences of ChIP-seq peaks). If motifs are only expected on the forward strand (e.g. RNA sequences of CLIP-seq peaks), includeRevComp = FALSE should be used. Note that if strata is a vector of the same length as seqs, each reverse-complemented sequence will be assigned to the same stratum as the forward sequence.

Value

A list with observed and expected k-mer frequencies (freq.obs and freq.exp, respectively), and enrichment statistics for each k-mer.

Examples

res <- getKmerFreq(seqs = c("AAAAATT", "AAATTTT"), kmerLen = 3)
names(res)
head(res$freq.obs)
head(res$freq.exp)


fmicompbio/monaLisa documentation built on Nov. 2, 2024, 1:33 p.m.