This function approximates the distribution of the clump sizes.
clumpSizeDist(maxclump, overlap, method = "kopp")
maxclump: Maximal clump size
overlap: An Overlap object.
method: String that defines which method shall be invoked: 'pape' or 'kopp' (see description). Default: method = 'kopp'.
The clump size distribution can be determined in two alternative ways:
method='pape' invokes a re-implementation of the algorithm described in Pape et al., "Compound Poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands", 2008.
method='kopp' (the default) invokes an improved approximation of the clump size distribution. It uses more appropriate statistical assumptions concerning overlapping motif hits and can also be used with order-d background models.
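As a minimal sketch of how the two methods might be compared, the snippet below calls clumpSizeDist once per method on the same Overlap object. It assumes that the motifcounter and Biostrings packages are installed and reuses the example data shipped with motifcounter; the variable names dist_kopp and dist_pape are illustrative only and not part of the package.

```r
# Sketch only: assumes motifcounter and Biostrings are installed.
library(motifcounter)

# Example sequences and motif shipped with the package
seqfile = system.file("extdata", "seq.fasta", package = "motifcounter")
seqs = Biostrings::readDNAStringSet(seqfile)
motiffile = system.file("extdata", "x31.tab", package = "motifcounter")
motif = t(as.matrix(read.table(motiffile)))

# Order-1 background model and overlapping hit probabilities
# (singlestranded = FALSE: both DNA strands are scanned)
bg = readBackground(seqs, 1)
op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE)

# Same maxclump and Overlap object, two different approximations
dist_kopp = motifcounter:::clumpSizeDist(20, op, method = "kopp")
dist_pape = motifcounter:::clumpSizeDist(20, op, method = "pape")
```

Both calls use the same maximal clump size, so the two results can be inspected side by side, for example with print().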
Distribution of the clump size
library(motifcounter)

# Load sequences
seqfile = system.file("extdata", "seq.fasta", package = "motifcounter")
seqs = Biostrings::readDNAStringSet(seqfile)

# Load motif
motiffile = system.file("extdata", "x31.tab", package = "motifcounter")
motif = t(as.matrix(read.table(motiffile)))

# Load background model
bg = readBackground(seqs, 1)

# Use 100 individual sequences of length 150 bp each
seqlen = rep(150, 100)

# Compute overlapping probabilities
# for scanning both DNA strands (singlestranded = FALSE)
op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE)

# Compute the clump size distribution (default method = "kopp")
dist = motifcounter:::clumpSizeDist(20, op)