simulateClumpSizeDist: Empirical clump size distribution
In motifcounter: R package for analysing TFBSs in DNA sequences


This function repeatedly simulates random DNA sequences according to the background model and then counts the number of k-clump occurrences, where k denotes the clump size. It is intended for benchmarking analysis only.
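To make the notion of a k-clump concrete, the following is a minimal base-R sketch, not the package's implementation. It assumes a clump is a maximal run of mutually overlapping motif hits on a single strand, so that two consecutive hits belong to the same clump when their start positions differ by less than the motif length. The helper name `clump_sizes` and the hit positions are hypothetical and chosen for illustration only.

```r
# Hypothetical helper (not part of motifcounter): given the start positions
# of motif hits on one strand and the motif length, return the size k of
# each clump, i.e. the number of overlapping hits it contains.
clump_sizes <- function(hit_starts, motif_len) {
    if (length(hit_starts) == 0) {
        return(integer(0))
    }
    hit_starts <- sort(hit_starts)
    # A new clump begins whenever the gap to the previous hit is at least
    # one motif length, because the two hits can then no longer overlap.
    new_clump <- c(TRUE, diff(hit_starts) >= motif_len)
    # cumsum() assigns a clump id to every hit; tabulate() counts hits per id.
    tabulate(cumsum(new_clump))
}

# Made-up hit positions for a motif of length 6
sizes <- clump_sizes(c(5, 8, 30, 50, 52, 55), motif_len = 6)
sizes  # 2 1 3

# Empirical clump size distribution: relative frequency of each size k
clump_dist <- tabulate(sizes) / length(sizes)
```

Here `clump_dist[k]` is the fraction of observed clumps that have size k. Scanning both strands additionally requires accounting for overlaps between forward-strand and reverse-strand hits, which this sketch deliberately leaves out.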

simulateClumpSizeDist(pfm, bg, seqlen, nsim = 10, singlestranded = FALSE)

`pfm`	An R matrix that represents a position frequency matrix
`bg`	A Background object
`seqlen`	Integer-valued vector that defines the lengths of the individual sequences. For a given DNAStringSet, this information can be retrieved using `numMotifHits`.
`nsim`	Integer number of random samples.
`singlestranded`	Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE.

A List that contains

dist: Empirical distribution of the clump sizes

See also: compoundPoissonDist, combinatorialDist

# Load sequences
seqfile = system.file("extdata", "seq.fasta", package = "motifcounter")
seqs = Biostrings::readDNAStringSet(seqfile)

# Estimate an order-1 background model from the sequences
bg = readBackground(seqs, 1)

# Load motif
motiffile = system.file("extdata", "x31.tab", package = "motifcounter")
motif = t(as.matrix(read.table(motiffile)))

# Study the clump size frequencies in one sequence of length 1 Mb
seqlen = 1000000

# scan both strands
simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen)

# scan a single strand
simc = motifcounter:::simulateClumpSizeDist(motif, bg,
    seqlen, singlestranded = TRUE)