simulateClumpSizeDist: Empirical clump size distribution


View source: R/simulate_wrapper.R

Description

This function repeatedly simulates random DNA sequences according to the background model and counts the number of k-clump occurrences in each simulated sequence, where k denotes the clump size. It is intended for benchmarking analysis only.
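The idea of counting k-clumps in simulated sequences can be illustrated with a minimal base R sketch. It is not the motifcounter implementation and makes several simplifying assumptions: an order-0 background (independent bases), a single strand, and exact matches of a toy pattern in place of PFM-based motif hits. A clump is taken to be a maximal run of mutually overlapping hits. The helper names `find_hits`, `clump_sizes` and `simulate_clump_dist` are hypothetical and do not exist in the package.

```r
# Toy sketch in base R; NOT the motifcounter implementation.
# Assumptions: order-0 background, single strand, exact pattern matches.
set.seed(1)

# Hypothetical helper: start positions of all (possibly overlapping)
# exact occurrences of the character vector `pat` in the character vector `x`.
find_hits <- function(x, pat) {
    n <- length(x)
    m <- length(pat)
    if (n < m) return(integer(0))
    is_hit <- rep(TRUE, n - m + 1)
    for (j in seq_len(m)) {
        is_hit <- is_hit & (x[j:(n - m + j)] == pat[j])
    }
    which(is_hit)
}

# Hypothetical helper: group sorted hit positions into clumps and
# return the number of hits per clump (the clump size k).
clump_sizes <- function(hit_pos, motif_len) {
    if (length(hit_pos) == 0) return(integer(0))
    # A new clump starts when a hit no longer overlaps the previous hit
    new_clump <- c(TRUE, diff(hit_pos) >= motif_len)
    rle(cumsum(new_clump))$lengths
}

# Hypothetical helper: relative frequencies of clump sizes 1..max_k
# over nsim simulated sequences (sizes above max_k are pooled into max_k).
simulate_clump_dist <- function(pat, base_probs, seqlen, nsim = 10, max_k = 10) {
    counts <- numeric(max_k)
    for (i in seq_len(nsim)) {
        x <- sample(names(base_probs), seqlen, replace = TRUE, prob = base_probs)
        sizes <- pmin(clump_sizes(find_hits(x, pat), length(pat)), max_k)
        counts <- counts + tabulate(sizes, nbins = max_k)
    }
    counts / sum(counts)
}

base_probs <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
dist <- simulate_clump_dist(c("A", "A", "A"), base_probs, seqlen = 10000)
print(round(dist, 3))
```

A self-overlapping pattern such as AAA is used here because it produces clumps of size greater than one fairly often, which makes the shape of the distribution visible even in short simulations.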

Usage

simulateClumpSizeDist(pfm, bg, seqlen, nsim = 10, singlestranded = FALSE)

Arguments

pfm

An R matrix that represents a position frequency matrix

bg

A Background object

seqlen

Integer-valued vector that defines the lengths of the individual sequences. For a given DNAStringSet, this information can be retrieved using numMotifHits.

nsim

Integer number of random samples.

singlestranded

Boolean that indicates whether a single strand or both strands shall be scanned for motif hits. Default: singlestranded = FALSE.

Value

A List that contains

dist

Empirical distribution of the clump sizes
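As a rough illustration of how such a distribution might be summarised, the sketch below assumes that dist is a numeric vector whose k-th entry is the relative frequency of clumps of size k. That layout is an assumption and is not confirmed by this page, and the values are made up for illustration only.

```r
# Hypothetical values, for illustration only: entry k is assumed to be
# the relative frequency of clumps of size k.
dist <- c(0.80, 0.15, 0.04, 0.01)
k <- seq_along(dist)

# Expected number of motif hits per clump
mean_clump_size <- sum(k * dist)

# Fraction of clumps that contain more than one hit
p_multi <- sum(dist[k >= 2])

print(mean_clump_size)
print(p_multi)
```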

See Also

compoundPoissonDist, combinatorialDist

Examples

# Load sequences
seqfile = system.file("extdata", "seq.fasta", package = "motifcounter")
seqs = Biostrings::readDNAStringSet(seqfile)

# Estimate an order-1 background model from the sequences
bg = readBackground(seqs, 1)

# Load motif
motiffile = system.file("extdata", "x31.tab", package = "motifcounter")
motif = t(as.matrix(read.table(motiffile)))

# Study the clump size frequencies in one sequence of length 1 Mb
seqlen = 1000000

# scan both strands
simc = motifcounter:::simulateClumpSizeDist(motif, bg, seqlen)

# scan a single strand
simc = motifcounter:::simulateClumpSizeDist(motif, bg,
    seqlen, singlestranded = TRUE)

motifcounter documentation built on Nov. 8, 2020, 5:44 p.m.