This software package grew out of the work that I did to obtain my PhD.

If it is of help for your analysis, please cite

```
@Manual{,
title = {motifcounter: R package for analysing TFBSs in DNA sequences},
author = {Wolfgang Kopp},
year = {2017},
doi = {10.18129/B9.bioc.motifcounter}
}
```

A detailed description of the compound Poisson model is available in

```
@article{improvedcompound,
title={An improved compound Poisson model for the number of motif hits in DNA sequences},
author={Kopp, Wolfgang and Vingron, Martin},
journal={Bioinformatics},
pages={btx539},
year={2017},
publisher={Oxford University Press}
}
```

```
# Estimate a background model on a set of sequences
bg <- readBackground(sequences, order)
# Normalize a given PFM
new_motif <- normalizeMotif(motif)
# Evaluate the scores along a given sequence
scores <- scoreSequence(sequence, motif, bg)
# Evaluate the motif hits along a given sequence
hits <- motifHits(sequence, motif, bg)
# Evaluate the average score profile
score_profile <- scoreProfile(sequences, motif, bg)
# Evaluate the average motif hit profile
hit_profile <- motifHitProfile(sequences, motif, bg)
# Compute the motif hit enrichment
enrichment <- motifEnrichment(sequences, motif, bg)
```

`motifcounter`

The `motifcounter` package facilitates the analysis of
transcription factor binding sites (TFBSs) in DNA sequences.
It can be used to scan a set of DNA sequences for known motifs
(e.g. from TRANSFAC or JASPAR) in order to determine the positions
and enrichment of TFBSs in the sequences.

An analysis with `motifcounter` requires the following inputs:
1. a position frequency matrix (PFM), which represents the affinity of the TF towards the DNA,
2. a background model, which is estimated from a given DNA sequence and which
serves as a reference for the statistical analysis,
3. a desired false positive level for identifying putative TFBSs in DNA sequences. For example, a reasonable choice is a false positive level such that only one in 1000 positions is falsely called a TFBS, and
4. a given DNA sequence, which is subject to the TFBS analysis.
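
To make the false positive level in item 3 concrete, the following minimal sketch uses plain base R and does not call `motifcounter`. The sequence length, the motif length and all variable names are illustrative assumptions, and only a single strand is considered.

```r
# Illustrative values (assumptions, not taken from the package)
alpha      <- 0.001   # false positive level: one in 1000 positions
seq_length <- 10000   # length of the scanned DNA sequence in bp
motif_len  <- 10      # length of the motif in bp

# Number of positions at which a motif of this length can start
n_positions <- seq_length - motif_len + 1

# Expected number of falsely called TFBSs on one strand
expected_false_hits <- alpha * n_positions
print(expected_false_hits)
```

Under these assumptions, roughly ten hits per 10 kb would be expected by chance alone, which is why the observed number of hits needs to be compared against a background model.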

The package aims to improve motif hit enrichment analysis. To this end,
the package offers a number of features:
1. `motifcounter` supports the use of **higher-order Markov models**
to account for the sequence composition in unbound DNA segments.
This improves the reliability of the enrichment analysis, because higher-order
sequence features occur commonly in natural DNA sequences (e.g. CpG islands).
2. The package automatically accounts for **self-overlapping** motif
structures<sup>1</sup>. This aspect is important
for reducing the false positives obtained from the enrichment test, which are
prevalent for repeat-like and palindromic motifs.
`motifcounter` determines self-overlapping motif hit occurrences
not only on a single DNA strand but (by default)
also with respect to the reverse strand.
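
As an illustration of what a Markov background model captures, the sketch below estimates order-1 transition probabilities from a toy sequence in plain base R. It is not the package's implementation: the toy sequence, the pseudocount and the variable names are assumptions made for this example.

```r
# Toy DNA sequence (assumption, for illustration only)
dna      <- "ACGCGCGTTACGCGATATCGCGCG"
bases    <- strsplit(dna, "")[[1]]
alphabet <- c("A", "C", "G", "T")

# Count transitions from each base to the base that follows it
from   <- factor(bases[-length(bases)], levels = alphabet)
to     <- factor(bases[-1], levels = alphabet)
counts <- table(from, to)

# Add a pseudocount of 1 so that no transition has probability zero
# (an assumption of this sketch), then normalise each row to obtain
# P(next base | previous base)
transition <- prop.table(counts + 1, margin = 1)
print(round(transition, 2))
```

Each row of `transition` sums to one. In a CpG-rich sequence such as this toy example, the C-to-G transition receives a higher probability than an order-0 model, which ignores the preceding base, could express.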

`motifcounter` implements two analytic approximations of the
*distribution of the number of motif hits*
in random DNA sequences that can optionally be used for the
enrichment test:

- A *compound Poisson approximation*
- A *combinatorial approximation*

Both approximations yield highly accurate results for stringent
false positive levels.
If you intend to analyse long DNA sequences or
a large set of individual sequences (total sequence length > 10 kb),
we recommend using the *compound Poisson approximation*.
On the other hand, we recommend the *combinatorial approximation*
if a relaxed false positive level is preferred for identifying TFBSs.
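
For readers unfamiliar with compound Poisson distributions, the following base R sketch simulates a generic one: a Poisson number of clumps, each contributing a random number of hits. It is not the improved model implemented in the package, and the rate `lambda` and the clump size probabilities are made-up assumptions.

```r
set.seed(1)

# Illustrative parameters (assumptions, not estimated from data)
lambda     <- 2                   # expected number of clumps per sequence
clump_size <- c(0.8, 0.15, 0.05)  # P(a clump contains 1, 2 or 3 hits)

# Draw the total number of hits in one random sequence:
# a Poisson number of clumps, each with a random number of hits
draw_hits <- function() {
  n_clumps <- rpois(1, lambda)
  if (n_clumps == 0) return(0)
  sum(sample(seq_along(clump_size), n_clumps, replace = TRUE, prob = clump_size))
}

hits <- replicate(10000, draw_hits())

# Empirical distribution of the number of hits
print(table(hits) / length(hits))

# The theoretical mean is lambda * E[clump size]
expected_mean <- lambda * sum(seq_along(clump_size) * clump_size)
print(c(simulated = mean(hits), expected = expected_mean))
```

The simulated mean should lie close to the theoretical mean, and the clump size distribution is what lets such a model represent self-overlapping hits.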

An easy way to install `motifcounter` is to use
the `devtools` R package.

```
#install.packages("devtools")
library(devtools)
install_github("wkopp/motifcounter", build_vignettes=TRUE)
```

Alternatively, the package can be cloned or
downloaded from this GitHub repository,
built via `R CMD build` and installed via the `R CMD INSTALL` command.

The `motifcounter` package contains a tutorial that illustrates:
1. how to determine position- and strand-specific TF motif binding sites,
2. how to analyse the profile of motif hit occurrences across a set of
aligned sequences, and
3. how to test for motif enrichment in a given set of sequences.

The tutorial can be found in the package vignette:

```
library(motifcounter)
vignette("motifcounter")
```

Thanks to matthuska for reviewing and commenting on the package.

1: Self-overlapping motifs induce