These two functions compute two different statistics that measure dinucleotide over- and under-representation: the rho statistic and the z-score, each computed for all 16 dinucleotides.


| Argument | Description |
|---|---|
| `sequence` | A vector of single characters. |
| `wordsize` | An integer giving the size of the word (n-mer) to consider. |
| `simulations` | If |
| `modele` | A string of characters describing the model chosen for the random generation. |
| `exact` | Whether exact analytical calculation or an approximation should be used. |
| `alphabet` | A vector of single characters. |
| `...` | Optional parameters for specific model permutations are passed on to |

The `rho` statistic, as presented in Karlin S., Cardon LR. (1994), can
be computed on each of the 16 dinucleotides. It is the frequency of
dinucleotide *xy* divided by the product of the frequencies of
nucleotide *x* and nucleotide *y*. It is equal to 1.00 when
dinucleotide *xy* is formed by pure chance, and it is greater than
(respectively less than) 1.00 when dinucleotide *xy* is over-
(respectively under-) represented. Note that if you want to reproduce
Karlin's results you have to compute the statistic from the sequence
concatenated with its reverse complement, that is, with something
like `rho(c(myseq, rev(comp(myseq))))`.
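The definition of `rho` can be illustrated with a minimal base-R sketch. The helper name `rho_sketch` is hypothetical and this is not the seqinr implementation; it assumes overlapping dinucleotides and a lowercase `a`, `c`, `g`, `t` alphabet.

```r
# Hypothetical helper illustrating the rho definition (not seqinr's code).
rho_sketch <- function(sequence, alphabet = c("a", "c", "g", "t")) {
  n <- length(sequence)
  # Single-nucleotide frequencies f(x)
  base_freq <- as.vector(table(factor(sequence, levels = alphabet))) / n
  # Overlapping dinucleotides at positions (1,2), (2,3), ..., (n-1,n)
  dinuc <- paste0(sequence[-n], sequence[-1])
  # All 16 words in the order aa, ac, ag, at, ca, ...
  words <- as.vector(t(outer(alphabet, alphabet, paste0)))
  dinuc_freq <- as.vector(table(factor(dinuc, levels = words))) / (n - 1)
  # Expected frequency under independence: f(x) * f(y), same word order
  expected <- as.vector(t(outer(base_freq, base_freq)))
  res <- dinuc_freq / expected
  names(res) <- words
  res
}

myseq <- c("a", "a", "c", "g")
rho_sketch(myseq)["aa"]

# Sequence concatenated with its reverse complement, written in base R
both_strands <- c(myseq, chartr("acgt", "tgca", rev(myseq)))
rho_sketch(both_strands)
```

Dinucleotides involving a nucleotide that is absent from the sequence give `NaN` in this sketch, because both the observed and expected frequencies are zero.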

The `zscore` statistic is presented in Palmeira, L., Guéguen, L.
and Lobry JR. (2006). It is the normalization of the `rho`
statistic by its expectation and variance according to a
given random sequence generation model, and follows the
standard normal distribution. This statistic can be computed
with several models (cf. `permutation` for the description
of each of the models). Analytical calculations are provided for two of
them: the `base` permutations model and the `codon` permutations model.

The `base` model allows for random sequence generation by
shuffling (with/without replacement) of all bases in the sequence.
Analytical computations are available for this model: either as an
approximation for large sequences (cf. Palmeira, L., Guéguen, L.
and Lobry JR. (2006)), or as the exact analytical formulae
(cf. Schbath, S. (1995)).
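When no analytical formula is used, the normalization can be approximated by simulation. The following base-R sketch estimates the z-score of a single dinucleotide under the `base` model by shuffling bases without replacement. The helper names `dinuc_rho` and `zscore_sim_sketch` are hypothetical, and the procedure is an assumed Monte Carlo approach, not the analytical formulae cited above.

```r
# Hypothetical helper: rho for one dinucleotide, e.g. word = "cg"
dinuc_rho <- function(sequence, word) {
  n <- length(sequence)
  x <- substr(word, 1, 1)
  y <- substr(word, 2, 2)
  # Frequency of the word among overlapping dinucleotides
  f_xy <- mean(paste0(sequence[-n], sequence[-1]) == word)
  f_xy / (mean(sequence == x) * mean(sequence == y))
}

# Assumed Monte Carlo z-score under the base model (no replacement)
zscore_sim_sketch <- function(sequence, word, simulations = 1000) {
  observed <- dinuc_rho(sequence, word)
  # Permuting the positions keeps the base composition unchanged
  simulated <- replicate(
    simulations,
    dinuc_rho(sequence[sample.int(length(sequence))], word)
  )
  (observed - mean(simulated)) / sd(simulated)
}

set.seed(1)
# Toy sequence strongly enriched in "cg"
enriched <- c(rep(c("c", "g"), 100), rep("a", 100), rep("t", 100))
zscore_sim_sketch(enriched, "cg", simulations = 200)
```

A large positive value indicates over-representation relative to the shuffled sequences, and a large negative value indicates under-representation.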

The `position` model allows for random sequence generation
by shuffling (with/without replacement) of bases within their
position in the codon (bases in position I, II or III stay in
position I, II or III in the new sequence).
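The idea behind the `position` model can be sketched in base R as follows. The helper `shuffle_by_position` is hypothetical; it covers only shuffling without replacement and assumes the sequence is in frame with a length that is a multiple of 3.

```r
# Hypothetical helper: shuffle bases within each codon position (I, II, III)
shuffle_by_position <- function(sequence) {
  stopifnot(length(sequence) %% 3 == 0)
  # Codon position (1, 2 or 3) of every base
  pos <- rep(1:3, length.out = length(sequence))
  shuffled <- sequence
  for (p in 1:3) {
    idx <- which(pos == p)
    # Permute indices rather than values, so single-codon input also works
    shuffled[idx] <- sequence[idx[sample.int(length(idx))]]
  }
  shuffled
}

set.seed(1)
myseq <- sample(c("a", "c", "g", "t"), 30, replace = TRUE)
shuffle_by_position(myseq)
```

The base composition at each codon position is preserved, while the codons themselves generally change.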

The `codon` model allows for random sequence generation by
shuffling (with/without replacement) of codons. Analytical
computation is available for this model (Gautier, C., Gouy, M. and
Louail, S. (1985)).
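Codon shuffling can be sketched in base R in a similar way. The helper `shuffle_codons` is hypothetical; it covers only shuffling without replacement and assumes an in-frame sequence whose length is a multiple of 3.

```r
# Hypothetical helper: shuffle whole codons, keeping each codon intact
shuffle_codons <- function(sequence) {
  stopifnot(length(sequence) %% 3 == 0)
  # One column per codon
  codons <- matrix(sequence, nrow = 3)
  # Permute the columns, then flatten back to a vector of single characters
  as.vector(codons[, sample.int(ncol(codons)), drop = FALSE])
}

set.seed(1)
myseq <- c("a", "t", "g", "c", "c", "c", "t", "a", "a")
shuffle_codons(myseq)
```

Codon usage is preserved exactly, so only the dinucleotides that span codon boundaries (positions III-I) can change.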

The `syncodon` model allows for random sequence generation
by shuffling (with/without replacement) of synonymous codons.

A table containing the computed statistic for each dinucleotide.

L. Palmeira, J.R. Lobry with suggestions from A. Coghlan.

Gautier, C., Gouy, M. and Louail, S. (1985) Non-parametric statistics
for nucleic acid sequence study. *Biochimie*, **67**:449-453.

Karlin, S. and Cardon, L.R. (1994) Computational DNA sequence analysis.
*Annu Rev Microbiol*, **48**:619-654.

Schbath, S. (1995) Étude asymptotique du nombre d'occurrences d'un
mot dans une chaîne de Markov et application à la recherche de mots
de fréquence exceptionnelle dans les séquences d'ADN.
*Thèse de l'Université René Descartes, Paris V*.

Palmeira, L., Guéguen, L. and Lobry, J.R. (2006) UV-targeted dinucleotides
are not depleted in light-exposed Prokaryotic genomes.
*Molecular Biology and Evolution*,
**23**:2214-2219.
http://mbe.oxfordjournals.org/cgi/reprint/23/11/2214

`citation("seqinr")`

`permutation`

