# Matrix Test of CSPR for Mononucleotides

### Description

Performs a test of Chargaff's second parity rule (CSPR) for mononucleotides on a genomic sequence using a 4X4 stochastic matrix estimated from the sequence. The test was proposed by Hart and Mart<ed>nez (2011).

### Usage

1 | ```
chargaff1.test(x, alg=c("table", "simulate", "upper"), n, no.p.value=FALSE)
``` |

### Arguments

`x` |
either a vector containing the relative frequencies of each of the 4 nucleotides A, C, G, T, a character vector representing a DNA sequence in which each element contains a single nucleotide, or a DNA sequence stored using the SeqFastadna class from the seqinr package. |

`alg` |
the algorithm for computing the p-value. If set to “simulate”, the p-value is obtained via Monte Carlo simulation. If set to “upper”, an analytic upper bound on the p-value is computed. “upper” are based on formulae in Hart and Mart<ed>nez (2011). If type is specified as “table” (the default value),the p-value for the test is obtained from a linear interpolation of a look-up table. See the note below for further details. |

`n` |
The number of replications to use for Monte Carlo simulation. If computationally feasible, a value >= 10000000 is recommended. |

`no.p.value` |
If TRUE, do not compute the p-value. The default is FALSE. |

### Details

The first argument may be a character vector representing a DNA sequence, a DNA sequence represented using the SeqFastadna class from the seqinr package, or a vector containing the relative frequencies of the A, C, G and T nucleic acids.

This function performs a test of Chargaff's second parity rule for
mononucleotides based on a 4X4 stochastic matrix *P* estimated from the
empirical dinucleotide distribution of a genomic sequence . The *a,b)*
entry of *P* gives the empirical probability (relative frequency) that a
nucleotide *a* is followed by a nucleotide *b* in the sequence. The test
is set up as follows:

*H0*: the sequence (or matrix *P*) does not comply with CSPR for mononucleotides

*H1*: the sequence (or matrix *P*) complies with CSPR for mononucleotides

### Value

A list with class "htest.ext" containing the following components:

`statistic` |
the value of the test statistic. |

`p.value` |
the p-value of the test. Only included if no.p.value is FALSE. |

`method` |
a character string indicating what type of test was performed. |

`data.name` |
a character string giving the name of the data. |

`f` |
the 2-element vector used in calculating the test statistic. |

`estimate` |
the stochastic matrix |

`stat.desc` |
a brief description of the test statistic. |

`null` |
the null hypothesis ( |

`alternative` |
the alternative hypothesis ( |

### Note

Currently, the look-up table that is employed when alg is set to “table” does not provide an accurate p-value when the statistic is smaller than 0.00806. Care should be taken when adjusting p-values for multiple testing.

The algebraically computed upper bound on the p-value obtained when alg is set to “upper” is not very tight and not suitable for real- world applications.

no.p.value suppresses computation of the p-value when it is set to TRUE. This may be useful wen using this function to help simulate the test statistic.

### Author(s)

Andrew Hart and Servet Mart<ed>nez

### References

Hart, A.G. and Mart<ed>nez, S. (2011)
Statistical testing of Chargaff's second parity rule in bacterial genome sequences.
*Stoch. Models* **27(2)**, 1–46.

### See Also

`chargaff0.test`

, `chargaff2.test`

,
`agct.test`

, `ag.test`

,
`chargaff.gibbs.test`

### Examples

1 2 3 4 5 6 7 8 9 | ```
#Demonstration on real bacterial sequence
data(nanoarchaeum)
chargaff1.test(nanoarchaeum)
#Simulate synthetic DNA sequence that does not satisfy Chargaff's second parity rule
trans.mat <- matrix(c(.4, .1, .4, .1, .2, .1, .6, .1, .4, .1, .3, .2, .1, .2, .4, .3),
ncol=4, byrow=TRUE)
seq <- simulateMarkovChain(500000, trans.mat, states=c("a", "c", "g", "t"))
chargaff1.test(seq)
``` |