# Test of Purine-Pyrimidine Parity Based on Euclidean Distance

### Description

Performs a test proposed by Hart and Martínez (2011) for the equivalence of the
relative frequencies of purines (*A+G*) and pyrimidines (*C+T*) in DNA
sequences. It does this by checking whether the mononucleotide
frequencies of a DNA sequence satisfy the relationship *A+G=C+T*.

### Usage


### Arguments

`x` |
either a vector containing the relative frequencies of each of the 4 nucleotides A, C, G, T, a character vector representing a DNA sequence in which each element contains a single nucleotide, or a DNA sequence stored using the SeqFastadna class from the seqinr package. |

`alg` |
the algorithm used to compute the p-value. If set to “exact” (the default), the p-value is computed exactly. If set to “simulate”, the p-value is estimated by Monte Carlo simulation. If set to “lower” or “upper”, an analytic lower or upper bound on the p-value, respectively, is computed using the formulae in Hart and Martínez (2011). A tighter (though unpublished) lower bound on the p-value may be obtained by specifying “Lower”. |

`n` |
The number of replications to use for Monte Carlo simulation. If computationally feasible, a value >= 10000000 is recommended. |
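When `alg` is “simulate”, the p-value is estimated by simulation. Below is a minimal sketch of such an estimate, not the package's implementation. It assumes, as the hypotheses under Details suggest, that small values of *etaV* are evidence for parity, so the p-value is estimated as the proportion of Dirichlet(1,1,1,1) draws whose *etaV* does not exceed the observed value. The helper names `eta_v_rows` and `mc_p_value` are hypothetical.

```r
# Hypothetical helper: etaV for each row (A, C, G, T) of matrix m, i.e. the
# distance to the square {(x, y, 1/2 - x, 1/2 - y) : 0 <= x, y <= 1/2}.
eta_v_rows <- function(m) {
  clamp <- function(z) pmin(pmax(z, 0), 0.5)
  # closed-form minimisers in x (A, G) and y (C, T), clamped to [0, 1/2]
  x <- clamp((m[, 1] - m[, 3] + 0.5) / 2)
  y <- clamp((m[, 2] - m[, 4] + 0.5) / 2)
  sqrt((m[, 1] - x)^2 + (m[, 2] - y)^2 +
       (m[, 3] - (0.5 - x))^2 + (m[, 4] - (0.5 - y))^2)
}

# Hypothetical Monte Carlo p-value: lower-tail probability of etaV under a
# Dirichlet(1,1,1,1) null, estimated from n simulated frequency vectors.
mc_p_value <- function(freqs, n = 1e5) {
  obs <- eta_v_rows(matrix(freqs, nrow = 1))
  e <- matrix(rexp(4 * n), ncol = 4)  # i.i.d. Exp(1) draws
  u <- e / rowSums(e)                 # each row ~ Dirichlet(1,1,1,1)
  mean(eta_v_rows(u) <= obs)
}

set.seed(1)
mc_p_value(c(0.26, 0.24, 0.25, 0.25), n = 1e5)
```

The standard error of such an estimate is sqrt(p(1-p)/n), so it shrinks only like 1/sqrt(n); this is why a large number of replications is recommended.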

### Details

The first argument may be a character vector representing a DNA sequence, a DNA sequence represented using the SeqFastadna class from the seqinr package, or a vector containing the relative frequencies of the nucleotides A, C, G and T.
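When the sequence is given as a character vector, it must first be reduced to the relative frequency vector *(A,C,G,T)*. The sketch below shows that step using a hypothetical helper `base_freqs` that is not part of the package. It ignores case and drops any symbol other than a, c, g and t; how the package itself treats ambiguity codes is not stated here.

```r
# Hypothetical helper (not from the package): relative frequencies of
# A, C, G and T from a character vector holding one nucleotide per element.
base_freqs <- function(s) {
  s <- tolower(s)
  # factor() fixes the order a, c, g, t and keeps zero counts; any other
  # symbol becomes NA, which table() excludes by default
  counts <- table(factor(s, levels = c("a", "c", "g", "t")))
  freqs <- as.numeric(counts) / sum(counts)
  names(freqs) <- c("A", "C", "G", "T")
  freqs
}

base_freqs(c("a", "c", "g", "t", "a", "g"))  # A = 1/3, C = 1/6, G = 1/3, T = 1/6
```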

Let A, C, G and T denote the relative frequencies of the nucleotide bases
appearing in a DNA sequence. This function carries out a statistical hypothesis
test that the relative frequencies satisfy the relation *A+G=C+T*, that is, that
purines *{A,G}* occur as often as pyrimidines *{C,T}* in a DNA sequence.
The relationship can be rewritten as *A-T=C-G*, from which it is easy to see
that the property being tested is a generalisation of Chargaff's second parity
rule for mononucleotides, which states that *A=T* and *C=G*. The test is
set up as follows:

*H0*: *A+G != C+T*

*H1*: *A+G = C+T*
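The equivalences used above can be spelled out as follows. The first line rearranges the relation; the second uses the fact that the four relative frequencies sum to 1, so that under parity purines and pyrimidines each account for exactly half of the sequence; the third shows that Chargaff's second parity rule implies the property being tested (the converse does not hold).

$$
\begin{aligned}
A+G = C+T &\iff A-T = C-G,\\
A+C+G+T = 1 &\implies \bigl(A+G = C+T \iff A+G = C+T = \tfrac{1}{2}\bigr),\\
A = T \text{ and } C = G &\implies A-T = 0 = C-G.
\end{aligned}
$$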

Under the null hypothesis, the vector *(A,C,G,T)* is assumed to follow a Dirichlet(1,1,1,1)
distribution, that is, the uniform distribution on the 3-simplex.
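A standard way to sample from Dirichlet(1,1,1,1), the uniform distribution on the 3-simplex, is to normalise four independent Exp(1) variables. The sketch below uses this construction to draw null frequency vectors; the function name `rdirichlet1` is hypothetical and not part of the package.

```r
# Hypothetical helper: n draws from Dirichlet(1,1,1,1). Each row is a vector
# (A, C, G, T) of non-negative entries summing to 1.
rdirichlet1 <- function(n) {
  e <- matrix(rexp(4 * n, rate = 1), ncol = 4)  # i.i.d. Exp(1) draws
  e / rowSums(e)  # dividing each row by its sum gives a Dirichlet(1,1,1,1) vector
}

set.seed(1)
u <- rdirichlet1(5)
rowSums(u)  # all equal to 1 (up to rounding)
```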

The test statistic *etaV* is the Euclidean distance from the
relative frequency vector *(A,C,G,T)* to the closest point in the square
*thetaV = {(x, y, 1/2-x, 1/2-y) : 0 <= x,y <= 1/2}*, which divides the 3-simplex into two equal parts.
*etaV* lies in the range *[0, sqrt(3/8)]*.
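The squared distance from *(A,C,G,T)* to a point *(x, y, 1/2-x, 1/2-y)* of *thetaV* splits into a term in *x* (involving A and G) and a term in *y* (involving C and T), so each can be minimised separately. The unconstrained minimisers are *x = (A-G+1/2)/2* and *y = (C-T+1/2)/2*, which are then clamped to *[0, 1/2]*. The following minimal sketch computes *etaV* this way; `eta_v` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper (not from the package): Euclidean distance etaV from a
# relative frequency vector p = (A, C, G, T) to the square
# thetaV = {(x, y, 1/2 - x, 1/2 - y) : 0 <= x, y <= 1/2}.
eta_v <- function(p) {
  stopifnot(length(p) == 4)
  pA <- p[[1]]; pC <- p[[2]]; pG <- p[[3]]; pT <- p[[4]]
  clamp <- function(z) min(max(z, 0), 0.5)
  # The squared distance separates into an x-part (A, G) and a y-part (C, T);
  # each quadratic is minimised in closed form, then clamped to [0, 1/2].
  x <- clamp((pA - pG + 0.5) / 2)
  y <- clamp((pC - pT + 0.5) / 2)
  closest <- c(x, y, 0.5 - x, 0.5 - y)
  sqrt(sum((c(pA, pC, pG, pT) - closest)^2))
}

eta_v(c(0.4, 0.1, 0.4, 0.1))  # 0.3
eta_v(c(1, 0, 0, 0))          # sqrt(3/8), the upper end of the range
```

The vertex *(1,0,0,0)* attains the maximum: its closest point in *thetaV* is *(1/2, 1/4, 0, 1/4)*, at squared distance *1/4 + 1/16 + 1/16 = 3/8*.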

### Value

A list with class "htest.ext" containing the following components:

`statistic` |
the value of the test statistic. |

`p.value` |
the p-value of the test. |

`method` |
a character string indicating what type of test was performed. |

`data.name` |
a character string giving the name of the data. |

`estimate` |
the probability vector used to derive the test statistic. |

`stat.desc` |
a brief description of the test statistic. |

`null` |
the null hypothesis (*H0*). |

`alternative` |
the alternative hypothesis (*H1*). |

### Note

`agct.test(x, alg="upper")` is equivalent to `ag.test(x, alg="simplex")`, except that the p-value computed using the formula for `alg="upper"` is exact for the test statistic *etaV\** used in `ag.test`, whereas it is merely an upper bound on the p-value for *etaV*.

### Author(s)

Andrew Hart and Servet Martínez

### References

Hart, A.G. and Martínez, S. (2011)
Statistical testing of Chargaff's second parity rule in bacterial genome sequences.
*Stoch. Models* **27(2)**, 1–46.

### See Also

`chargaff0.test`, `chargaff1.test`, `chargaff2.test`, `ag.test`, `chargaff.gibbs.test`

### Examples

```
# Demonstration on a real viral sequence
data(pieris)
agct.test(pieris)

# Simulate a synthetic DNA sequence that does not exhibit purine-pyrimidine parity.
# Row i of trans.mat gives the probabilities of the next nucleotide (a, c, g, t)
# given that the current nucleotide is the i-th state.
trans.mat <- matrix(c(.4, .1, .4, .1,
                      .2, .1, .6, .1,
                      .4, .1, .3, .2,
                      .1, .2, .4, .3),
                    ncol = 4, byrow = TRUE)
dna <- simulateMarkovChain(500000, trans.mat, states = c("a", "c", "g", "t"))
agct.test(dna)
```