agct.test: Test of Purine-Pyrimidine Parity Based on Euclidean distance


View source: R/agct.test.R


Performs a test proposed by Hart and Martínez (2011) for the equivalence of the relative frequencies of purines (A+G) and pyrimidines (C+T) in DNA sequences. It does this by checking whether or not the mononucleotide frequencies of a DNA sequence satisfy the relationship A+G=C+T.


agct.test(x, alg=c("exact", "simulate", "lower", "Lower", "upper"), n)



x
either a vector containing the relative frequencies of each of the 4 nucleotides A, C, G, T, a character vector representing a DNA sequence in which each element contains a single nucleotide, or a DNA sequence stored using the SeqFastadna class from the seqinr package.


alg
the algorithm for computing the p-value. If set to "simulate", the p-value is obtained via Monte Carlo simulation. If set to "lower", an analytic lower bound on the p-value is computed. If set to "upper", an analytic upper bound on the p-value is computed. "lower" and "upper" are based on formulae in Hart and Martínez (2011). A tighter (though unpublished) lower bound on the p-value may be obtained by specifying "Lower". If alg is specified as "exact" (the default value), the p-value for the test is computed exactly.


n
the number of replications to use for Monte Carlo simulation. If computationally feasible, a value >= 10000000 is recommended.
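
As a rough illustration of how the alg argument might be varied, the sketch below calls agct.test on a relative frequency vector with the two analytic bounds. The frequency values are made up for illustration, and the call is guarded so that nothing runs unless the spgs package is installed.

```r
# Illustrative relative frequencies in the order A, C, G, T (made-up values)
x <- c(0.30, 0.22, 0.24, 0.24)

res <- NULL
if (requireNamespace("spgs", quietly = TRUE)) {
  # Analytic lower and upper bounds on the p-value, as described for alg
  res <- lapply(c(lower = "lower", upper = "upper"),
                function(a) spgs::agct.test(x, alg = a))
  print(res)
}
```

For alg="simulate", the n argument would also need to be supplied.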


The first argument may be a character vector representing a DNA sequence, a DNA sequence represented using the SeqFastadna class from the seqinr package, or a vector containing the relative frequencies of the A, C, G and T nucleic acids.

Let A, C, G and T denote the relative frequencies of the nucleotide bases appearing in a DNA sequence. This function carries out a statistical hypothesis test that the relative frequencies satisfy the relation A+G=C+T, that is, that purines {A,G} occur as often as pyrimidines {C,T} in a DNA sequence. The relationship can be rewritten as A-T=C-G, from which it is easy to see that the property being tested is a generalisation of Chargaff's second parity rule for mononucleotides, which states that A=T and C=G. The test is set up as follows:

H0: A+G != C+T
H1: A+G = C+T
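
The relation being tested can be checked by hand on a small sequence. The following base R sketch uses a made-up toy sequence (an assumption for illustration, not data from the package) in which A+G=C+T holds even though A=T and C=G do not, which shows why the property generalises Chargaff's second parity rule.

```r
# Toy sequence (illustrative data only): 4 a's, 3 c's, 1 g and 2 t's
dna <- rep(c("a", "c", "g", "t"), times = c(4, 3, 1, 2))

# Relative frequencies in the order A, C, G, T
freqs <- as.vector(table(factor(dna, levels = c("a", "c", "g", "t")))) / length(dna)
names(freqs) <- c("A", "C", "G", "T")

# Gap between purine and pyrimidine frequencies, written both ways:
# (A + G) - (C + T) is the same quantity as (A - T) - (C - G)
gap.sum  <- (freqs[["A"]] + freqs[["G"]]) - (freqs[["C"]] + freqs[["T"]])
gap.diff <- (freqs[["A"]] - freqs[["T"]]) - (freqs[["C"]] - freqs[["G"]])

# Chargaff's second parity rule is stricter: it needs A = T and C = G separately
chargaff.holds <- isTRUE(all.equal(freqs[["A"]], freqs[["T"]])) &&
  isTRUE(all.equal(freqs[["C"]], freqs[["G"]]))
```

Here both gaps are zero up to rounding, while chargaff.holds is FALSE.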

The vector (A,C,G,T) is assumed to come from a Dirichlet(1,1,1,1) distribution on the 3-simplex under the null hypothesis.
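
To get a feel for this null distribution, one can draw from Dirichlet(1,1,1,1) in base R by normalising independent standard exponential variables, a standard construction. The sketch below is only illustrative: the variable names and the sample size are arbitrary choices, and this is not necessarily how the package samples internally.

```r
set.seed(42)
n.draws <- 10000

# Each row of e holds four independent Exp(1) variables; dividing a row by
# its sum gives one draw from Dirichlet(1,1,1,1), i.e. a point that is
# uniformly distributed on the 3-simplex.
e <- matrix(rexp(4 * n.draws), ncol = 4,
            dimnames = list(NULL, c("A", "C", "G", "T")))
null.freqs <- e / rowSums(e)

# Under the null, the purine frequency A + G is spread over [0, 1]
# around a mean of 1/2 rather than being pinned at 1/2.
purine <- null.freqs[, "A"] + null.freqs[, "G"]
summary(purine)
```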

The test statistic etaV is the Euclidean distance from the relative frequency vector (A,C,G,T) to the closest point in the square set thetaV = {(x,y,1/2-x,1/2-y) : 0 <= x,y <= 1/2}, which divides the 3-simplex into two equal parts. etaV lies in the range [0,sqrt(3/8)].
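
The distance described above can be sketched in base R. The helper eta.v below is a hypothetical name, not a function exported by spgs. It relies on the squared distance separating into an (A,G) part depending on x and a (C,T) part depending on y, so each coordinate can be minimised on its own and then clamped to [0,1/2]. The Monte Carlo step at the end additionally assumes that small distances count as evidence for H1, so that the simulated p-value is the proportion of null draws whose distance is no larger than the observed one; that rejection direction is an assumption of this sketch.

```r
# Hypothetical helper (not part of spgs): distance from p = (A, C, G, T)
# to the closest point of thetaV = {(x, y, 1/2 - x, 1/2 - y)}.
eta.v <- function(p) {
  p <- as.numeric(p)
  clamp <- function(z) min(max(z, 0), 1/2)
  # Unconstrained minimisers, clamped to the square 0 <= x, y <= 1/2
  x <- clamp((p[1] - p[3]) / 2 + 1/4)
  y <- clamp((p[2] - p[4]) / 2 + 1/4)
  sqrt(sum((p - c(x, y, 1/2 - x, 1/2 - y))^2))
}

eta.parity <- eta.v(c(0.4, 0.3, 0.1, 0.2))  # A+G = C+T, so 0 up to rounding
eta.vertex <- eta.v(c(1, 0, 0, 0))          # vertex of the simplex: sqrt(3/8)

# Monte Carlo sketch under the Dirichlet(1,1,1,1) null.
# ASSUMPTION: small distances are treated as evidence for H1.
set.seed(1)
n.rep <- 10000
draws <- matrix(rexp(4 * n.rep), ncol = 4)
draws <- draws / rowSums(draws)             # rows are Dirichlet(1,1,1,1) draws
null.eta <- apply(draws, 1, eta.v)

observed <- eta.v(c(0.30, 0.22, 0.24, 0.24))  # made-up frequencies
p.sim <- mean(null.eta <= observed)
```

With only 10000 replications the estimate is coarse; it is meant to show the mechanics rather than reproduce the values returned by agct.test.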


A list with class "htest.ext" containing the following components:


the value of the test statistic.


the p-value of the test.


a character string indicating what type of test was performed.

a character string giving the name of the data.


the probability vector used to derive the test statistic.


a brief description of the test statistic.


the null hypothesis (H0) of the test.


the alternative hypothesis (H1) of the test.


agct.test(x, alg="upper") is equivalent to ag.test(x, alg="simplex") except that the p-value computed using the formula for alg="upper" is exact for the test statistic etaV* used in ag.test, whereas it is merely an upper bound on the p-value for etaV.


Andrew Hart and Servet Martínez


Hart, A.G. and Martínez, S. (2011) Statistical testing of Chargaff's second parity rule in bacterial genome sequences. Stoch. Models 27(2), 1–46.

See Also

chargaff0.test, chargaff1.test, chargaff2.test, ag.test, chargaff.gibbs.test


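library(spgs)

#Demonstration on real viral sequence

#Simulate synthetic DNA sequence that does not exhibit Purine-Pyrimidine parity
#Each row of trans.mat gives the transition probabilities out of one state
trans.mat <- matrix(c(.4, .1, .4, .1, .2, .1, .6, .1, .4, .1, .3, .2, .1, .2, .4, .3),
                    ncol=4, byrow=TRUE)
seq <- simulateMarkovChain(500000, trans.mat, states=c("a", "c", "g", "t"))
#Test the simulated sequence for purine-pyrimidine parity
agct.test(seq)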

spgs documentation built on May 19, 2017, 7:40 a.m.
