Performs Pearson's chi-squared test, the G-test, or Williams' corrected G-test to determine whether two nucleotide positions are dependent.
A DNAStringSet or RNAStringSet object.
Single integer values giving the two positions, i and j, to test for dependence.
A logical indicating whether to compute p-values by Monte Carlo simulation.
An integer specifying the number of replicates used in the Monte Carlo test.
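For Pearson's statistic, base R's `chisq.test()` offers the same kind of Monte Carlo option through its real `simulate.p.value` and `B` arguments, which draw `B` random tables conditional on the observed row and column totals. The sketch below uses it on the bases found at two positions. It is an illustration only, not a claim about how this function computes its simulated p-values. The names `mc_pearson`, `seqs`, `i` and `j` are hypothetical, and `seqs` is assumed to be a plain character vector of equal-length sequences standing in for a DNAStringSet.

```r
# Hypothetical helper: Monte Carlo p-value for Pearson's chi-squared test
# on the bases at positions i and j, via base R's chisq.test().
# `seqs` is assumed to be a character vector standing in for a DNAStringSet.
mc_pearson <- function(seqs, i, j, B = 2000) {
  # contingency table: rows = bases at i, columns = bases at j
  tab <- table(substr(seqs, i, i), substr(seqs, j, j))
  # simulate.p.value = TRUE draws B tables with the same margins
  chisq.test(tab, simulate.p.value = TRUE, B = B)
}

seqs <- c("ACGT", "ACGA", "TCCT", "TCCA", "AAGT", "TACA")
set.seed(1)
res <- mc_pearson(seqs, 1, 3, B = 999)
res$p.value
```

In this mode `chisq.test()` reports no degrees of freedom, because the p-value comes from the simulated tables rather than from the chi-squared distribution.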
The null and alternative hypotheses for this function are:
H0: positions i and j are independent
H1: positions i and j are not independent
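Let O and E be the observed and expected counts of each combination of bases at positions i and j. Under independence, the expected count for base a at i and base b at j is E = (number of sequences with a at i) * (number of sequences with b at j) / length(x), where length(x) is the number of sequences. Then the test statistics are calculated as:
Pearson's chi-squared test: stat = sum(abs(O - E)^2/E)
G-test: stat = 2 * sum(O * log(O/E))
Williams' corrected G-test: stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))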
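The first two statistics can be sketched in base R directly from the definitions of O and E. This is a minimal illustration, not this function's implementation. The helper names `observed_counts` and `dependence_stats` are hypothetical, and `seqs` is assumed to be a plain character vector of equal-length sequences standing in for a DNAStringSet or RNAStringSet. Williams' version divides G by the factor q given above and is left out here.

```r
# Hypothetical helper: observed counts O for each pair of bases
# (rows = base at position i, columns = base at position j).
# `seqs` is assumed to be a character vector of equal-length sequences.
observed_counts <- function(seqs, i, j) {
  unclass(table(substr(seqs, i, i), substr(seqs, j, j)))
}

# Hypothetical helper: Pearson and G statistics for positions i and j
dependence_stats <- function(seqs, i, j) {
  O <- observed_counts(seqs, i, j)
  n <- sum(O)                               # number of sequences
  E <- outer(rowSums(O), colSums(O)) / n    # expected counts under independence
  pearson <- sum((O - E)^2 / E)
  # cells with O = 0 contribute 0 to G (the limit of O * log(O / E))
  G <- 2 * sum(ifelse(O > 0, O * log(O / E), 0))
  c(pearson = pearson, G = G)
}

seqs <- c("ACGT", "ACGA", "TCCT", "TCCA")
dependence_stats(seqs, 1, 3)
```

Here the base at position 1 fully determines the base at position 3 (A with G, T with C), so both statistics are large relative to n = 4. For independent positions, where O equals E in every cell, both are 0.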
Under the null hypothesis, each test statistic is approximately chi-squared distributed with df = ((number of distinct bases at i) - 1) * ((number of distinct bases at j) - 1) degrees of freedom.
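Turning a statistic into a p-value then takes only the degrees of freedom and the upper tail of the chi-squared distribution, available in base R as `pchisq(..., lower.tail = FALSE)`. The helper `chisq_pvalue` and its arguments are hypothetical, and `seqs` is again assumed to be a character vector of equal-length sequences.

```r
# Hypothetical helper: degrees of freedom and asymptotic p-value for a
# statistic computed from positions i and j of `seqs`
# (a character vector standing in for a DNAStringSet / RNAStringSet).
chisq_pvalue <- function(stat, seqs, i, j) {
  k_i <- length(unique(substr(seqs, i, i)))  # distinct bases at i
  k_j <- length(unique(substr(seqs, j, j)))  # distinct bases at j
  df <- (k_i - 1) * (k_j - 1)
  # upper-tail probability of a chi-squared(df) distribution
  c(df = df, p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

seqs <- c("ACGT", "ACGA", "TCCT", "TCCA")
chisq_pvalue(4, seqs, 1, 3)
```

With two distinct bases at each position, df = 1. A statistic of 4 then gives the same tail probability as |Z| > 2 for a standard normal Z.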
An htest object. See help(chisq.test) for more details.
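An "htest" object is a list with class "htest" whose standard components include `statistic`, `parameter`, `p.value`, `method` and `data.name`, as returned by `chisq.test()`. As a hedged sketch, the hypothetical helper `as_htest` below packages a G statistic in that shape so that it prints like any other test result. The method label and default data name are assumptions, not this function's actual output.

```r
# Hypothetical helper: wrap a G statistic as an "htest" object using the
# component names documented for stats::chisq.test().
as_htest <- function(G, df, data.name = "x") {
  structure(
    list(
      statistic = c(G = G),
      parameter = c(df = df),
      p.value = pchisq(G, df = df, lower.tail = FALSE),
      method = "G-test of independence",  # illustrative label
      data.name = data.name
    ),
    class = "htest"
  )
}

res <- as_htest(8 * log(2), df = 1)
print(res)    # uses the standard htest print method
res$p.value
```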
Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.
Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.
Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.
Williams, D.A. (1976) "Improved likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.