dinucleotideFrequencyTest: Pearson's chi-squared Test and G-tests for String Position...

dinucleotideFrequencyTest    R Documentation

Pearson's chi-squared Test and G-tests for String Position Dependence

Description

Performs Pearson's chi-squared test, the G-test, or Williams' corrected G-test to assess dependence between two nucleotide positions.

Usage

dinucleotideFrequencyTest(x, i, j, test = c("chisq", "G", "adjG"),
                          simulate.p.value = FALSE, B = 2000)

Arguments

x

A DNAStringSet or RNAStringSet object.

i, j

Single integer values for positions to test for dependence.

test

One of "chisq" (Pearson's chi-squared test), "G" (G-test), or "adjG" (Williams' corrected G-test). See the Details section.

simulate.p.value

A logical indicating whether to compute p-values by Monte Carlo simulation.

B

An integer specifying the number of replicates used in the Monte Carlo test.
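The simulate.p.value and B arguments carry the same names as those of chisq.test. As a rough illustration of what a Monte Carlo p-value with B replicates means, the base R sketch below simulates tables with the observed margins using r2dtable and compares their Pearson statistics with the observed one. The table tab holds made-up toy counts and pearsonStat is a hypothetical helper; this is not necessarily how dinucleotideFrequencyTest performs its simulation internally.

```r
set.seed(42)  # make the simulation reproducible

# Toy counts (illustrative only): rows = base at position i, columns = base at j.
tab <- matrix(c(12, 3,  5,  2,
                 4, 9,  2,  6,
                 3, 2, 11,  4,
                 2, 5,  3, 10),
              nrow = 4, byrow = TRUE,
              dimnames = list(i = c("A", "C", "G", "T"),
                              j = c("A", "C", "G", "T")))

# Hypothetical helper: Pearson statistic for a two-way table of counts.
pearsonStat <- function(tab) {
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # expected counts under H0
  sum((tab - E)^2 / E)
}

B <- 2000
# Random tables sharing the observed row and column totals.
sims <- r2dtable(B, rowSums(tab), colSums(tab))
simStats <- vapply(sims, pearsonStat, numeric(1))

# Proportion of simulated statistics at least as extreme as the observed one,
# counting the observed table itself as one replicate.
pSim <- (1 + sum(simStats >= pearsonStat(tab))) / (B + 1)
pSim
```

A larger B gives a finer-grained p-value (the smallest attainable value is 1/(B + 1)) at the cost of more computation.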

Details

The null and alternative hypotheses for this function are:

H0:

positions i and j are independent

H1:

otherwise

Let O and E be the observed and expected counts, respectively, of the dinucleotide combinations at positions i and j. Then the test statistics are calculated as:

test="chisq":

stat = sum(abs(O - E)^2/E)

test="G":

stat = 2 * sum(O * log(O/E))

test="adjG":

stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))

Under the null hypothesis, these test statistics are approximately distributed chi-squared(df = ((distinct bases at i) - 1) * ((distinct bases at j) - 1)).
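The formulas above can be sketched in base R as follows. This is only an illustration under stated assumptions: O and E are treated as counts, the table tab holds made-up toy values, positionDependenceStats is a hypothetical helper that is not part of Biostrings, and the correction factor q is transcribed literally from the formula given above with length(x) taken to be the total count.

```r
# Hypothetical helper (not a Biostrings function): computes the three
# statistics from a two-way table of dinucleotide counts.
positionDependenceStats <- function(tab) {
  # Keep only bases actually observed, so df reflects distinct bases.
  tab <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  n  <- sum(tab)                                  # number of sequences
  O  <- tab
  E  <- outer(rowSums(tab), colSums(tab)) / n     # expected counts under H0
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)

  chisq <- sum(abs(O - E)^2 / E)
  # Empty cells contribute 0 to the G statistic.
  G <- 2 * sum(ifelse(O > 0, O * log(O / E), 0))
  # Correction factor as written in the Details formula.
  q <- 1 + ((df - 1)^2 - 1) / (6 * n * (df - 2))

  stats <- c(chisq = chisq, G = G, adjG = G / q)
  list(statistic = stats, df = df,
       p.value = pchisq(stats, df = df, lower.tail = FALSE))
}

# Toy counts (illustrative only): rows = base at position i, columns = base at j.
tab <- matrix(c(12, 3,  5,  2,
                 4, 9,  2,  6,
                 3, 2, 11,  4,
                 2, 5,  3, 10),
              nrow = 4, byrow = TRUE,
              dimnames = list(i = c("A", "C", "G", "T"),
                              j = c("A", "C", "G", "T")))

res <- positionDependenceStats(tab)
res$statistic
res$df
res$p.value
```

With all four bases present at both positions, df is (4 - 1) * (4 - 1) = 9, and the "chisq" value agrees with chisq.test(tab, correct = FALSE) on the same table.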

Value

An htest object. See help(chisq.test) for more details.
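Because the result is an htest object, its components can be extracted in the same way as for any other base R test. The sketch below uses chisq.test on a made-up toy table as a stand-in, so that it runs without Biostrings; the component names shown are those of the standard htest structure.

```r
# Toy counts (illustrative only), used to obtain an htest object from base R.
tab <- matrix(c(12, 3,  5,  2,
                 4, 9,  2,  6,
                 3, 2, 11,  4,
                 2, 5,  3, 10),
              nrow = 4, byrow = TRUE)

# suppressWarnings() hides the small-expected-count warning for this toy table.
res <- suppressWarnings(chisq.test(tab))

class(res)      # "htest"
res$statistic   # value of the test statistic
res$parameter   # degrees of freedom
res$p.value     # p-value of the test
res$method      # description of the test performed
```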

Author(s)

P. Aboyoun

References

Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.

Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.

Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.

Williams, D.A. (1976) "Improved likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.

See Also

nucleotideFrequencyAt, XStringSet-class, chisq.test

Examples

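  library(Biostrings)
  data(HNF4alpha)  # example DNAStringSet shipped with Biostrings
  # Test positions 1 and 2 for dependence with each available statistic
  dinucleotideFrequencyTest(HNF4alpha, 1, 2)                 # Pearson's chi-squared (default)
  dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "G")     # G-test
  dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "adjG")  # Williams' corrected G-test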

Bioconductor/Biostrings documentation built on Dec. 16, 2024, 8:46 a.m.