# dinucleotideFrequencyTest: Pearson's chi-squared Test and G-tests for String Position... In Bioconductor/Biostrings: Efficient manipulation of biological strings

## Description

Performs Pearson's chi-squared test, the G-test, or Williams' corrected G-test to test for dependence between two nucleotide positions.

## Usage

```
dinucleotideFrequencyTest(x, i, j, test = c("chisq", "G", "adjG"),
                          simulate.p.value = FALSE, B = 2000)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | A DNAStringSet or RNAStringSet object. |
| `i, j` | Single integer values for positions to test for dependence. |
| `test` | One of `"chisq"` (Pearson's chi-squared test), `"G"` (G-test), or `"adjG"` (Williams' corrected G-test). See Details section. |
| `simulate.p.value` | A logical indicating whether to compute p-values by Monte Carlo simulation. |
| `B` | An integer specifying the number of replicates used in the Monte Carlo test. |

## Details

The null and alternative hypotheses for this function are:

H0:

positions `i` and `j` are independent

H1:

otherwise

Let O and E be the observed and expected probabilities, respectively, for the base pair combinations at positions `i` and `j`. The test statistics are then calculated as:

`test="chisq"`:

stat = sum(abs(O - E)^2/E)

`test="G"`:

stat = 2 * sum(O * log(O/E))

`test="adjG"`:

stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))

Under the null hypothesis, these test statistics are approximately distributed chi-squared(df = ((distinct bases at i) - 1) * ((distinct bases at j) - 1)).
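The formulas above can be traced by hand in base R. The following is a minimal sketch, not the package's implementation: the count table `O` is invented for illustration, `n` stands in for `length(x)`, and the statistics are computed on counts (as `chisq.test` does) rather than on probabilities, which is an assumption about how the formulas are meant to be applied. The `q` correction is transcribed as written in this section.

```r
# Hypothetical 4 x 4 table of dinucleotide counts
# (rows: base at position i, columns: base at position j).
# These numbers are invented for illustration only.
bases <- c("A", "C", "G", "T")
O <- matrix(c(12,  3,  5,  2,
               4, 15,  3,  6,
               6,  2, 11,  4,
               3,  5,  4, 15),
            nrow = 4, byrow = TRUE,
            dimnames = list(i = bases, j = bases))

n <- sum(O)  # stands in for length(x), the number of sequences

# Expected counts under independence: (row total * column total) / n
E <- outer(rowSums(O), colSums(O)) / n

# Degrees of freedom: (distinct bases at i - 1) * (distinct bases at j - 1)
df <- (nrow(O) - 1) * (ncol(O) - 1)

# test = "chisq"
chisq_stat <- sum(abs(O - E)^2 / E)

# test = "G"; cells with O == 0 are taken to contribute 0
g_stat <- 2 * sum(ifelse(O > 0, O * log(O / E), 0))

# test = "adjG"; q transcribed from the formula in this section
q <- 1 + ((df - 1)^2 - 1) / (6 * n * (df - 2))
adj_g_stat <- g_stat / q

# Upper-tail chi-squared p-values for each statistic
p_values <- pchisq(c(chisq = chisq_stat, G = g_stat, adjG = adj_g_stat),
                   df = df, lower.tail = FALSE)
p_values
```

As a sanity check on the sketch, `chisq_stat` can be compared with `chisq.test(O, correct = FALSE)$statistic`, which computes the same Pearson statistic from the same table.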

## Value

An htest object. See help(chisq.test) for more details.
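Since the return value is described by reference to `chisq.test`, the components of an htest object can be explored with `chisq.test` itself. The sketch below uses an invented count table and base R only; it assumes, without confirming it from the package source, that the result of `dinucleotideFrequencyTest` exposes the same components.

```r
# Hypothetical 4 x 4 count table, invented for illustration only
tab <- matrix(c(12,  3,  5,  2,
                 4, 15,  3,  6,
                 6,  2, 11,  4,
                 3,  5,  4, 15),
              nrow = 4, byrow = TRUE)

# Asymptotic test: p-value from the chi-squared distribution
res <- chisq.test(tab, correct = FALSE)
class(res)        # the S3 class of the result
res$statistic     # test statistic
res$parameter     # degrees of freedom
res$p.value       # p-value
res$method        # description of the test performed

# Monte Carlo test: p-value from B simulated tables with the same margins
set.seed(1)       # makes the simulated p-value reproducible
res_mc <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
res_mc$p.value
```

The `simulate.p.value` and `B` arguments shown here are those of `chisq.test`; the arguments of the same names in `dinucleotideFrequencyTest` are documented above with the same meaning.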

## Author(s)

P. Aboyoun

## References

Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.

Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.

Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.

Williams, D.A. (1976) "Improved Likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.

## See Also

`nucleotideFrequencyAt`, XStringSet-class, `chisq.test`

## Examples

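```
library(Biostrings)

# Load the HNF4alpha example data set
data(HNF4alpha)

# Test positions 1 and 2 for dependence with each of the three tests
dinucleotideFrequencyTest(HNF4alpha, 1, 2)                 # Pearson's chi-squared (default)
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "G")     # G-test
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "adjG")  # Williams' corrected G-test
```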

Bioconductor/Biostrings documentation built on Nov. 7, 2018, 2:33 p.m.