# dinucleotideFrequencyTest: Pearson's chi-squared Test and G-tests for String Position... In Biostrings: Efficient manipulation of biological strings

## Description

Performs Pearson's chi-squared test, the G-test, or Williams' corrected G-test to assess dependence between two nucleotide positions.

## Usage

```r
dinucleotideFrequencyTest(x, i, j, test = c("chisq", "G", "adjG"),
                          simulate.p.value = FALSE, B = 2000)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | A DNAStringSet or RNAStringSet object. |
| `i, j` | Single integer values for positions to test for dependence. |
| `test` | One of `"chisq"` (Pearson's chi-squared test), `"G"` (G-test), or `"adjG"` (Williams' corrected G-test). See Details section. |
| `simulate.p.value` | A logical indicating whether to compute p-values by Monte Carlo simulation. |
| `B` | An integer specifying the number of replicates used in the Monte Carlo test. |
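The `simulate.p.value` and `B` arguments carry the same names as those of `chisq.test`. As a rough illustration of what a Monte Carlo p-value looks like, the sketch below calls base R's `chisq.test` directly on a made-up 4 x 4 table of dinucleotide counts. The table values are hypothetical, and it is an assumption that `dinucleotideFrequencyTest` treats these arguments the same way.

```r
# Hypothetical dinucleotide counts: rows are the base at position i,
# columns are the base at position j. Values are invented for illustration.
bases <- c("A", "C", "G", "T")
counts <- matrix(c(12, 3, 5, 2,
                    4, 9, 2, 6,
                    3, 2, 8, 4,
                    2, 5, 3, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(i = bases, j = bases))

set.seed(1)  # make the simulated p-value reproducible

# Asymptotic p-value from the chi-squared approximation. Some expected
# counts are small here, so R would warn that the approximation may be poor.
asymptotic <- suppressWarnings(chisq.test(counts, correct = FALSE))

# Monte Carlo p-value based on B = 2000 simulated tables.
simulated <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)

asymptotic$p.value
simulated$p.value
simulated$parameter  # NA: no degrees of freedom are used when simulating
```

The test statistic is identical in both calls; only the way the p-value is obtained differs.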

## Details

The null and alternative hypotheses for this function are:

- H0: positions `i` and `j` are independent
- H1: otherwise

Let O and E be the observed and expected probabilities, respectively, for base pair combinations at positions `i` and `j`. Then the test statistics are calculated as:

`test="chisq"`:

stat = sum(abs(O - E)^2/E)

`test="G"`:

stat = 2 * sum(O * log(O/E))

`test="adjG"`:

stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))

Under the null hypothesis, these test statistics are approximately distributed chi-squared(df = ((distinct bases at i) - 1) * ((distinct bases at j) - 1)).
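The formulas above can be traced by hand in base R. The following is a minimal sketch, not the package's internal code: the 4 x 4 table is hypothetical, O and E are taken as counts (as `chisq.test` does) rather than probabilities, and the table total `n` stands in for `length(x)` in the expression for q as written on this page.

```r
# Hypothetical dinucleotide counts: rows are the base at position i,
# columns are the base at position j. Values are invented for illustration.
bases <- c("A", "C", "G", "T")
O <- matrix(c(12, 3, 5, 2,
               4, 9, 2, 6,
               3, 2, 8, 4,
               2, 5, 3, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(i = bases, j = bases))

n <- sum(O)                            # one dinucleotide per sequence
E <- outer(rowSums(O), colSums(O)) / n # expected counts under independence

# Degrees of freedom: (distinct bases at i - 1) * (distinct bases at j - 1)
df <- (sum(rowSums(O) > 0) - 1) * (sum(colSums(O) > 0) - 1)

# test = "chisq"
chisq_stat <- sum(abs(O - E)^2 / E)

# test = "G": cells with O = 0 are taken to contribute 0 to the sum
g_stat <- 2 * sum(ifelse(O > 0, O * log(O / E), 0))

# test = "adjG": q as given in the text (undefined when df = 2)
q <- 1 + ((df - 1)^2 - 1) / (6 * n * (df - 2))
adjg_stat <- g_stat / q

# Upper-tail p-values from the chi-squared reference distribution
stats <- c(chisq = chisq_stat, G = g_stat, adjG = adjg_stat)
p_values <- pchisq(stats, df = df, lower.tail = FALSE)
print(rbind(statistic = stats, p.value = p_values))
```

Since q is greater than 1 here, the corrected statistic is smaller than the uncorrected G, which makes the corrected test more conservative.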

## Value

An htest object. See help(chisq.test) for more details.
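An htest object is a list, so its components can be read with `$`. The sketch below uses base R's `chisq.test` on a made-up table as a stand-in, since it returns the same class. The table values are hypothetical, and only components common to htest objects are shown.

```r
# Hypothetical dinucleotide counts, invented for illustration.
bases <- c("A", "C", "G", "T")
counts <- matrix(c(12, 3, 5, 2,
                    4, 9, 2, 6,
                    3, 2, 8, 4,
                    2, 5, 3, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(i = bases, j = bases))

# Small expected counts trigger a warning about the approximation;
# it is suppressed here to keep the output short.
res <- suppressWarnings(chisq.test(counts, correct = FALSE))

class(res)      # "htest"
res$statistic   # the test statistic
res$parameter   # degrees of freedom
res$p.value     # p-value of the test
res$method      # description of the test performed
```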

## Author(s)

P. Aboyoun

## References

Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.

Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.

Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.

Williams, D.A. (1976) "Improved Likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.

## See Also

`nucleotideFrequencyAt`, XStringSet-class, `chisq.test`

## Examples

```r
library(Biostrings)  # provides the HNF4alpha data set and the test function
data(HNF4alpha)
dinucleotideFrequencyTest(HNF4alpha, 1, 2)
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "G")
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "adjG")
```

### Example output

```
Pearson's Chi-squared test of independence

data:  nucleotideFrequencyAt(HNF4alpha, c(1, 2))
X-squared = 19.073, df = 9, p-value = 0.02458

Log likelihood ratio (G-test) test of independence without correction

data:  nucleotideFrequencyAt(HNF4alpha, c(1, 2))
Log likelihood ratio statistic (G) = 17.261, X-squared df = 9, p-value
= 0.04478

Log likelihood ratio (G-test) test of independence with Williams'
correction

data:  nucleotideFrequencyAt(HNF4alpha, c(1, 2))
Log likelihood ratio statistic (G) = 10.806, X-squared df = 9, p-value
= 0.2892
```

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.