dinucleotideFrequencyTest: Pearson's chi-squared Test and G-tests for String Position...

Description

Performs Pearson's chi-squared test, the G-test, or the G-test with Williams' correction to determine whether two nucleotide positions are dependent.

Usage

dinucleotideFrequencyTest(x, i, j, test = c("chisq", "G", "adjG"),
                          simulate.p.value = FALSE, B = 2000)

Arguments

x

A DNAStringSet or RNAStringSet object.

i, j

Single integer values for positions to test for dependence.

test

One of "chisq" (Pearson's chi-squared test), "G" (G-test), or "adjG" (G-test with Williams' correction). See the Details section.

simulate.p.value

a logical indicating whether to compute p-values by Monte Carlo simulation.

B

an integer specifying the number of replicates used in the Monte Carlo test.
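When simulate.p.value = TRUE, the p-value is estimated from B Monte Carlo replicates instead of the chi-squared approximation. The base-R sketch below shows one common way to do this, assuming the replicates are random tables with the observed row and column totals (the scheme base R's chisq.test() uses); this page does not say whether dinucleotideFrequencyTest() uses the same scheme for every test. The helpers pearson_stat() and mc_p_value() and the count table are hypothetical.

```r
# Sketch (hypothetical helpers): Monte Carlo p-value for independence in an
# r x c count table. Replicate tables keep the observed row and column totals
# via r2dtable(), as chisq.test(simulate.p.value = TRUE) does.
pearson_stat <- function(O) {
  E <- outer(rowSums(O), colSums(O)) / sum(O)  # expected counts under H0
  sum((O - E)^2 / E)
}

mc_p_value <- function(O, B = 2000, stat = pearson_stat) {
  obs <- stat(O)
  sims <- vapply(r2dtable(B, rowSums(O), colSums(O)), stat, numeric(1))
  # Small tolerance so replicates that tie with obs up to rounding still count;
  # the add-one estimate can never be exactly 0
  (1 + sum(sims >= obs * (1 - 64 * .Machine$double.eps))) / (B + 1)
}

# Hypothetical 4 x 4 table of base counts at positions i (rows) and j (columns)
O <- matrix(c(10, 2, 3, 5,
               1, 8, 2, 4,
               3, 1, 9, 2,
               2, 4, 1, 7), nrow = 4, byrow = TRUE)
set.seed(1)
mc_p_value(O, B = 2000)
```

With B = 2000 the smallest attainable p-value under this estimator is 1/2001, so confirming very small p-values requires a larger B.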

Details

The null and alternative hypotheses for this function are:

H0:

positions i and j are independent

H1:

positions i and j are not independent
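Both hypotheses concern the table of bases observed at positions i and j across all sequences in x. The data line in the example output suggests the package builds this table with nucleotideFrequencyAt(x, c(i, j)). The sketch below builds a comparable table in base R, assuming plain equal-width character sequences instead of an XStringSet; position_table() and the toy sequences are hypothetical.

```r
# Sketch (hypothetical helper): contingency table of bases at positions i and j.
# Assumes a character vector of sequences at least max(i, j) long; the
# package itself works on DNAStringSet/RNAStringSet objects.
position_table <- function(seqs, i, j) {
  stopifnot(is.character(seqs), all(nchar(seqs) >= max(i, j)))
  bases_i <- substr(seqs, i, i)  # base at position i in each sequence
  bases_j <- substr(seqs, j, j)  # base at position j in each sequence
  # table() only creates rows/columns for bases actually observed
  table(bases_i, bases_j)
}

# Hypothetical toy sequences
seqs <- c("ACGTA", "ACGTT", "AGGTA", "CCGTA", "TCGAA", "ACCTA")
tab <- position_table(seqs, 1, 2)
print(tab)
```

Because table() only keeps observed bases, its dimensions correspond to the "distinct bases" used for the degrees of freedom in Details (here 3 x 2).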

Let O and E be the observed counts and the counts expected under H0 for each combination of bases at positions i and j. Then the test statistics are calculated as:

test="chisq":

stat = sum(abs(O - E)^2/E)

test="G":

stat = 2 * sum(O * log(O/E))

test="adjG":

stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))

Under the null hypothesis, each test statistic approximately follows a chi-squared distribution with df = ((distinct bases at i) - 1) * ((distinct bases at j) - 1) degrees of freedom.
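To make the formulas concrete, the base-R sketch below computes the "chisq" and "G" statistics, their degrees of freedom, and chi-squared p-values from a count table. It assumes O holds counts with no all-zero rows or columns, and it treats 0 * log(0) as 0 in the G statistic (a common convention the formula leaves implicit). The Williams-corrected "adjG" value is not computed. independence_stats() and the table are hypothetical, not part of Biostrings.

```r
# Sketch (hypothetical helper): "chisq" and "G" statistics for an r x c
# count table O, with df = (rows - 1) * (cols - 1).
independence_stats <- function(O) {
  O <- as.matrix(O)
  E <- outer(rowSums(O), colSums(O)) / sum(O)  # expected counts under H0
  df <- (nrow(O) - 1) * (ncol(O) - 1)          # rows/cols = distinct bases
  chisq <- sum(abs(O - E)^2 / E)
  nz <- O > 0                                  # convention: 0 * log(0) = 0
  G <- 2 * sum(O[nz] * log(O[nz] / E[nz]))
  c(chisq = chisq, G = G, df = df,
    p_chisq = pchisq(chisq, df, lower.tail = FALSE),
    p_G     = pchisq(G, df, lower.tail = FALSE))
}

# Hypothetical 4 x 4 table of base counts at positions i (rows) and j (columns)
O <- matrix(c(10, 2, 3, 5,
               1, 8, 2, 4,
               3, 1, 9, 2,
               2, 4, 1, 7), nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "C", "G", "T"), c("A", "C", "G", "T")))
independence_stats(O)
```

For tables larger than 2 x 2, the "chisq" value should match chisq.test(O)$statistic, since chisq.test() applies its continuity correction only to 2 x 2 tables.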

Value

An htest object. See help(chisq.test) for more details.
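Because the result is an htest object, its parts can be extracted with $. The sketch below uses base R's chisq.test() as a stand-in that returns the same class, so it runs without Biostrings installed; the fields shown (statistic, parameter, p.value, method, data.name) are the standard htest components that print() displays. The count table is hypothetical.

```r
# Sketch: reading the components of an htest object. chisq.test() stands in
# for dinucleotideFrequencyTest() here; both return class "htest".
O <- matrix(c(12, 3, 4, 11), nrow = 2)  # hypothetical 2 x 2 counts
res <- chisq.test(O, correct = FALSE)

class(res)       # "htest"
res$statistic    # named numeric: the test statistic
res$parameter    # named numeric: degrees of freedom
res$p.value      # numeric p-value
res$method       # character: description of the test
res$data.name    # character: what was tested
```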

Author(s)

P. Aboyoun

References

Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.

Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.

Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.

Williams, D.A. (1976) "Improved likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.

See Also

nucleotideFrequencyAt, XStringSet-class, chisq.test

Examples

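data(HNF4alpha)
dinucleotideFrequencyTest(HNF4alpha, 1, 2)
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "G")
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "adjG")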

Example output

	Pearson's Chi-squared test of independence

data:  nucleotideFrequencyAt(HNF4alpha, c(1, 2))
X-squared = 19.073, df = 9, p-value = 0.02458


	Log likelihood ratio (G-test) test of independence without correction

data:  nucleotideFrequencyAt(HNF4alpha, c(1, 2))
Log likelihood ratio statistic (G) = 17.261, X-squared df = 9, p-value
= 0.04478


	Log likelihood ratio (G-test) test of independence with Williams'
	correction

data:  nucleotideFrequencyAt(HNF4alpha, c(1, 2))
Log likelihood ratio statistic (G) = 10.806, X-squared df = 9, p-value
= 0.2892

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.