diid.test: A Test for a Bernoulli Scheme (IID Sequence)
In spgs: Statistical Patterns in Genomic Sequences

diid.test

R Documentation

A Test for a Bernoulli Scheme (IID Sequence)

Description

Tests whether or not a data series constitutes a Bernoulli scheme, that is, an independent and identically distributed (IID) sequence of symbols, by inferring the sequence of IID U(0,1) random noise that might have generated it.

Usage

diid.test(x, type = c("lb", "ks"), method = "holm", lag = 20, ...)

Arguments

`x`	the data series as a vector.
`type`	the procedures used to test whether the inferred noise series is independent and uniformly distributed on the unit interval. See ‘Details’.
`method`	the correction method to be used for adjusting the p-values. It is identical to the `method` argument of the `p.adjust` function, which is called to adjust the p-values.
`lag`	the number of lags to use when applying the Ljung-Box (portmanteau) test (`lb.test`).
`...`	additional arguments passed on to the functions called internally.
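The `lag` argument sets the number $h$ of autocorrelations entering the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation of the noise series. The following base-R sketch computes $Q$ by hand and checks it against `stats::Box.test()`. The helper `ljung_box_q()` is hypothetical, and it is an assumption (not confirmed here) that `lb.test` computes exactly this statistic internally.

```r
# Ljung-Box Q statistic computed by hand, compared with stats::Box.test().
# ljung_box_q() is a hypothetical helper written for illustration only.
ljung_box_q <- function(u, lag = 20) {
  n <- length(u)
  # Sample autocorrelations r_1, ..., r_lag (acf() returns lag 0 first)
  r <- acf(u, lag.max = lag, plot = FALSE)$acf[-1]
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
  # Under H0 (no autocorrelation), Q is approximately chi-squared with `lag` df
  p <- pchisq(q, df = lag, lower.tail = FALSE)
  list(statistic = q, df = lag, p.value = p)
}

set.seed(1)
u <- runif(5000)                      # IID U(0,1) noise, so H0 holds
mine <- ljung_box_q(u, lag = 20)
ref  <- Box.test(u, lag = 20, type = "Ljung-Box")
c(mine = mine$statistic, stats = unname(ref$statistic))  # the two values agree
```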

Details

This function tests whether a symbolic sequence is a Bernoulli scheme, that is, independently and identically distributed (IID). It does this by reverse-engineering the sequence to obtain a sample of the kind of output from a pseudo-random number generator that would have produced the observed sequence had it been generated by simulating an IID sequence. This sample output is then tested to see whether it is an independent and identically distributed sequence of uniform numbers on the interval (0, 1). This involves applying at least two tests, one for independence and another for uniformity over the unit interval. One concludes that the sequence is IID if the sample output passes the tests (that is, none of the null hypotheses is rejected) and not IID otherwise.
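Under an IID model, a simulation would typically draw $U \sim U(0,1)$ and emit the $j$-th symbol when $F_{j-1} \le U < F_j$, where $F_j$ is the cumulative probability of the first $j$ symbols. Reversing this, a noise value consistent with an observed symbol $j$ can be drawn uniformly from $[F_{j-1}, F_j)$. The sketch below does this using estimated marginal symbol frequencies. `infer_iid_noise()` is a hypothetical helper, and the procedure used inside spgs (which records a fitted transition matrix in `estimate`) may differ.

```r
# Hypothetical helper: infer U(0,1) noise that an inverse-CDF sampler could
# have used to produce the symbol sequence x under an IID model.
infer_iid_noise <- function(x) {
  x <- as.character(x)
  states <- sort(unique(x))
  # Estimated marginal probability of each symbol
  p <- as.numeric(table(factor(x, levels = states))) / length(x)
  # F_{j-1}: left end of the interval assigned to each symbol
  lower <- c(0, cumsum(p))[seq_along(states)]
  j <- match(x, states)  # index of each observed symbol
  # Draw uniformly inside [F_{j-1}, F_{j-1} + p_j), the interval the
  # sampler must have hit to emit symbol j
  lower[j] + p[j] * runif(length(x))
}

set.seed(42)
x <- sample(c("a", "c", "g", "t"), 5000, replace = TRUE)
u <- infer_iid_noise(x)
summary(u)
```

If the symbols really are IID, `u` should behave like IID U(0,1) noise; dependence between successive symbols tends to show up as serial dependence in `u`.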

The test is set up as follows:

H_0: the sequence is IID
H_1: the sequence is not IID

To simplify the use of the test, correction for multiple testing is carried out, which yields a single adjusted p-value. If this p-value is less than the significance level established for the test procedure, the null hypothesis that the sequence is IID is rejected. Otherwise, the null hypothesis is not rejected.

To correctly apply the test, use the type argument to specify at least one test of independence and one test of uniformity from the options displayed in the following table.

Category	Function	Test
Uniformity	`ks.unif.test`	Kolmogorov-Smirnov test for uniform$(0,1)$ data
	`chisq.unif.test`	Pearson's chi-squared test for discrete uniform data
Independence	`lb.test`	Ljung-Box $Q$ test for uncorrelated data
	`diffsign.test`	signed difference test of independence
	`turningpoint.test`	turning point test of independence
	`rank.test`	rank test of independence

If type is not specified, lb.test and ks.unif.test are used by default.
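As an illustration of one of the independence tests in the table, here is the textbook turning point test with its normal approximation: for $n$ IID continuous values, the number of turning points $T$ has mean $2(n-2)/3$ and variance $(16n-29)/90$. This is a sketch of the standard test, not the code behind `turningpoint.test`, whose details may differ; `turning_point_test()` is a hypothetical name.

```r
# Textbook turning point test (normal approximation); a sketch, not spgs code.
turning_point_test <- function(u) {
  n <- length(u)
  left  <- u[1:(n - 2)]
  mid   <- u[2:(n - 1)]
  right <- u[3:n]
  # A turning point is an interior local maximum or local minimum
  tp <- sum((mid > left & mid > right) | (mid < left & mid < right))
  mu <- 2 * (n - 2) / 3          # expected number of turning points under IID
  sigma2 <- (16 * n - 29) / 90   # variance under IID
  z <- (tp - mu) / sqrt(sigma2)
  list(statistic = z, turning.points = tp, p.value = 2 * pnorm(-abs(z)))
}

set.seed(7)
turning_point_test(runif(5000))$p.value        # IID noise: typically not small
turning_point_test(sort(runif(5000)))$p.value  # monotone series: no turning points
```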

As this procedure performs multiple tests in order to assess if the sequence is IID, it is necessary to adjust the p-values for multiple testing. By default, the Holm-Bonferroni method (holm) is used to correct for multiple testing, but this can be overridden via the method argument. The adjusted p-values are displayed when the result of the test is printed.
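The Holm-Bonferroni step-down adjustment multiplies the $i$-th smallest of $m$ p-values by $m-i+1$, then enforces monotonicity and caps the results at 1. The sketch below writes this out and checks it against `p.adjust()`, to which the `method` argument is passed; the helper `holm_adjust()` is hypothetical and the two input p-values are invented for illustration.

```r
# Holm-Bonferroni adjustment written out step by step.
# The p-values below are made-up illustrative numbers.
holm_adjust <- function(p) {
  m <- length(p)
  o <- order(p)                          # process p-values from smallest to largest
  scaled <- (m - seq_len(m) + 1) * p[o]  # multiply the i-th smallest by (m - i + 1)
  adj <- pmin(1, cummax(scaled))         # enforce monotonicity, cap at 1
  adj[order(o)]                          # return in the original order
}

p <- c(lb = 0.030, ks = 0.012)
holm_adjust(p)                  # adjusted: lb = 0.030, ks = 0.024
p.adjust(p, method = "holm")    # same values
```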

The smallest adjusted p-value constitutes the overall p-value for the test. If this p-value is less than the significance level fixed for the test procedure, the null hypothesis of the sequence being IID is rejected. Otherwise, the null hypothesis is not rejected.
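The decision rule can be sketched with base-R counterparts of the two default tests: `Box.test()` (Ljung-Box) and `ks.test()` against `punif`. The helper `composite_p()` is hypothetical and only mirrors the idea of `diid.test`, assuming the noise series `u` has already been inferred; its list components borrow names from the ‘Value’ section for readability.

```r
# Hypothetical composite test: one independence test, one uniformity test,
# Holm adjustment, and the smallest adjusted p-value as the overall p-value.
composite_p <- function(u, lag = 20, method = "holm") {
  p <- c(lb = Box.test(u, lag = lag, type = "Ljung-Box")$p.value,  # independence
         ks = ks.test(u, "punif")$p.value)                         # uniformity on (0,1)
  adj <- p.adjust(p, method = method)
  list(p.values = p, adjusted.p.values = adj, p.value = min(adj))
}

alpha <- 0.05
set.seed(123)
u <- runif(5000)                      # H0 true by construction
composite_p(u)$p.value < alpha        # usually FALSE: H0 not rejected
composite_p(sort(u))$p.value < alpha  # TRUE: sorting induces strong dependence
```

The second call shows why both kinds of test are needed: sorting leaves the set of values, and hence the Kolmogorov-Smirnov result, unchanged, but introduces strong serial dependence that the Ljung-Box test detects.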

Value

A list with class "multiplehtest" containing the following components:

`method`	the character string “Composite test for a Bernoulli process”.
`statistics`	the values of the test statistic for all the tests.
`parameters`	parameters for all the tests. Exactly one parameter is recorded for each test, for example, `df` for `lb.test`. Any additional parameters are not saved, for example, the `a` and `b` parameters of `chisq.unif.test`.
`p.values`	p-values of all the tests.
`methods`	a vector of character strings indicating what type of tests were performed.
`adjusted.p.values`	the adjusted p-values.
`data.name`	a character string giving the name of the data.
`adjust.method`	indicates which correction method was used to adjust the p-values for multiple testing.
`estimate`	the transition matrix estimated to fit a first-order Markov chain to the data and used to generate the inferred random disturbance.

Note

Sometimes, a warning message advising that ties should not be present for the Kolmogorov-Smirnov test can arise when analysing long sequences. If you do receive this warning, it means that the results of the Kolmogorov-Smirnov test (ks.unif.test) should not be trusted. In this case, Pearson's chi-squared test (chisq.unif.test) should be used instead of the Kolmogorov-Smirnov test.
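One tie-robust way to check uniformity is to bin the values into equal-width cells and apply Pearson's chi-squared test of equal cell probabilities. The sketch below uses base R; the number of bins (`k = 10`) and the helper `chisq_unif_sketch()` are illustrative assumptions, not the binning used by `chisq.unif.test`.

```r
# Equal-width binning plus Pearson's chi-squared test as a uniformity check
# that is unaffected by ties. k = 10 bins is an arbitrary illustrative choice.
chisq_unif_sketch <- function(u, k = 10) {
  counts <- table(cut(u, breaks = seq(0, 1, length.out = k + 1),
                      include.lowest = TRUE))
  chisq.test(as.vector(counts))  # default: equal cell probabilities 1/k
}

set.seed(99)
u <- round(runif(5000), 3)  # rounding to 3 decimals creates many ties
chisq_unif_sketch(u)$p.value
```

On data like this, `ks.test(u, "punif")` would typically warn about ties, while the binned chi-squared test is unaffected by them.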

Author(s)

Andrew Hart and Servet Martínez

References

Although this test procedure is unpublished, it is derived by making appropriate modifications to the test for first-order Markovianness described in the following two references.

Hart, A.G. and Martínez, S. (2011) Statistical testing of Chargaff's second parity rule in bacterial genome sequences. Stoch. Models 27(2), 1–46.

Hart, A.G. and Martínez, S. (2014) Markovianness and Conditional Independence in Annotated Bacterial DNA. Stat. Appl. Genet. Mol. Biol. 13(6), 693–716. arXiv:1311.4411 [q-bio.QM].

Examples

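library(spgs)

# Generate an IID uniform DNA sequence: every row of the transition matrix
# is the same, so each symbol is independent of the previous one
seq <- simulateMarkovChain(5000, matrix(0.25, 4, 4), states=c("a","c","g","t"))
diid.test(seq)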

spgs documentation built on Oct. 3, 2023, 5:07 p.m.
