# diagnosticTest: Testing Procedure for Bicluster Diagnostics (biclust: BiCluster Algorithms)

## Description

Calculates the statistical value of the row, column and multiplicative effects based on the biclusters discovered in the data. Additionally, multiple sampling methods are available to assess statistical significance through p-values.

## Usage

```r
diagnosticTest(BCresult, data, number = 1:BCresult@Number, verbose = TRUE,
  statistics = c("F", "Tukey"), sampling = TRUE, samplingtypes = NULL,
  nSim = 1000, alpha = 0.05, save_F = FALSE)
```

## Arguments

• `BCresult`: An object of class `biclust` containing the result of a biclustering algorithm.

• `data`: The data matrix to which the `biclust` function was applied.

• `number`: Vector of bicluster numbers for which the diagnostics should be calculated (default = all available biclusters).

• `verbose`: Boolean value to print the progress of the computed statistics.

• `statistics`: Vector selecting which statistics to compute (default = `c("F","Tukey")`). Available options: `"F"` (row and column F statistics of two-way ANOVA with one replicate per cell), `"Tukey"` (Tukey's test for non-additivity), `"ModTukey"` (`mtukey.test`), `"Tusell"` (`tusell.test`), `"Mandel"` (`mandel.test`), `"LBI"` (`lbi.test`) and `"JandG"` (`johnson.graybill.test`).

• `sampling`: Boolean value to apply sampling methods to compute statistical significance (default = `TRUE`). If `FALSE`, only the `"Theoretical"` p-values are computed. If `TRUE`, both the `"Theoretical"` and `samplingtypes` p-values are computed.

• `samplingtypes`: Vector of sampling methods for `sampling=TRUE` (default = `NULL` = `c("Permutation","SemiparPerm")`). Available options: `"Permutation"`, `"SemiparPerm"`, `"SemiparBoot"`, `"PermutationCor"`, `"SamplingCor"` and `"NormSim"`. See Details for more info.

• `nSim`: Number of permutations/bootstraps.

• `alpha`: Significance level (default = 0.05).

• `save_F`: Option to save the permuted/bootstrapped statistics. This is necessary for `diagnosticPlot2`.
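To make the `"F"` and `"Tukey"` options more concrete, the following base R sketch computes the row and column F statistics of a two-way ANOVA with one observation per cell, together with Tukey's one-degree-of-freedom statistic for non-additivity. The helper `twoway_stats()` and the toy matrix are assumptions made for illustration; they are not part of `biclust`, and the package's internal computations may differ in detail.

```r
# Sketch only: textbook formulas, not the biclust implementation.
# twoway_stats() is a hypothetical helper name.
twoway_stats <- function(bc) {
  r <- nrow(bc); k <- ncol(bc)
  grand   <- mean(bc)
  row_eff <- rowMeans(bc) - grand            # additive row effects
  col_eff <- colMeans(bc) - grand            # additive column effects
  resid   <- bc - grand - outer(row_eff, col_eff, "+")

  ss_row <- k * sum(row_eff^2)
  ss_col <- r * sum(col_eff^2)
  ss_res <- sum(resid^2)
  df_res <- (r - 1) * (k - 1)

  # Tukey (1949): sum of squares for non-additivity (1 degree of freedom)
  ss_nonadd <- sum(outer(row_eff, col_eff) * bc)^2 /
    (sum(row_eff^2) * sum(col_eff^2))

  c(F_row   = (ss_row / (r - 1)) / (ss_res / df_res),
    F_col   = (ss_col / (k - 1)) / (ss_res / df_res),
    F_tukey = ss_nonadd / ((ss_res - ss_nonadd) / (df_res - 1)))
}

# Toy bicluster with a multiplicative (row x column) component
set.seed(123)
bc <- matrix(rnorm(100), 10, 10) + outer(1:10, 1:10) / 10
round(twoway_stats(bc), 3)
```

A large `F_tukey` relative to the row and column statistics suggests that a purely additive model does not describe the bicluster well.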

## Details

Due to the uncertainty of discovering the true bicluster(s) in the data, it's often advisable to not rely on the theoretical p-values but instead retrieve the p-values through a sampling procedure.

Available p-values/sampling types for each statistical method:

• `"F"`: `"Theoretical"` and `"Permutation"` for both row and column effect.

• `"Tukey"`: `"Theoretical"`, `"SemiparPerm"` and `"SemiparBoot"`.

• `"ModTukey"`: `"Theoretical"`, `"SemiparPerm"`, `"SemiparBoot"`, `"PermutationCor"` and `"SamplingCor"`.

• `"Tusell"`: `"SemiparPerm"`, `"SemiparBoot"` and `"NormSim"`.

• `"Mandel"`: `"Theoretical"`, `"SemiparPerm"` and `"SemiparBoot"`.

• `"LBI"`: `"SemiparPerm"`, `"SemiparBoot"` and `"NormSim"`.

• `"JandG"`: `"SemiparPerm"`, `"SemiparBoot"` and `"NormSim"`.

More info on the sampling types can be found in the Sampling Types section below. If available, the `"Theoretical"` p-value will always be computed. By default, when `sampling=TRUE`, the sampling methods without replacement are chosen, namely `"Permutation"` and `"SemiparPerm"`.

When `save_F=TRUE`, the null distributions of the statistics can be visualised with `diagnosticPlot2`.
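As a rough illustration of how a sampled null distribution can be turned into a p-value and a critical value, consider the base R sketch below. The null draws and the observed statistic are simulated placeholders, and the formula `mean(null >= observed)` is one common convention assumed here; the package may use a slightly different estimator.

```r
# Sketch only: empirical p-value and critical value from sampled statistics.
# `null_stats` stands in for one column of saved sampled statistics and
# `observed` for the matching observed statistic; both are made up here.
set.seed(42)
nSim       <- 1000
alpha      <- 0.05
null_stats <- rf(nSim, df1 = 9, df2 = 81)   # simulated null distribution
observed   <- 3.2                            # simulated observed statistic

# Share of null draws at least as large as the observed statistic
p_emp <- mean(null_stats >= observed)

# Empirical critical value: the (1 - alpha) quantile of the null draws
crit_emp <- unname(quantile(null_stats, 1 - alpha))

# 0/1 indicator comparable in spirit to the `Sign` column
sign_flag <- as.integer(p_emp < alpha)
```

Because the estimate is based on `nSim` draws, its resolution is limited to steps of `1/nSim`, which is one reason to increase `nSim` when small p-values matter.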

Disclaimer: While their functionality did not change, some functions of the `additivityTests` package were altered in order to be able to return the permuted/bootstrapped statistics and p-values.

## Value

Returns a list with `length(number)` elements. Each element corresponds to one of the requested biclusters and is a list containing:

• `table`: a data frame where each row is a combination of `statistics` and `samplingtypes` (including Theoretical). The data frame contains the `Method`, `Type` (p-value type), `StatVal` (statistical value), `CritVal` (critical value), `pVal` and `Sign` (0/1 significance indicator based on `alpha`).

• `save_F`: if `save_F=TRUE`, a (`nSim` x number of permuted/bootstrapped p-values) matrix containing the sampled statistics.
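The sketch below shows one way such a `table` could be inspected once it has been extracted from the result. The data frame is a hand-made mock that only copies the documented column names; every value in it is invented for illustration and does not come from a real run.

```r
# Sketch only: `tab` is a mock of the documented `table` layout.
# All numbers are invented placeholders, not real output.
tab <- data.frame(
  Method  = c("Tukey", "Tukey", "Mandel", "Mandel"),
  Type    = c("Theoretical", "SemiparPerm", "Theoretical", "SemiparPerm"),
  StatVal = c(12.4, 12.4, 1.3, 1.3),
  CritVal = c(3.96, 4.10, 2.00, 2.10),
  pVal    = c(0.001, 0.002, 0.240, 0.260),
  Sign    = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Rows flagged as significant at the alpha used in the call
sig <- tab[tab$Sign == 1, c("Method", "Type", "pVal")]

# Re-evaluate significance at a stricter, user-chosen level
strict_alpha <- 0.0015
tab$SignStrict <- as.integer(tab$pVal < strict_alpha)
```

Comparing the `"Theoretical"` row with the sampled rows for the same `Method` is a quick check of whether both approaches lead to the same conclusion.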

## Sampling Types

For each sampling type, a permuted/bootstrapped bicluster (BC) is created as follows:

• `"Permutation"`: Sample a BC from the entire dataset with replacement.

• `"SemiparPerm"`: A semi-parametric permutation procedure. Two-way ANOVA is applied to the original BC and the residual matrix is extracted. A new residual matrix is created by sampling without replacement from the original residual matrix. The sampled BC is then generated by adding this sampled residual matrix to the mean, row and column effects estimated by the ANOVA of the original BC.

• `"SemiparBoot"`: A semi-parametric bootstrapping procedure. Two-way ANOVA is applied to the original BC and the residual matrix is extracted. A new residual matrix is created by sampling with replacement from the original residual matrix. The sampled BC is then generated by adding this sampled residual matrix to the mean, row and column effects estimated by the ANOVA of the original BC.

• `"PermutationCor"`: See `correction=1` parameter of `mtukey.test`. More info in Simecek and Simeckova (2012).

• `"SamplingCor"`: See `correction=2` parameter of `mtukey.test`. More info in Simecek and Simeckova (2012).

• `"NormSim"`: Sample a BC from a standard normal distribution. This sampling procedure is used for some methods in the `additivityTests` package.
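The two semi-parametric procedures described above can be sketched in a few lines of base R. The helper `semipar_resample()` is a hypothetical name introduced for this illustration and is not exported by `biclust`; it only follows the verbal description (additive fit, resampled residuals) and may differ from the package internals.

```r
# Sketch only: follows the "SemiparPerm"/"SemiparBoot" description.
# semipar_resample() is a hypothetical helper, not a biclust function.
semipar_resample <- function(bc, replace = FALSE) {
  grand   <- mean(bc)
  row_eff <- rowMeans(bc) - grand             # additive row effects
  col_eff <- colMeans(bc) - grand             # additive column effects
  fitted  <- grand + outer(row_eff, col_eff, "+")
  resid   <- bc - fitted                      # two-way ANOVA residuals

  # replace = FALSE shuffles the residuals ("SemiparPerm");
  # replace = TRUE resamples them ("SemiparBoot").
  new_resid <- matrix(sample(as.vector(resid), length(resid), replace = replace),
                      nrow = nrow(bc), ncol = ncol(bc))
  fitted + new_resid
}

set.seed(1)
bc      <- matrix(rnorm(100), 10, 10)
perm_bc <- semipar_resample(bc, replace = FALSE)  # one permuted BC
boot_bc <- semipar_resample(bc, replace = TRUE)   # one bootstrapped BC
```

Repeating the call `nSim` times and recomputing the statistic of interest on each sampled BC yields an approximate null distribution for that statistic.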

## Author(s)

Ewoud De Troyer

## References

Tukey, J.W.: One Degree of Freedom for Non-additivity, Biometrics 5, pp. 232-242, 1949.

Simecek, Petr, and Simeckova, Marie. "Modification of Tukey's additivity test." Journal of Statistical Planning and Inference, 2012.

## Examples

```r
## Not run:
library(biclust)

#Random matrix with embedded bicluster (with multiplicative effect)
test <- matrix(rnorm(5000),100,50)
roweff <- sample(1:5,10,replace=TRUE)
coleff <- sample(1:5,10,replace=TRUE)
test[11:20,11:20] <- test[11:20,11:20] +
  matrix(coleff,nrow=10,ncol=10,byrow=TRUE) +
  matrix(roweff,nrow=10,ncol=10) +
  roweff %*% t(coleff)

#Apply Plaid Biclustering
res <- biclust(test, method=BCPlaid())

#Apply diagnosticTest with all statistics and sampling types
out <- diagnosticTest(BCresult=res, data=test, save_F=TRUE, number=1,
                      statistics=c("F","Tukey","ModTukey","Tusell","Mandel","LBI","JandG"),
                      samplingtypes=c("Permutation","SemiparPerm","SemiparBoot",
                                      "PermutationCor","SamplingCor","NormSim"))

#Results table of the first requested bicluster
out[[1]]$table

## End(Not run)
```

biclust documentation built on May 25, 2021, 5:09 p.m.