# sparsemdc_gap: Gap Statistic Calculator In SparseMDC: Implementation of SparseMDC Algorithm

## Description

This function calculates the gap statistic for SparseMDC. It is intended for use when the number of clusters in the data is unknown, although we recommend using alternative methods to infer the number of clusters in the data.

## Usage

```r
sparsemdc_gap(pdat, dim, min_clus, max_clus, nboots = 200, nitter = 20,
              nstarts = 10, l1_boot = 50, l2_boot = 50)
```

## Arguments

| Argument | Description |
|---|---|
| `pdat` | List with D entries; each entry contains data d, a p * n matrix. This data should be centered and log-transformed. |
| `dim` | Total number of conditions, D. |
| `min_clus` | The minimum number of clusters to try; the minimum value is 2. |
| `max_clus` | The maximum number of clusters to try. |
| `nboots` | The number of bootstrap repetitions to use; the default value is 200. |
| `nitter` | The maximum number of iterations for each of the start values; the default value is 20. |
| `nstarts` | The number of start values to use for SparseDC; the default value is 10. |
| `l1_boot` | The number of bootstrap repetitions used for estimating lambda 1. |
| `l2_boot` | The number of bootstrap repetitions used for estimating lambda 2. |

## Value

A list containing the optimal number of clusters, as well as gap statistics and the calculated standard error for each number of clusters.
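As a rough illustration of how gap statistics and their standard errors can be turned into a choice of cluster number, the sketch below applies the common one-standard-error rule (pick the smallest k with Gap(k) >= Gap(k+1) - se(k+1)). This is an assumption for illustration only: the source does not state which selection rule `sparsemdc_gap` uses, the numbers are made up, and `choose_k`, `k_values`, `gap`, and `gap_se` are hypothetical names rather than fields of the returned list.

```r
# Hypothetical gap statistics and standard errors for k = 2..5.
# These values are invented for illustration, not output of sparsemdc_gap.
k_values <- 2:5
gap      <- c(0.42, 0.61, 0.63, 0.60)
gap_se   <- c(0.03, 0.04, 0.05, 0.05)

# One-standard-error rule (assumed here, not confirmed for SparseMDC):
# return the smallest k such that Gap(k) >= Gap(k + 1) - se(k + 1).
# If no k satisfies the rule, fall back to the largest k tried.
choose_k <- function(k_values, gap, gap_se) {
  n <- length(gap)
  if (n < 2) return(k_values[1])
  ok <- gap[-n] >= gap[-1] - gap_se[-1]
  if (any(ok)) k_values[which(ok)[1]] else k_values[n]
}

best_k <- choose_k(k_values, gap, gap_se)
print(best_k)
```

With these made-up values the rule selects k = 3, because Gap(3) = 0.61 is at least Gap(4) - se(4) = 0.58, while Gap(2) = 0.42 falls short of Gap(3) - se(3) = 0.57.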

## Examples

```r
library(SparseMDC)
set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100, ]
# Split data into conditions A, B and C
data_A <- data_test[, which(condition_biase == "A")]
data_B <- data_test[, which(condition_biase == "B")]
data_C <- data_test[, which(condition_biase == "C")]
# Store data as list
dat_l <- list(data_A, data_B, data_C)
# Pre-process the data
pdat <- pre_proc_data(dat_l, dim = 3, norm = FALSE, log = TRUE,
                      center = TRUE)
# Run with two bootstrap samples to keep the example fast
gap_stat <- sparsemdc_gap(pdat, dim = 3, min_clus = 2, max_clus = 3,
                          nboots = 2, nitter = 2, nstarts = 1,
                          l1_boot = 5, l2_boot = 5)
```

SparseMDC documentation built on May 2, 2019, 4:01 a.m.