cluster_parameters: Detection of algorithm and number of clusters


View source: R/cluster_parameters.R

Description

Detects an appropriate clustering algorithm and number of clusters for the given data, using the "clValid" and "NbClust" packages in the background for cluster validation.

Usage

cluster_parameters(name, comparisonAlgorithm = "clValid",
optimal = FALSE, n = 2:6,
clusteringMethods = c("kmeans", "pam"),
validationMethods = c("internal", "stability"),
distance = "euclidean", ...)

Arguments

name

dataframe returned by "extract()".

optimal

logical. If TRUE, returns a dataframe of optimal results

n

a vector of numbers corresponding to the number of clusters to be tested or validated

comparisonAlgorithm

2 choices available: "clValid" or "NbClust"

clusteringMethods

a character vector naming one or more clustering algorithms. Available choices are:

1) if comparisonAlgorithm = "clValid" : "hierarchical", "kmeans", "diana", "fanny", "som", "model", "sota", "pam", "clara" and "agnes"

2) if comparisonAlgorithm = "NbClust" : "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid", "kmeans".

validationMethods

name(s) of the method(s) used to validate clusters. Available options:

1) if comparisonAlgorithm = "clValid" then one or more of:

"internal", "stability", and "biological"

2) if comparisonAlgorithm = "NbClust" then one of:

"kl", "ch", "hartigan", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw", "all" (all indices except GAP, Gamma, Gplus and Tau), "alllong" (all indices with Gap, Gamma, Gplus and Tau included).

distance

metric used to calculate the distance matrix. Options:

1) for "clValid" :

"euclidean", "correlation", and "manhattan".

2) for "NbClust" :

This must be one of: "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"
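As a rough illustration of what these metric names mean, the sketch below computes a few of them on a toy matrix with base R's dist(). The "NbClust" names coincide with the method names that dist() accepts. The correlation-based dissimilarity is written here as 1 minus the Pearson correlation between rows, which is an assumption about how the "correlation" option behaves, not something this page states.

```r
# Toy data: 3 observations (rows) measured on 3 variables (columns).
x <- matrix(c(1, 2, 3,
              2, 4, 6,
              3, 2, 1), ncol = 3, byrow = TRUE)

# Metrics available through stats::dist().
d_euc <- dist(x, method = "euclidean")  # rows 1 vs 2: sqrt(1 + 4 + 9)
d_man <- dist(x, method = "manhattan")  # rows 1 vs 2: 1 + 2 + 3
d_max <- dist(x, method = "maximum")    # rows 1 vs 2: max(1, 2, 3)

# Assumed form of a correlation-based dissimilarity: 1 - Pearson r between rows.
# cor() works column-wise, so the matrix is transposed first.
d_cor <- as.dist(1 - cor(t(x)))

print(d_euc)
print(d_cor)
```

Rows 1 and 2 are perfectly correlated, so their correlation-based dissimilarity is 0 even though their Euclidean distance is not; the choice of metric can therefore change which observations are grouped together.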

...

additional non-conflicting arguments passed to "clValid" or "NbClust"

Value

1) for "clValid"

an object of class "clValid" (optimal = FALSE) or a dataframe of optimal values (optimal = TRUE)

2) for "NbClust"

a list with the components All.index, All.CriticalValues, Best.nc and Best.partition.

See the help pages of "clValid" (?clValid) and "NbClust" (?NbClust) for more details.
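To make the two "clValid" return shapes more concrete, the sketch below calls clValid() directly on the built-in iris data instead of going through cluster_parameters(). It assumes that the data frame returned with optimal = TRUE resembles the output of clValid's optimalScores(), which this page does not confirm. The row names and the choice of iris are illustrative only.

```r
library(clValid)

# Scaled numeric columns of iris stand in for the output of extract().
x <- scale(iris[, 1:4])
rownames(x) <- paste0("obs", seq_len(nrow(x)))  # explicit item labels

# Internal validation of kmeans and pam for 2 to 4 clusters.
cv <- clValid(x, nClust = 2:4,
              clMethods = c("kmeans", "pam"),
              validation = "internal",
              metric = "euclidean")

# The full S4 object: validation measures for every method and cluster number.
summary(cv)

# A data frame with the best score, method and cluster number per measure
# (assumed to be comparable to what optimal = TRUE returns).
opt <- optimalScores(cv)
print(opt)
```

The full object is useful for comparing all method and cluster-number combinations, while the optimal-scores table is a compact summary when only the best-performing configuration is needed.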

Author(s)

Subhadeep Das

References

Brock, G., Pihur, V., Datta, S. and Datta, S. (2008). clValid: An R Package for Cluster Validation. Journal of Statistical Software, 25(4). http://www.jstatsoft.org/v25/i04

Charrad, M., Ghazzali, N., Boiteau, V. and Niknafs, A. (2014). NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software, 61(6), 1-36. http://www.jstatsoft.org/v61/i06/

Examples

library(OMICsPCA)

# groupinfo, multi_assay and groupinfo_ext are not defined in this example
# and must already exist in the workspace.
exclude <- list(0, c(1, 9))

int_PCA <- integrate_pca(Assays = c("H2az", "H3k9ac"),
                         groupinfo = groupinfo,
                         name = multi_assay, mergetype = 2,
                         exclude = exclude, graph = FALSE)

name <- int_PCA$int_PCA

data <- extract(name = name, PC = c(1:4),
                groups = c("WE", "RE"), integrated = TRUE, rand = 600,
                groupinfo = groupinfo_ext)

#### Using "clValid" ####

clusterstats <- cluster_parameters(name = data,
                                   optimal = FALSE, n = 2:4,
                                   comparisonAlgorithm = "clValid",
                                   distance = "euclidean",
                                   clusteringMethods = c("kmeans"),
                                   validationMethods = c("internal"))
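The example above exercises only the "clValid" path. As a rough, self-contained sketch of the kind of call the "NbClust" path presumably wraps, the block below calls NbClust() directly on the built-in iris data rather than on extract() output. How cluster_parameters() maps n, clusteringMethods and validationMethods onto NbClust's min.nc/max.nc, method and index arguments is an assumption here, not something this page confirms.

```r
library(NbClust)

# Scaled numeric columns of iris stand in for the output of extract().
x <- scale(iris[, 1:4])

set.seed(1)  # fixed seed so repeated runs give the same partition

# Evaluate 2 to 6 kmeans clusters with the silhouette index.
res <- NbClust(data = x, distance = "euclidean",
               min.nc = 2, max.nc = 6,
               method = "kmeans", index = "silhouette")

res$All.index              # index value for each candidate cluster number
res$Best.nc                # selected number of clusters and its index value
table(res$Best.partition)  # cluster sizes under the selected partition
```

With a single index such as "silhouette" the result is small and easy to read; index = "all" computes many indices at once and takes noticeably longer.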

OMICsPCA documentation built on Nov. 8, 2020, 5:01 p.m.