BossaClust: Bossa Clustering


View source: R/Bossa_Clust.R

Description

Using either a previously calculated similarity matrix or the original categorical data frame, BossaClust performs both overlap clustering and hierarchical clustering. After the cluster-merging step, it returns results for several recommended numbers of clusters (k).

Usage

BossaClust(data, data.pre = NULL, alpha = 1, p = c(0.9, 0.75, 0.5),
  lin = 0.25, is.pca = TRUE, pca.sum.prop = 0.95, n.comp = 50,
  fix.pca.comp = FALSE, cri = 1, lintype = "ward.D2", perplexity = 30)

Arguments

data

The original categorical data, with n observations and p variables.

data.pre

A list obtained from BossaSimi, containing the original categorical data, the similarity matrix, the dissimilarity matrix, the transformed data, and the Bossa scores. Computing data.pre first and then passing it to BossaClust is recommended, since it saves time when trying different parameter values for this function.

alpha

A power scaling for the Bossa scores, representing the weight of the variable sigma value. The default is alpha = 1.

p

A set of quantiles of the similarity matrix, used to form clusters at different levels of within-cluster similarity. The default is p = c(0.9, 0.75, 0.5).
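
As a rough illustration of what such quantiles look like, the sketch below computes them in base R for a toy similarity matrix. The matrix `sim` is made up for illustration (it is not produced by BossaSimi), and how BossaClust applies the resulting thresholds internally is not shown here.

```r
# Toy symmetric similarity matrix (hypothetical data, not produced by BossaSimi)
set.seed(1)
x <- matrix(runif(25), 5, 5)
sim <- (x + t(x)) / 2
diag(sim) <- 1

# Quantiles of the pairwise similarities (off-diagonal entries only)
p <- c(0.9, 0.75, 0.5)
thresholds <- quantile(sim[upper.tri(sim)], probs = p)
thresholds
```

A higher quantile gives a stricter similarity threshold, so each value in p corresponds to a different level of within-cluster similarity.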

lin

A tuning parameter that controls the size of each overlap cluster before merging; a smaller lin leads to larger clusters. The default is lin = 0.25.

is.pca

A logical variable indicating whether the Bossa scores should be transformed to principal components before the similarity matrix is calculated. This is recommended when processing ultra-high-dimensional data.

pca.sum.prop

A numeric value giving the proportion of variance that the retained components should explain; enough components are kept to reach this proportion. The default is pca.sum.prop = 0.95.

n.comp

The number of components of PCA. The default is n.comp = 50.

fix.pca.comp

A logical variable indicating whether to use a fixed number of components (n.comp) or a fixed proportion of variance (pca.sum.prop). The default, FALSE, uses the fixed proportion.
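
The interplay of pca.sum.prop, n.comp and fix.pca.comp can be sketched in base R with prcomp. This is only an illustration under assumptions: the `scores` matrix is random toy data standing in for Bossa scores, and the selection rule below is one plausible reading of the argument descriptions, not the package's actual code.

```r
# Toy numeric score matrix (hypothetical; stands in for Bossa scores)
set.seed(1)
scores <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

pca <- prcomp(scores)
# Cumulative proportion of variance explained by the leading components
cum.prop <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

pca.sum.prop <- 0.95
n.comp <- 5
fix.pca.comp <- FALSE

# Fixed proportion: smallest number of components reaching pca.sum.prop.
# Fixed number: n.comp components (capped at the number available).
k <- if (fix.pca.comp) {
  min(n.comp, ncol(pca$x))
} else {
  which(cum.prop >= pca.sum.prop)[1]
}
reduced <- pca$x[, seq_len(k), drop = FALSE]
dim(reduced)
```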

cri

A tuning parameter for merging: if the p-value is smaller than cri, the null hypothesis is rejected and the overlap sub-clusters are merged. cri can be any number less than 1. If cri = 1, the criterion is reset to 0.05/N, where N is the number of all overlap sub-clusters; if cri = 2, it is reset to 0.05/N(N-1).
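
The rule for cri can be written out as a small helper. `merge.threshold` is a hypothetical name used only for this illustration and is not part of boclust; the sketch also assumes that "0.05/N(N-1)" means 0.05 divided by N(N-1).

```r
# Hypothetical helper illustrating the rule described above; not part of boclust
merge.threshold <- function(cri, N) {
  if (cri == 1) {
    0.05 / N                # criterion reset to 0.05/N
  } else if (cri == 2) {
    0.05 / (N * (N - 1))    # assumed reading of "0.05/N(N-1)"
  } else {
    cri                     # any value below 1 is used directly
  }
}

merge.threshold(1, N = 10)     # 0.05 / 10
merge.threshold(2, N = 10)     # 0.05 / 90
merge.threshold(0.01, N = 10)  # used as given
```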

lintype

The agglomeration method to be used in hclust. This should be (an unambiguous abbreviation of) one of the methods hclust accepts, such as "ward.D", "ward.D2", "single", "complete" or "average". The default is "ward.D2".
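
For reference, this is how an agglomeration method is supplied to hclust in base R. The points and the choice of k = 3 are made up for illustration; the dissimilarity that BossaClust passes to hclust comes from the Bossa similarity matrix, not from dist.

```r
# Toy dissimilarity from random points (hypothetical data)
set.seed(1)
pts <- matrix(rnorm(40), ncol = 2)
d <- dist(pts)

# lintype corresponds to the `method` argument of hclust
hc <- hclust(d, method = "ward.D2")

# Cut the tree into 3 non-overlapping clusters
groups <- cutree(hc, k = 3)
table(groups)
```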

perplexity

The perplexity parameter of t-SNE. The default is perplexity = 30.

Value

An object containing the overlap clusters after merging and the non-overlap clusters, which can be displayed with the function bossa_interactive.

Examples


boclust documentation built on Dec. 4, 2017, 9:04 a.m.