IMPACC: run IMPACC

View source: R/IMPACC.R

IMPACC R Documentation

run IMPACC

Description

IMPACC computes a consensus matrix and feature importance scores using adaptive observation and/or feature subsampling. MPCC computes a consensus matrix using random observation and feature subsampling. IMPACC_cluster derives clustering results from a consensus matrix.

Usage

IMPACC(d=NULL,
                  K=NULL,
                  adaptiveFeature = TRUE,
                  reps=300,
                  pItem=0.25,
                  pFeature=0.1,
                  innerLinkage="ward.D",
                  distance="manhattan",
                  h=0.95,
                  E= 3,
                  qI=0.95,
                  qF=1,
                  alpha_I=0.5,
                  alpha_F=0.5,
                  pp=0.05,
                  finalAlgorithm='hclust',
                  finalLinkage='ward.D',
                  early_stop=TRUE,
                  num_unchange = 5,
                  eps = 0.00001,
                  feature_evaluation = 'ANOVA',
                  seed=NULL,
                  verbose=TRUE)


MPCC(d=NULL,
                K = NULL,
                reps=300,
                pItem=0.25,
                pFeature=0.1,
                innerLinkage="ward.D",
                distance="manhattan",
                h=0.95,
                finalAlgorithm='hclust',
                finalLinkage='ward.D',
                early_stop=TRUE,
                num_unchange = 5,
                eps = 0.00001,
                seed=NULL,
                verbose=TRUE)


IMPACC_cluster(ConsensusMatrix=NULL,
               K=NULL,
               finalAlgorithm='hclust',
               finalLinkage='ward.D')

Arguments

d

data to be clustered. A data matrix where columns are observations and rows are features.
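Many data sets store observations in rows, which is the opposite of the orientation described above. The following is a minimal sketch of transposing such a matrix before passing it as d; the matrix obs_by_feature and its names are hypothetical and only illustrate the layout.

```r
# Hypothetical data: 6 observations (rows) measured on 4 features (columns).
set.seed(1)
obs_by_feature <- matrix(
  rnorm(24), nrow = 6, ncol = 4,
  dimnames = list(paste0("obs", 1:6), paste0("feat", 1:4))
)

# Transpose so that rows are features and columns are observations.
d <- t(obs_by_feature)
dim(d)
```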

K

integer value. Number of clusters.

adaptiveFeature

boolean. If TRUE, algorithm will adaptively select features.

reps

integer value. Maximum number of minipatches.

pItem

numerical value. Proportion of items to sample.

pFeature

numerical value. Proportion of features to sample.

innerLinkage

Hierarchical linkage method for minipatch.

distance

character value. 'pearson' (1 - Pearson correlation), 'spearman' (1 - Spearman correlation), 'euclidean', 'manhattan' (the default shown in Usage), 'binary', 'maximum', 'canberra', 'minkowski', or a custom distance function.
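The correlation-based options can be illustrated with base R. This is a sketch under the assumption that distances are taken between observations (columns of d); the toy matrix is hypothetical, and the package's internal computation may differ.

```r
set.seed(1)
# Hypothetical toy data: 10 features (rows) by 5 observations (columns).
d <- matrix(rnorm(50), nrow = 10, ncol = 5)

# cor() correlates columns, so each entry compares two observations.
pearson_dist  <- as.dist(1 - cor(d, method = "pearson"))
spearman_dist <- as.dist(1 - cor(d, method = "spearman"))
```

Because a correlation lies in [-1, 1], these dissimilarities lie in [0, 2], with 0 for perfectly correlated observations.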

h

numerical value. quantile cutoff of dendrogram height.
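One way to read a quantile cutoff of dendrogram height is sketched below with base R: the tree is cut at the h-th quantile of its merge heights. Whether IMPACC applies h in exactly this way is an assumption, and the toy data are hypothetical.

```r
set.seed(1)
# Hypothetical toy data: 8 features (rows) by 10 observations (columns).
d <- cbind(matrix(rnorm(40, mean = 0), nrow = 8),
           matrix(rnorm(40, mean = 5), nrow = 8))

# Cluster observations, so take distances between columns.
hc <- hclust(dist(t(d), method = "manhattan"), method = "ward.D")

h <- 0.95
cut_height <- unname(quantile(hc$height, probs = h))  # height at the 95% quantile
labels <- cutree(hc, h = cut_height)                  # cut the tree at that height
table(labels)
```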

E

integer value. the minimum number of times each feature is subsampled in the burn-in stage

qI

numerical value. high uncertainty cutoff for observations

qF

numerical value. high importance cutoff for features

alpha_I

numerical value. learning rate for observation weight updates

alpha_F

numerical value. learning rate for feature weight updates

pp

numerical value. percentile cutoff for p-value of ANOVA test

early_stop

boolean. If TRUE, the algorithm will stop when the consensus matrix is stable.

eps

numeric value. Tolerance on the change in the consensus matrix, used to judge whether it is stable.

num_unchange

integer value. Number of consecutive stable minipatches needed to stop the function.
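The interplay of early_stop, eps and num_unchange can be sketched as a generic stopping rule: stop once the change stays below eps for num_unchange iterations in a row. The vector changes is hypothetical, and how the package actually measures the change in the consensus matrix is not specified here.

```r
# Hypothetical per-iteration changes in the consensus matrix.
changes <- c(0.2, 0.05, 1e-3, 1e-6, 1e-7, 1e-6, 2e-6, 1e-7, 1e-8)
eps <- 0.00001
num_unchange <- 5

stable_run <- 0   # length of the current run of stable iterations
stop_at <- NA     # iteration at which the rule fires, if any
for (i in seq_along(changes)) {
  # Extend the run when the change is below eps, otherwise reset it.
  stable_run <- if (changes[i] < eps) stable_run + 1 else 0
  if (stable_run >= num_unchange) {
    stop_at <- i
    break
  }
}
stop_at
```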

feature_evaluation

character value. 'ANOVA': evaluate feature importance by ANOVA test; 'rankANOVA': evaluate feature importance by Kruskal-Wallis test; 'multinomial': evaluate feature importance by multinomial regression test.
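The two tests named for 'ANOVA' and 'rankANOVA' are available in base R. The sketch below scores a single hypothetical feature against hypothetical cluster labels; how IMPACC turns such p-values into importance scores is not shown and should not be inferred from it.

```r
set.seed(1)
# Hypothetical cluster labels and one feature whose mean differs by cluster.
labels  <- factor(rep(c("A", "B", "C"), each = 10))
feature <- c(rnorm(10, mean = 0), rnorm(10, mean = 3), rnorm(10, mean = 6))

# One-way ANOVA p-value for the feature across clusters.
anova_p <- summary(aov(feature ~ labels))[[1]][["Pr(>F)"]][1]

# Kruskal-Wallis (rank-based) p-value for the same comparison.
kw_p <- kruskal.test(feature ~ labels)$p.value
```

A small p-value indicates that the feature separates the clusters.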

seed

optional numerical value. sets random seed for reproducible results.

verbose

boolean. If TRUE, print messages to the screen to indicate progress. This is useful for large datasets.

ConsensusMatrix

Consensus matrix returned by IMPACC or MPCC.

finalAlgorithm

character value. Clustering algorithm: 'hc' for hierarchical clustering (hclust), 'spec' for spectral clustering, 'pam' for partitioning around medoids, 'km' for k-means on the data matrix, 'ap' for affinity propagation clustering, 'convex' for convex clustering, or a function that returns a clustering. See the example and vignette for more details.

finalLinkage

Hierarchical linkage method for consensus matrix.

Details

IMPACC implements the Interpretable Minipatch Adaptive Consensus Clustering of Gan, L., & Allen, G. I. (2021). It aims to provide stable and robust cluster memberships, together with interpretability in terms of feature importance, in a time-efficient manner. IMPACC takes a numerical data matrix with observations as columns and features as rows. The function constructs minipatches according to pItem and pFeature, and by default clusters each minipatch by hierarchical clustering with ward.D linkage and Manhattan distance. It applies adaptive subsampling schemes controlled by E, qI, qF, alpha_I, alpha_F, and pp.

MPCC implements the Minipatch Consensus Clustering of Gan, L., & Allen, G. I. (2021), which is based on random minipatches.
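The idea of building a consensus matrix from random minipatches can be sketched in base R. This is an illustrative simplification, not the package's implementation: the toy data are hypothetical, each minipatch is cut into a fixed K clusters rather than by the height quantile h, and there is no early stopping.

```r
set.seed(1)
# Hypothetical toy data: 20 features (rows) by 12 observations (columns).
d <- cbind(matrix(rnorm(120, mean = 0), nrow = 20),
           matrix(rnorm(120, mean = 4), nrow = 20))
n <- ncol(d); m <- nrow(d)
K <- 2; reps <- 50; pItem <- 0.5; pFeature <- 0.5

co_cluster <- matrix(0, n, n)  # times a pair was clustered together
co_sample  <- matrix(0, n, n)  # times a pair was sampled together
for (r in seq_len(reps)) {
  obs  <- sample(n, ceiling(pItem * n))     # random subset of observations
  feat <- sample(m, ceiling(pFeature * m))  # random subset of features
  hc <- hclust(dist(t(d[feat, obs]), method = "manhattan"), method = "ward.D")
  cl <- cutree(hc, k = K)
  co_sample[obs, obs]  <- co_sample[obs, obs] + 1
  co_cluster[obs, obs] <- co_cluster[obs, obs] + outer(cl, cl, "==")
}

# Fraction of co-sampled minipatches in which each pair shared a cluster.
consensus <- ifelse(co_sample > 0, co_cluster / co_sample, 0)
```

Entries near 1 mark pairs of observations that are almost always grouped together, and entries near 0 mark pairs that are almost always separated.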

For a detailed description of usage, output and images, see the vignette, which can be opened with openVignette().

Value

IMPACC returns a list containing ConsensusMatrix (numerical matrix), labels (consensus class assignments), feature_importance (feature importance scores), and nIter (stopping point).

MPCC returns a list containing ConsensusMatrix (numerical matrix), labels (consensus class assignments), and nIter (stopping point).

IMPACC_cluster returns a list of size N containing clustering labels.
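Deriving labels from a consensus matrix with hierarchical clustering can be sketched as follows. The 6 x 6 matrix is hypothetical, and treating 1 - consensus as the dissimilarity is an assumption about how such a final step is commonly done, not a statement about the internals of IMPACC_cluster.

```r
# Hypothetical 6 x 6 consensus matrix with two clear blocks.
block <- matrix(0.9, 3, 3)
off   <- matrix(0.1, 3, 3)
consensus <- rbind(cbind(block, off), cbind(off, block))
diag(consensus) <- 1

# Convert agreement into a dissimilarity, then cluster and cut into K = 2 groups.
hc <- hclust(as.dist(1 - consensus), method = "ward.D")
labels <- cutree(hc, k = 2)
labels
```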

Author(s)

Luqin Gan (luqin_gan@rice.edu), Genevera I. Allen (gallen@rice.edu)

References

Gan, L., & Allen, G. I. (2021). Fast and Interpretable Consensus Clustering via Minipatch Learning. arXiv preprint arXiv:2110.02388.

Examples

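data(yan)
# K is not defined in the original example. The value below is a placeholder
# assumption; set it to the number of clusters expected in your data.
K = 5
impacc = IMPACC(d=yan$sc_cnt, K = K, reps = 100, verbose=FALSE)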

DataSlingers/IMPACC documentation built on April 29, 2023, 8:16 p.m.