library(knitr)
opts_chunk$set(fig.align = 'center', fig.width = 6, fig.height = 5, dev = 'png')

Summary

IMPACC is a tool for unsupervised clustering and feature importance discovery. This document provides a tutorial on how to use IMPACC.

Brief description of IMPACC

IMPACC (Interpretable Minipatch Adaptive Consensus Clustering) performs consensus clustering using minipatch learning with adaptive feature and observation sampling schemes. It produces interpretable results by identifying the features that differentiate clusters, which makes it particularly suited to the sparse, high-dimensional data sets common in bioinformatics. MPCC (MiniPatch Consensus Clustering) is the non-adaptive counterpart: it performs consensus clustering by randomly subsampling a tiny fraction of both observations and features at each iteration.
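To make the minipatch idea concrete, the following base R sketch runs k-means on many small random subsets of observations and features and records how often each pair of observations lands in the same cluster. It is a simplified illustration with uniform random sampling only, not the package's implementation: the function name `minipatch_consensus`, its arguments `p_obs` and `p_feat`, the use of k-means as the base clusterer, and the toy data are all assumptions made for this example.

```r
# Simplified minipatch consensus clustering with uniform random sampling.
# Illustrative sketch only; this is NOT the IMPACC package's implementation.
minipatch_consensus <- function(d, K, reps = 200, p_obs = 0.3, p_feat = 0.1) {
  n <- ncol(d)  # observations are columns
  m <- nrow(d)  # features are rows
  co_cluster <- matrix(0, n, n)  # times a pair was clustered together
  co_sample  <- matrix(0, n, n)  # times a pair was sampled together
  n_obs  <- min(n, max(K + 1, ceiling(p_obs * n)))
  n_feat <- min(m, max(2, ceiling(p_feat * m)))
  for (r in seq_len(reps)) {
    obs  <- sample(n, n_obs)   # random subset of observations
    feat <- sample(m, n_feat)  # random subset of features
    # kmeans() expects observations in rows, so transpose the minipatch
    patch <- t(d[feat, obs, drop = FALSE])
    cl <- kmeans(patch, centers = K, nstart = 5)$cluster
    same <- outer(cl, cl, "==") * 1
    co_cluster[obs, obs] <- co_cluster[obs, obs] + same
    co_sample[obs, obs]  <- co_sample[obs, obs] + 1
  }
  # Fraction of co-sampled iterations in which each pair shared a cluster
  consensus <- ifelse(co_sample > 0, co_cluster / co_sample, 0)
  diag(consensus) <- 1
  consensus
}

set.seed(1)
# Toy data: 40 features x 30 observations, two well-separated groups
d <- cbind(matrix(rnorm(40 * 15, mean = 0), nrow = 40),
           matrix(rnorm(40 * 15, mean = 4), nrow = 40))
cons <- minipatch_consensus(d, K = 2)
# Final labels from hierarchical clustering on 1 - consensus
labels <- cutree(hclust(as.dist(1 - cons), method = "average"), k = 2)
table(labels)
```

The adaptive schemes in IMPACC replace the two uniform `sample()` calls with sampling probabilities that are updated across iterations; that part is deliberately left out of this sketch.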

The IMPACC\cite{} R package implements the IMPACC and MPCC methods.

Tutorial

Data Input

The input must be a numeric matrix in which rows are features and columns are observations. For single-cell RNA-seq data, normalize the counts before clustering to get the best results.
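As a rough sketch of preparing data in this layout, the example below starts from a made-up count table stored as cells by genes, transposes it so that features are rows, and applies a log2 transform. The toy values, the object names, and the pseudocount of 1 are assumptions for illustration; the appropriate normalization depends on your data.

```r
# Hypothetical toy counts stored as cells (rows) x genes (columns)
cells_by_genes <- matrix(
  c(0, 15, 1,
    3,  0, 1,
    7,  1, 0,
    1,  3, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("gene", 1:3))
)

# Transpose so that features are rows and observations are columns
counts <- t(cells_by_genes)

# log2 transform; the pseudocount of 1 is an assumption that keeps zeros finite
d <- log2(counts + 1)

# Basic sanity checks before clustering
stopifnot(is.matrix(d), is.numeric(d), !anyNA(d), all(is.finite(d)))
dim(d)  # 3 features x 4 observations
```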

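library(IMPACC)
data(yan)                          # load the example data set
yan$raw[1:3, 1:3]                  # raw counts
yan$sc_cnt[1:3, 1:3]               # log2-transformed expression
head(yan$sc_label)                 # reference cell labels
K = length(unique(yan$sc_label))   # number of clusters = number of distinct labels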

The yan list contains three components: raw is the gene expression count matrix before log2 transformation; sc_cnt is the log2-transformed gene expression matrix; and sc_label contains the cell labels provided by the authors of the original publication.

Run IMPACC

impacc = IMPACC(d = yan$sc_cnt, K = K, reps = 100, verbose = FALSE)

IMPACC returns a list containing ConsensusMatrix (the numeric consensus matrix), labels (consensus class assignments), feature_importance (feature importance scores), and nIter (the iteration at which the algorithm stopped).
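A consensus matrix can be explored on its own, for example to try a different number of clusters or to order observations for a heatmap. The sketch below uses a small made-up consensus matrix in place of the real ConsensusMatrix and derives labels with average-linkage hierarchical clustering on 1 - consensus. Whether IMPACC obtains its own labels this way is not stated here, so treat the linkage choice as an assumption.

```r
# Toy 6 x 6 consensus matrix with two blocks (values are made up)
cons <- matrix(0.1, nrow = 6, ncol = 6)
cons[1:3, 1:3] <- 0.9
cons[4:6, 4:6] <- 0.8
diag(cons) <- 1

# Treat 1 - consensus as a dissimilarity and cluster hierarchically
hc <- hclust(as.dist(1 - cons), method = "average")
cl <- cutree(hc, k = 2)
cl  # 1 1 1 2 2 2

# Reorder rows and columns by the dendrogram so blocks appear on the diagonal
cons_ordered <- cons[hc$order, hc$order]
```

With a real result, `cons` would be replaced by the ConsensusMatrix component of the returned list, and `cons_ordered` could be passed to `heatmap()` or `image()`.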

Users can run MPACC (Minipatch Adaptive Consensus Clustering) by setting adaptiveFeature to FALSE.

Run MPACC with adaptive observation subsampling and random feature subsampling.

mpacc = IMPACC(d = yan$sc_cnt, K = K, adaptiveFeature = FALSE, verbose = FALSE)

Run MPCC with random minipatch subsampling.

mpcc = MPCC(d = yan$sc_cnt, K = K, verbose = FALSE)

Run IMPACC with multinomial feature evaluation.

impacc_multinomial = IMPACC(d = yan$sc_cnt, K = K, reps = 1, feature_evaluation = 'multinomial', verbose = FALSE)
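Whichever feature evaluation is used, a natural next step is to rank features by their importance scores. The sketch below assumes the scores form a numeric vector with one named entry per feature, which is an assumption about the structure of feature_importance; check it with `str()` on your own result. The gene names and values are made up.

```r
# Hypothetical importance scores, one per feature (values are made up)
importance <- c(geneA = 0.02, geneB = 0.91, geneC = 0.40, geneD = 0.75)

# Rank features from most to least important and keep the top two
ranked <- sort(importance, decreasing = TRUE)
top <- head(ranked, 2)
names(top)  # "geneB" "geneD"
```

If the scores come back unnamed, the rownames of the input matrix can be attached first, for example `names(importance) <- rownames(d)`, provided the scores are in the same order as the rows of `d`.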

\end{document}



DataSlingers/IMPACC documentation built on April 29, 2023, 8:16 p.m.