MPLNClust R Documentation

MPLNClust: Clustering via mixtures of multivariate Poisson-log normal distributions

Description

MPLNClust is an R package for performing clustering using mixtures of multivariate Poisson-log normal (MPLN) distributions, as proposed by Silva et al. (2019). It was developed for count data, with the clustering of RNA sequencing data as the motivating application. However, the vector of normalization factors can be relaxed, so the clustering method may also be applied to other types of count data.
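To make the model concrete, here is a minimal base-R sketch of the MPLN mixture generative process as we understand it from Silva et al. (2019): each observation i is assigned a component g with probability given by the mixing proportions, a latent vector θ_i is drawn from N_d(μ_g, Σ_g), and each count Y_ij is Poisson with mean exp(θ_ij + log s_j), where s_j is the normalization factor of dimension j. Setting every s_j to 1 corresponds to the relaxed case described above. The function simulateMPLNSketch and its arguments are hypothetical names used only for illustration; this is not the package's mplnDataGenerator, whose argument layout differs (for example, it takes the covariance matrices stacked with rbind).

```r
# Hypothetical helper: a base-R sketch of the MPLN mixture generative model,
# not MPLNClust's own implementation. Rows are observations (e.g. genes),
# columns are the d dimensions (e.g. samples).
simulateMPLNSketch <- function(n, mixProp, mu, sigma, normFactors = NULL, seed = 1) {
  set.seed(seed)
  G <- length(mixProp)
  d <- ncol(mu)
  # Assumption: no normalization offset unless factors are supplied (s_j = 1)
  if (is.null(normFactors)) normFactors <- rep(1, d)
  # Latent component label for each observation
  z <- sample.int(G, n, replace = TRUE, prob = mixProp)
  Y <- matrix(0L, nrow = n, ncol = d)
  for (g in seq_len(G)) {
    idx <- which(z == g)
    if (length(idx) == 0) next
    # Upper-triangular Cholesky factor: sigma[[g]] = t(R) %*% R
    R <- chol(sigma[[g]])
    # Rows of theta ~ N_d(mu_g, Sigma_g)
    theta <- matrix(rnorm(length(idx) * d), ncol = d) %*% R
    theta <- sweep(theta, 2, mu[g, ], "+")
    # Poisson means exp(theta_ij + log s_j)
    lambda <- exp(sweep(theta, 2, log(normFactors), "+"))
    Y[idx, ] <- matrix(rpois(length(lambda), lambda), ncol = d)
  }
  list(counts = Y, membership = z)
}

# Usage, mirroring the parameter values in the Examples section
mu <- rbind(c(6.5, 6, 6, 6, 6, 6), c(2, 2.5, 2, 2, 2, 2))
sim <- simulateMPLNSketch(1000, mixProp = c(0.79, 0.21), mu = mu,
                          sigma = list(diag(6) * 2, diag(6)))
dim(sim$counts)
```

The Poisson layer on top of a Gaussian latent layer is what lets the MPLN model capture over-dispersion and correlation between dimensions, which a plain Poisson mixture cannot.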

MPLNClust functions

The MPLNClust package provides 10 functions:

  • mplnVariational

  • runMPLNClust

  • mplnMCMCParallel

  • mplnMCMCNonParallel

  • mplnVisualize

  • mplnDataGenerator

  • AICFunction

  • BICFunction

  • AIC3Function

  • ICLFunction
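AICFunction, BICFunction, AIC3Function and ICLFunction are named after standard model selection criteria (Akaike 1973; Schwarz 1978; Bozdogan 1994; Biernacki et al. 2000). The sketch below shows how these criteria are commonly defined for a G-component MPLN mixture with unrestricted covariance matrices, using the "smaller is better" convention (-2 log-likelihood plus a penalty) and the entropy-penalized form of ICL. The helpers mplnFreeParams and infoCriteria are hypothetical, not part of MPLNClust, and the package's functions may use a different sign convention, parameter count, or ICL variant; consult their help pages before comparing numbers.

```r
# Hypothetical helpers, not MPLNClust's API. Convention: smaller is better.
# Free parameters for G components in d dimensions with unrestricted covariances:
# (G - 1) mixing proportions + G * d means + G * d * (d + 1) / 2 covariance entries.
mplnFreeParams <- function(G, d) {
  (G - 1) + G * d + G * d * (d + 1) / 2
}

# logL: maximized log-likelihood; n: number of observations;
# z: optional n x G matrix of posterior membership probabilities (needed for ICL).
infoCriteria <- function(logL, G, d, n, z = NULL) {
  K <- mplnFreeParams(G, d)
  out <- c(AIC  = -2 * logL + 2 * K,
           AIC3 = -2 * logL + 3 * K,
           BIC  = -2 * logL + K * log(n))
  if (!is.null(z)) {
    # Classification entropy; terms with z = 0 contribute 0 (limit of z log z)
    pos <- z[z > 0]
    entropy <- -sum(pos * log(pos))
    out["ICL"] <- out[["BIC"]] + 2 * entropy
  }
  out
}

mplnFreeParams(2, 6)  # 55
# Made-up log-likelihood, purely to show the call (not MPLNClust output)
infoCriteria(logL = -5000, G = 2, d = 6, n = 1000)
```

Because the covariance term grows with d squared, the penalty rises quickly with dimensionality, which is why BIC and ICL tend to favour fewer components than AIC on the same fit.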

For a quick introduction to MPLNClust, see the package vignettes.

Author(s)

Anjali Silva, anjal@alumni.uoguelph.ca, Sanjeena Dang, sanjeena.dang@carleton.ca.

References

Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20.

Subedi, S., and R. Browne (2020). A parsimonious family of multivariate Poisson-lognormal distributions for clustering multivariate count data. arXiv preprint arXiv:2004.06857.

Aitchison, J., and C. H. Ho (1989). The multivariate Poisson-log normal distribution. Biometrika 76.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, New York, NY, USA, pp. 267–281. Springer Verlag.

Arlot, S., V. Brault, J. Baudry, C. Maugis, and B. Michel (2016). capushe: CAlibrating Penalities Using Slope HEuristics. R package version 1.1.1.

Biernacki, C., G. Celeux, and G. Govaert (2000). Assessing a mixture model for clustering with the integrated classification likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22.

Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach: Volume 2 Multivariate Statistical Modeling, pp. 69–113. Dordrecht: Springer Netherlands.

Robinson, M. D., and A. Oshlack (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6.

Examples

# Generating simulated data

trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)

trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)

sampleData <- MPLNClust::mplnDataGenerator(nObservations = 1000,
                                            dimensionality = 6,
                                            mixingProportions = c(0.79, 0.21),
                                            mu = rbind(trueMu1, trueMu2),
                                            sigma = rbind(trueSigma1, trueSigma2),
                                            produceImage = "No")

# Clustering via mplnVariational
mplnResults <- MPLNClust::mplnVariational(dataset = sampleData$dataset,
                                          membership = sampleData$trueMembership,
                                          gmin = 1,
                                          gmax = 2,
                                          initMethod = "kmeans",
                                          nInitIterations = 2,
                                          normalize = "Yes")
## Not run: 
# Clustering via mplnMCMCParallel
mplnResults <- MPLNClust::mplnMCMCParallel(dataset = sampleData$dataset,
                                           membership = sampleData$trueMembership,
                                           gmin = 1,
                                           gmax = 1,
                                           nChains = 3,
                                           nIterations = 400,
                                           initMethod = "kmeans",
                                           nInitIterations = 0,
                                           normalize = "Yes",
                                           numNodes = 2)

# Clustering via mplnMCMCNonParallel
mplnResults <- MPLNClust::mplnMCMCNonParallel(dataset = sampleData$dataset,
                                              membership = sampleData$trueMembership,
                                              gmin = 1,
                                              gmax = 1,
                                              nChains = 3,
                                              nIterations = 700,
                                              initMethod = "kmeans",
                                              nInitIterations = 0,
                                              normalize = "Yes")

## End(Not run)

anjalisilva/MPLNClust documentation built on Sept. 19, 2024, 7:34 a.m.