MPLNClust | R Documentation |
MPLNClust is an R package for clustering using mixtures of
multivariate Poisson-log normal (MPLN) distributions, as proposed by
Silva et al. (2019). It was developed for count data, with clustering of
RNA sequencing data as the motivation. However, the vector of normalization
factors can be relaxed, so the clustering method may also be applied to other
types of count data.
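To make the role of the normalization factors concrete, the following base R sketch simulates counts from a single MPLN component. It assumes the usual hierarchical form of the model: a latent vector theta is multivariate normal with mean mu and covariance sigma, and the count in dimension j is Poisson with mean exp(theta_j + log s_j), where s_j is a normalization factor. The helper name simulateMPLN and its argument normFactors are hypothetical and are not part of MPLNClust.

```r
# Hypothetical helper (not part of MPLNClust): draw n observations from one
# MPLN component with latent mean mu, latent covariance sigma, and
# per-dimension normalization factors normFactors.
simulateMPLN <- function(n, mu, sigma, normFactors = rep(1, length(mu))) {
  d <- length(mu)
  # Latent Gaussian layer: theta = mu + z %*% R, where t(R) %*% R = sigma
  z <- matrix(rnorm(n * d), nrow = n, ncol = d)
  theta <- sweep(z %*% chol(sigma), 2, mu, FUN = "+")
  # Poisson layer: the log-mean for dimension j is theta_j + log(s_j)
  logMean <- sweep(theta, 2, log(normFactors), FUN = "+")
  matrix(rpois(n * d, lambda = exp(logMean)), nrow = n, ncol = d)
}

set.seed(1)
# All normalization factors equal to 1: the "relaxed" case for general counts
counts <- simulateMPLN(n = 200, mu = rep(2, 4), sigma = diag(4) * 0.5)
dim(counts)
```

Leaving normFactors at its default of 1 removes the offset entirely, which is one way to read the relaxed case described above; for RNA sequencing data the factors would instead reflect differences in library size between samples.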
The MPLNClust package provides 10 functions:
mplnVariational
runMPLNClust
mplnMCMCParallel
mplnMCMCNonParallel
mplnVisualize
mplnDataGenerator
AICFunction
BICFunction
AIC3Function
ICLFunction
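The last four functions correspond to standard model selection criteria (Akaike, 1973; Schwarz, 1978; Bozdogan, 1994; Biernacki et al., 2000). As a rough illustration of what those criteria measure, the sketch below computes their textbook forms in base R. The helpers infoCriteria and nParamsMPLN are hypothetical and do not reproduce the signatures, return values, or sign conventions of AICFunction, BICFunction, AIC3Function, or ICLFunction, which may differ; the parameter count assumes unconstrained component covariance matrices.

```r
# Hypothetical helper (not MPLNClust's API): textbook information criteria
# from a maximized log-likelihood, a parameter count, and a sample size.
# Under this sign convention, smaller values indicate a preferred model.
infoCriteria <- function(logLikelihood, nParameters, nObservations,
                         probaPost = NULL) {
  aic  <- -2 * logLikelihood + 2 * nParameters
  aic3 <- -2 * logLikelihood + 3 * nParameters
  bic  <- -2 * logLikelihood + nParameters * log(nObservations)
  out <- c(AIC = aic, AIC3 = aic3, BIC = bic)
  if (!is.null(probaPost)) {
    # ICL: BIC plus an entropy penalty based on hard (MAP) assignments
    mapLabels <- max.col(probaPost, ties.method = "first")
    mapProb <- probaPost[cbind(seq_len(nrow(probaPost)), mapLabels)]
    out <- c(out, ICL = bic - 2 * sum(log(mapProb)))
  }
  out
}

# Free parameters of a G-component mixture in d dimensions (assumed count):
# (G - 1) mixing proportions + G * d means + G * d * (d + 1) / 2 covariances
nParamsMPLN <- function(G, d) (G - 1) + G * d + G * d * (d + 1) / 2

# Illustrative numbers only, not output from a fitted model
p <- rep(c(0.9, 0.2, 0.6, 0.99), length.out = 1000)
postProbs <- cbind(p, 1 - p)
infoCriteria(logLikelihood = -1250,
             nParameters = nParamsMPLN(G = 2, d = 6),
             nObservations = 1000,
             probaPost = postProbs)
```

AIC, AIC3, and BIC differ only in how heavily they penalize the parameter count, while ICL additionally penalizes uncertain cluster assignments, so it tends to favor well-separated clusters.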
For a quick introduction to MPLNClust, see the vignettes.
Anjali Silva, anjal@alumni.uoguelph.ca, Sanjeena Dang, sanjeena.dang@carleton.ca.
Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20.
Subedi, S., and R. Browne (2020). A parsimonious family of multivariate Poisson-lognormal distributions for clustering multivariate count data. arXiv preprint arXiv:2004.06857.
Aitchison, J. and C. H. Ho (1989). The multivariate Poisson-log normal distribution. Biometrika 76.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, New York, NY, USA, pp. 267–281. Springer Verlag.
Arlot, S., Brault, V., Baudry, J., Maugis, C., and Michel, B. (2016). capushe: CAlibrating Penalities Using Slope HEuristics. R package version 1.1.1.
Biernacki, C., G. Celeux, and G. Govaert (2000). Assessing a mixture model for clustering with the integrated classification likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22.
Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach: Volume 2 Multivariate Statistical Modeling, pp. 69–113. Dordrecht: Springer Netherlands.
Robinson, M.D., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6.
# Generating simulated data
trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)
trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)
sampleData <- MPLNClust::mplnDataGenerator(nObservations = 1000,
                                           dimensionality = 6,
                                           mixingProportions = c(0.79, 0.21),
                                           mu = rbind(trueMu1, trueMu2),
                                           sigma = rbind(trueSigma1, trueSigma2),
                                           produceImage = "No")
# Clustering via mplnVariational
mplnResults <- MPLNClust::mplnVariational(dataset = sampleData$dataset,
                                          membership = sampleData$trueMembership,
                                          gmin = 1,
                                          gmax = 2,
                                          initMethod = "kmeans",
                                          nInitIterations = 2,
                                          normalize = "Yes")
## Not run:
# Clustering via mplnMCMCParallel
mplnResults <- MPLNClust::mplnMCMCParallel(dataset = sampleData$dataset,
                                           membership = sampleData$trueMembership,
                                           gmin = 1,
                                           gmax = 1,
                                           nChains = 3,
                                           nIterations = 400,
                                           initMethod = "kmeans",
                                           nInitIterations = 0,
                                           normalize = "Yes",
                                           numNodes = 2)
# Clustering via mplnMCMCNonParallel
mplnResults <- MPLNClust::mplnMCMCNonParallel(dataset = sampleData$dataset,
                                              membership = sampleData$trueMembership,
                                              gmin = 1,
                                              gmax = 1,
                                              nChains = 3,
                                              nIterations = 700,
                                              initMethod = "kmeans",
                                              nInitIterations = 0,
                                              normalize = "Yes")
## End(Not run)