mplnMCMCNonParallel: Clustering Using MPLN With MCMC-EM Via Non-Parallel...
In anjalisilva/MPLNClust: Mixtures of Multivariate Poisson-Log Normal Model for Clustering Count Data

mplnMCMCNonParallel

R Documentation

Clustering Using MPLN With MCMC-EM Via Non-Parallel Performance

Description

Performs clustering using mixtures of multivariate Poisson-log normal (MPLN) distributions, with a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm for parameter estimation. There is no internal parallelization, so the code runs serially. Model selection is performed using AIC, AIC3, BIC and ICL.

Usage

mplnMCMCNonParallel(
  dataset,
  membership = "none",
  gmin = 1,
  gmax = 2,
  nChains = 3,
  nIterations = 1000,
  initMethod = "kmeans",
  nInitIterations = 0,
  normalize = "Yes"
)

Arguments

`dataset`	A dataset of class matrix and type integer such that rows correspond to observations and columns correspond to variables. The dataset has dimensions n x d, where n is the total number of observations and d is the dimensionality. Rows whose rowSums are zero will be removed prior to cluster analysis.
`membership`	A numeric vector of length nrow(dataset) containing the cluster membership of each observation. If not available, leave as "none".
`gmin`	A positive integer specifying the minimum number of components to be considered in the clustering run.
`gmax`	A positive integer, >= gmin, specifying the maximum number of components to be considered in the clustering run.
`nChains`	A positive integer specifying the number of Markov chains. Default is 3, the recommended minimum number.
`nIterations`	A positive integer specifying the number of iterations for each chain (including warmup). The value should be greater than 40. The upper limit will depend on the size of the dataset.
`initMethod`	An algorithm for initialization. Current options are "kmeans", "random", "medoids", "clara", or "fanny". Default is "kmeans".
`nInitIterations`	A positive integer or zero, specifying the number of initialization runs to be performed. This many runs, each with 10 iterations, will be performed via MPLNClust and values from the run with highest log-likelihood will be used as initialization values. Default is 0.
`normalize`	A string with options "Yes" or "No" specifying if normalization should be performed. Currently, normalization factors are calculated using the TMM method of the edgeR package. Default is "Yes".
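
The requirements on `dataset` (integer matrix, observations in rows, no all-zero rows) can be checked before calling the function. The following is a minimal sketch in base R, not part of MPLNClust: the helper name `prepareCounts` and the toy counts are hypothetical, and subsetting `membership` alongside the rows is an assumption made here so that the labels stay aligned with the retained observations.

```r
# Hypothetical helper (not provided by MPLNClust): coerce counts to an
# integer matrix and drop observations whose counts are all zero.
prepareCounts <- function(counts, membership = "none") {
  counts <- as.matrix(counts)
  # Counts must be non-negative whole numbers
  stopifnot(is.numeric(counts),
            all(counts >= 0),
            all(counts == round(counts)))
  storage.mode(counts) <- "integer"

  # Rows with a zero row sum carry no information for clustering
  keep <- rowSums(counts) > 0

  # Keep any supplied labels aligned with the retained rows
  if (!identical(membership, "none")) {
    stopifnot(length(membership) == nrow(counts))
    membership <- membership[keep]
  }

  list(dataset = counts[keep, , drop = FALSE], membership = membership)
}

# Toy data: 4 observations, 3 variables; the third row is all zeros
toyCounts <- rbind(c(5, 0, 2),
                   c(1, 3, 0),
                   c(0, 0, 0),
                   c(7, 2, 9))
prepared <- prepareCounts(toyCounts, membership = c(1, 1, 2, 2))
dim(prepared$dataset)
```

The returned `dataset` and `membership` elements could then be passed to the arguments of the same names.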

Value

Returns an S3 object of class mplnMCMCNonParallel with results.

dataset - The input dataset on which clustering is performed.
dimensionality - Dimensionality of the input dataset.
normalizationFactors - A vector of normalization factors used for input dataset.
gmin - Minimum number of components considered in the clustering run.
gmax - Maximum number of components considered in the clustering run.
initalizationMethod - Method used for initialization.
allResults - A list with all results.
logLikelihood - A vector with the final log-likelihood for each cluster size.
numbParameters - A vector with number of parameters for each cluster size.
trueLabels - The vector of true labels, if provided by user.
ICLresults - A list with all ICL model selection results.
BICresults - A list with all BIC model selection results.
AICresults - A list with all AIC model selection results.
AIC3results - A list with all AIC3 model selection results.
slopeHeuristics - If more than 10 models are considered, slope heuristic results as obtained via capushe::capushe().
DjumpModelSelected - If more than 10 models are considered, the model selected by the dimension jump (Djump) method of capushe::capushe().
DDSEModelSelected - If more than 10 models are considered, the model selected by the data-driven slope estimation (DDSE) method of capushe::capushe().
totalTime - Total time used for clustering and model selection.
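
To illustrate how the `logLikelihood` and `numbParameters` elements relate to the information criteria, here is a rough sketch in base R. It does not call MPLNClust: the `fit` list is a hypothetical stand-in that only mimics a few documented element names, the log-likelihood values are invented, and the criteria use the common lower-is-better convention, which is an assumption and may differ from the scaling used inside the package. ICL is omitted because it also needs the posterior membership probabilities.

```r
# Hypothetical stand-in for a fitted object; all numbers are invented
fit <- list(gmin = 1,
            gmax = 3,
            logLikelihood = c(-1520.4, -1461.8, -1455.2),
            numbParameters = c(27, 55, 83))
nObservations <- 40  # n, the number of rows in the dataset

logLik <- fit$logLikelihood
k <- fit$numbParameters

# Penalized criteria, assuming "smaller is better"
criteria <- data.frame(
  G    = seq(fit$gmin, fit$gmax),
  AIC  = -2 * logLik + 2 * k,
  AIC3 = -2 * logLik + 3 * k,
  BIC  = -2 * logLik + k * log(nObservations)
)

# Number of components preferred by each criterion
selected <- sapply(criteria[, c("AIC", "AIC3", "BIC")],
                   function(x) criteria$G[which.min(x)])
print(criteria)
print(selected)
```

With a real fit, the corresponding `AICresults`, `AIC3results`, `BICresults` and `ICLresults` elements already hold the package's own model selection results, so a calculation like this is only useful as a cross-check.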

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca, Sanjeena Dang, sanjeenadang@cunet.carleton.ca.

References

Aitchison, J. and C. H. Ho (1989). The multivariate Poisson-log normal distribution. Biometrika 76.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, New York, NY, USA, pp. 267–281. Springer Verlag.

Arlot, S., Brault, V., Baudry, J., Maugis, C., and Michel, B. (2016). capushe: CAlibrating Penalities Using Slope HEuristics. R package version 1.1.1.

Biernacki, C., G. Celeux, and G. Govaert (2000). Assessing a mixture model for clustering with the integrated classification likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22.

Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach: Volume 2 Multivariate Statistical Modeling, pp. 69–113. Dordrecht: Springer Netherlands.

Robinson, M.D., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6.

Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20.

Examples

## Not run: 
trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)

trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)

# Generating simulated data
sampleData <- MPLNClust::mplnDataGenerator(nObservations = 40,
                                 dimensionality = 6,
                                 mixingProportions = c(0.79, 0.21),
                                 mu = rbind(trueMu1, trueMu2),
                                 sigma = rbind(trueSigma1, trueSigma2),
                                 produceImage = "No")

# Clustering
mplnResults <- MPLNClust::mplnMCMCNonParallel(dataset = sampleData$dataset,
                                               membership = sampleData$trueMembership,
                                               gmin = 1,
                                               gmax = 1,
                                               nChains = 3,
                                               nIterations = 700,
                                               initMethod = "kmeans",
                                               nInitIterations = 0,
                                               normalize = "Yes")

## End(Not run)

anjalisilva/MPLNClust documentation built on Sept. 19, 2024, 7:34 a.m.
