library(knitr)
opts_chunk$set(fig.align = "center",
               out.width = "90%",
               fig.width = 6, fig.height = 5.5,
               dev.args = list(pointsize = 10),
               par = TRUE, # needed for setting hook
               collapse = TRUE, # collapse input & output code in chunks
               warning = FALSE)
knit_hooks$set(par = function(before, options, envir) {
  if (before && options$fig.show != "none")
    par(family = "sans", mar = c(4.1, 4.1, 1.1, 1.1), mgp = c(3, 1, 0), tcl = -0.5)
})
set.seed(1) # for exact reproducibility
MPLNClust is an R package for model-based clustering using finite mixtures of multivariate Poisson-log normal (MPLN) distributions, as proposed by Silva et al., 2019. It provides functions for parameter estimation via 1) an MCMC-EM framework (Silva et al., 2019) and 2) a variational Gaussian approximation with an EM algorithm (Subedi and Browne, 2020). Information criteria (AIC, BIC, AIC3 and ICL) and slope heuristics (Djump and DDSE, if more than 10 models are considered) are offered for model selection. The package also includes a function for simulating data from this model, as well as functions for displaying and visualizing clustering results.

This document gives a tour of MPLNClust (version 0.1.0), focusing on parameter estimation via 1) the MCMC-EM framework. It was written in R Markdown, using the knitr package for production. For parameter estimation via 2) the variational Gaussian approximation with an EM algorithm, see the other vignette: A tour of MPLNClust with variational-EM.
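To make the information criteria mentioned above more concrete, here is a minimal base-R sketch of how AIC, AIC3 and BIC penalize a log-likelihood by the number of free parameters. It uses the textbook "smaller is better" definitions, which may differ in sign or detail from what MPLNClust computes internally. The helper names `countParameters` and `informationCriteria` and the log-likelihood values are hypothetical and for illustration only.

```r
# Number of free parameters in a G-component MPLN mixture with d dimensions:
# (G - 1) mixing proportions + G * d means + G * d * (d + 1) / 2 covariance terms
countParameters <- function(G, d) {
  (G - 1) + G * d + G * d * (d + 1) / 2
}

# Textbook penalized-likelihood criteria (smaller is better in this convention;
# the package's own convention is an assumption here, not confirmed)
informationCriteria <- function(logLik, G, d, n) {
  k <- countParameters(G, d)
  c(AIC  = -2 * logLik + 2 * k,
    AIC3 = -2 * logLik + 3 * k,
    BIC  = -2 * logLik + k * log(n))
}

# Hypothetical log-likelihoods for G = 1, 2, 3 (made-up numbers, not real results)
logLiks <- c(-3400, -3100, -3080)
crit <- sapply(seq_along(logLiks), function(G) {
  informationCriteria(logLiks[G], G = G, d = 6, n = 100)
})
colnames(crit) <- paste0("G=", seq_along(logLiks))
crit
```

Each column of `crit` corresponds to one candidate number of components. Under this convention, the model with the smallest value of a given criterion would be preferred.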
See help(package = "MPLNClust") for further details and references provided by citation("MPLNClust"). To download MPLNClust, use the following commands:
require("devtools")
install_github("anjalisilva/MPLNClust", build_vignettes = TRUE)
library("MPLNClust")
To list all functions available in the package:
ls("package:MPLNClust")
The function mplnDataGenerator simulates data from a mixture of MPLN distributions. See ?mplnDataGenerator for more information, an example, and references. Here, we use mplnDataGenerator to simulate a dataset of 100 observations with a dimensionality of 6 from a two-component MPLN mixture with mixing proportions of 0.79 and 0.21. The function also requires a mean vector and a covariance matrix for each component.
nObservations <- 100 # Samples e.g., genes
dimensionality <- 6 # Dimensionality e.g., conditions * replicates = total samples
pig <- c(0.79, 0.21) # Mixing proportions for two components

# Generate means
trueMu1 <- c(6.5, 6, 6, 5, 5, 5) # Mean for component 1
trueMu2 <- c(2, 2.5, 2, 2, 2, 2) # Mean for component 2
trueMus <- rbind(trueMu1, trueMu2)

# Generate covariances
library(clusterGeneration)
set.seed(1)
# Covariance for component 1
trueSigma1 <- clusterGeneration::genPositiveDefMat("unifcorrmat",
                                                   dim = dimensionality,
                                                   rangeVar = c(1, 1.5))$Sigma
# Covariance for component 2
trueSigma2 <- clusterGeneration::genPositiveDefMat("unifcorrmat",
                                                   dim = dimensionality,
                                                   rangeVar = c(0.7, 0.7))$Sigma
trueSigma <- rbind(trueSigma1, trueSigma2)

# Generate data
sampleData <- MPLNClust::mplnDataGenerator(
  nObservations = nObservations,
  dimensionality = dimensionality,
  mixingProportions = pig,
  mu = trueMus,
  sigma = trueSigma,
  produceImage = "Yes")
With produceImage = "Yes", the function also produces a plot of the log-transformed count data; this plot is optional.
The generated dataset can be checked:
dim(sampleData$dataset) # 100 x 6 dataset
class(sampleData$dataset) # matrix
typeof(sampleData$dataset) # integer
summary(sampleData$dataset) # summary of data
pairs(sampleData$dataset, col = sampleData$trueMembership + 1,
      main = "Pairs plot of counts") # visualize counts
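For intuition about what a generator such as mplnDataGenerator simulates, the MPLN mixture model can be sketched in base R: each observation is assigned to a component, a latent Gaussian vector is drawn from that component, and the counts are Poisson with mean equal to the exponentiated latent vector. This is a simplified sketch, not the package's implementation. The function name `simulateMplnSketch` is hypothetical, the covariance matrices are simple diagonal stand-ins, and any normalization across samples that the package may apply is ignored.

```r
# Base-R sketch of the MPLN mixture generative model (not the package's code).
# mu: G x d matrix of component means on the log scale
# sigma: list of G covariance matrices, each d x d
simulateMplnSketch <- function(n, mu, sigma, mixingProportions) {
  G <- nrow(mu)
  d <- ncol(mu)
  # 1. Draw a component label for each observation
  z <- sample(seq_len(G), size = n, replace = TRUE, prob = mixingProportions)
  counts <- matrix(0L, nrow = n, ncol = d)
  for (i in seq_len(n)) {
    g <- z[i]
    # 2. Latent Gaussian draw: theta = mu_g + t(R) %*% e, where t(R) %*% R = sigma_g
    R <- chol(sigma[[g]])
    theta <- mu[g, ] + drop(crossprod(R, rnorm(d)))
    # 3. Counts are conditionally independent Poisson with mean exp(theta)
    counts[i, ] <- rpois(d, lambda = exp(theta))
  }
  list(dataset = counts, trueMembership = z)
}

set.seed(1)
mu <- rbind(c(6.5, 6, 6, 5, 5, 5), c(2, 2.5, 2, 2, 2, 2))
sigma <- list(diag(6), 0.7 * diag(6)) # illustrative diagonal covariances
sim <- simulateMplnSketch(n = 100, mu = mu, sigma = sigma,
                          mixingProportions = c(0.79, 0.21))
dim(sim$dataset)
table(sim$trueMembership)
```

The returned list mirrors the `dataset` and `trueMembership` elements used above, so the same checks (dim, typeof, pairs) can be applied to `sim`.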