mvplnVisualize: Visualize Clustered Results Via MVPLN

View source: R/VisualizationFunctions.R

mvplnVisualize    R Documentation

Visualize Clustered Results Via MVPLN

Description

A function to visualize data and clustering results obtained from a mixture of matrix variate Poisson-log normal (MVPLN) distributions. Given a matrix of probabilities of each observation belonging to each cluster, a bar plot of those probabilities is produced.

Usage

mvplnVisualize(
  dataset,
  plots = "bar",
  probabilities = NA,
  clusterMembershipVector = NA,
  fileName = paste0("Plot_", date()),
  printPlot = TRUE,
  format = "pdf"
)

Arguments

dataset

A dataset of class matrix and type integer such that rows correspond to observations and columns correspond to variables.

plots

A character string indicating which plot(s) to produce. Currently the only option is 'bar'.

probabilities

A matrix of size N x C, such that rows correspond to N observations and columns correspond to C clusters. Each row should sum to 1. Default is NA.

clusterMembershipVector

A numeric vector of length nrow(dataset) containing the cluster membership of each observation, as generated by a clustering function such as mvplnMCMCclus(). Default is NA.

fileName

Unique character string indicating the name for the plot being generated. Default is Plot_date, where date is obtained from date().

printPlot

Logical indicating whether the plot(s) should be saved to the local directory. Options are TRUE or FALSE. Default is TRUE.

format

Character string indicating the format of the image to be produced. Default 'pdf'. Options 'pdf' or 'png'.
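
As a minimal sketch of the input shapes described above (not part of the package), the following base R snippet builds a hypothetical probabilities matrix, checks that each row sums to 1, and derives a matching membership vector by assigning each observation to its most probable cluster. The toy values and the names exampleProbabilities and exampleMembership are assumptions made for illustration; in practice both inputs would come from the clustering output.

```r
# Hypothetical N x C matrix of cluster probabilities (N = 4 observations, C = 2 clusters)
exampleProbabilities <- matrix(c(0.90, 0.10,
                                 0.25, 0.75,
                                 0.60, 0.40,
                                 0.05, 0.95),
                               ncol = 2, byrow = TRUE)

# Each row should sum to 1, up to floating-point tolerance
rowsSumToOne <- isTRUE(all.equal(rowSums(exampleProbabilities),
                                 rep(1, nrow(exampleProbabilities))))
stopifnot(is.matrix(exampleProbabilities), rowsSumToOne)

# Assumption for illustration: assign each observation to its most probable cluster
exampleMembership <- max.col(exampleProbabilities, ties.method = "first")
```

The resulting vector has one entry per row of the probabilities matrix, which is the length expected for clusterMembershipVector.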

Value

The function produces a bar plot of the probabilities of observations belonging to each cluster.

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca

References

Aitchison, J. and C. H. Ho (1989). The multivariate Poisson-log normal distribution. Biometrika 76.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, New York, NY, USA, pp. 267–281. Springer Verlag.

Arlot, S., Brault, V., Baudry, J., Maugis, C., and Michel, B. (2016). capushe: CAlibrating Penalities Using Slope HEuristics. R package version 1.1.1.

Biernacki, C., G. Celeux, and G. Govaert (2000). Assessing a mixture model for clustering with the integrated classification likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22.

Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach: Volume 2 Multivariate Statistical Modeling, pp. 69–113. Dordrecht: Springer Netherlands.

Robinson, M.D., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6.

Silva, A. et al. (2019). A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data. BMC Bioinformatics 20.

Silva, A. et al. (2018). Finite Mixtures of Matrix Variate Poisson-Log Normal Distributions for Three-Way Count Data. arXiv preprint arXiv:1807.08380.

Examples

## Not run: 
# Generating simulated matrix variate count data
set.seed(1234)
trueG <- 2 # total number of clusters (G)
truer <- 2 # total number of occasions (r)
truep <- 3 # total number of responses (p)
trueN <- 100 # total number of units (N)

# Mu is a r x p matrix
trueM1 <- matrix(rep(6, (truer * truep)),
                 ncol = truep,
                 nrow = truer, byrow = TRUE)

trueM2 <- matrix(rep(1, (truer * truep)),
                 ncol = truep,
                 nrow = truer,
                 byrow = TRUE)

trueMall <- rbind(trueM1, trueM2)

# Phi is a r x r matrix
# Loading needed packages for generating data
# if (!require(clusterGeneration)) install.packages("clusterGeneration")
# library("clusterGeneration")

# Covariance matrix containing variances and covariances between r occasions
# truePhi1 <- clusterGeneration::genPositiveDefMat("unifcorrmat",
#                                                   dim = truer,
#                                                   rangeVar = c(1, 1.7))$Sigma
truePhi1 <- matrix(c(1.075551, -0.488301, -0.488301, 1.362777), nrow = 2)
truePhi1[1, 1] <- 1 # For identifiability issues

# truePhi2 <- clusterGeneration::genPositiveDefMat("unifcorrmat",
#                                                   dim = truer,
#                                                   rangeVar = c(0.7, 0.7))$Sigma
truePhi2 <- matrix(c(0.7000000, 0.6585887, 0.6585887, 0.7000000), nrow = 2)
truePhi2[1, 1] <- 1 # For identifiability issues
truePhiall <- rbind(truePhi1, truePhi2)

# Omega is a p x p matrix
# Covariance matrix containing variances and covariances between p responses
# trueOmega1 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truep,
#                                    rangeVar = c(1, 1.7))$Sigma
trueOmega1 <- matrix(c(1.0526554, 1.0841910, -0.7976842,
                       1.0841910,  1.1518811, -0.8068102,
                       -0.7976842, -0.8068102,  1.4090578),
                       nrow = 3)
# trueOmega2 <- clusterGeneration::genPositiveDefMat("unifcorrmat", dim = truep,
#                                    rangeVar = c(0.7, 0.7))$Sigma
trueOmega2 <- matrix(c(0.7000000, 0.5513744, 0.4441598,
                       0.5513744, 0.7000000, 0.4726577,
                       0.4441598, 0.4726577, 0.7000000),
                       nrow = 3)
trueOmegaAll <- rbind(trueOmega1, trueOmega2)

# Generated simulated data
sampleData <- mixMVPLN::mvplnDataGenerator(nOccasions = truer,
                                           nResponses = truep,
                                           nUnits = trueN,
                                           mixingProportions = c(0.79, 0.21),
                                           matrixMean = trueMall,
                                           phi = truePhiall,
                                           omega = trueOmegaAll)

# Clustering simulated matrix variate count data
clusteringResults <- mixMVPLN::mvplnMCMCclus(dataset = sampleData$dataset,
                                      membership = sampleData$truemembership,
                                      gmin = 1,
                                      gmax = 2,
                                      nChains = 3,
                                      nIterations = 300,
                                      initMethod = "kmeans",
                                      nInitIterations = 1,
                                      normalize = "Yes")

# Visualize
mvplnClustVisuals <- mixMVPLN::mvplnVisualize(
  dataset = sampleData$dataset,
  plots = 'bar',
  probabilities = clusteringResults$allResults[[2]]$allresults$probaPost,
  clusterMembershipVector = clusteringResults$allResults[[2]]$allresults$clusterlabels,
  fileName = paste0('Plot_', date()),
  printPlot = TRUE,
  format = 'png')

## End(Not run)
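
The default fileName embeds the output of date(), which contains spaces and colons; colons are not permitted in file names on Windows. A possible alternative, assuming a compact timestamp is acceptable, is sketched below. The names timeStamp and safeFileName are hypothetical and not part of the package.

```r
# Build a timestamp that contains only digits and an underscore
timeStamp <- format(Sys.time(), "%Y%m%d_%H%M%S")

# Hypothetical file name that avoids spaces and colons
safeFileName <- paste0("Plot_", timeStamp)
```

The value of safeFileName could then be supplied as the fileName argument of mvplnVisualize().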


anjalisilva/mixMVPLN documentation built on Jan. 15, 2024, 1:10 a.m.