BICFunction: Model Selection Via Bayesian Information Criterion

View source: R/mplnMCMCEMClustering.R

BICFunction    R Documentation

Model Selection Via Bayesian Information Criterion

Description

Performs model selection using the Bayesian Information Criterion (BIC) of Schwarz (1978). The BIC is calculated as -2 * logLikelihood + (nParameters * log(nObservations)).
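The formula above can be illustrated with a minimal base R sketch. This is not the package's implementation: bicSketch is a hypothetical helper, the log-likelihoods and parameter counts are made-up values, and the sketch assumes that, under this sign convention, the candidate with the smallest BIC is the one selected.

```r
# Hypothetical helper (not part of MPLNClust): computes one BIC value per
# candidate number of components and picks the smallest.
bicSketch <- function(logLikelihood, nParameters, nObservations, gmin) {
  # BIC = -2 * logLikelihood + nParameters * log(nObservations)
  bic <- -2 * logLikelihood + nParameters * log(nObservations)
  # Candidate component counts, assumed to be gmin, gmin + 1, ...
  candidates <- seq(gmin, length.out = length(bic))
  # Assumption: lower BIC is better under this formulation
  list(bic = bic, selected = candidates[which.min(bic)])
}

# Made-up inputs for G = 1, 2, 3 components and 100 observations
res <- bicSketch(logLikelihood = c(-1250, -1100, -1095),
                 nParameters = c(27, 55, 83),
                 nObservations = 100,
                 gmin = 1)
print(res$bic)
print(res$selected)
```

In this illustration the gain in log-likelihood from G = 2 to G = 3 is too small to offset the penalty for the extra parameters, so the two-component candidate has the smallest BIC.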

Usage

BICFunction(
  logLikelihood,
  nParameters,
  nObservations,
  clusterRunOutput = NA,
  gmin,
  gmax,
  parallel = FALSE
)

Arguments

logLikelihood

A vector of final log-likelihood values, one for each cluster size.

nParameters

A vector with the number of parameters for each cluster size.

nObservations

A positive integer specifying the number of observations in the dataset analyzed.

clusterRunOutput

Output from mplnVariational, mplnMCMCParallel, or mplnMCMCNonParallel, if available. Default is NA. If provided, the output includes the vector of cluster labels obtained by mclust::map() for the best model.

gmin

A positive integer specifying the minimum number of components to be considered in the clustering run.

gmax

A positive integer, greater than gmin, specifying the maximum number of components to be considered in the clustering run.

parallel

TRUE or FALSE, indicating whether MPLNClust::mplnMCMCParallel was used. Default is FALSE.

Value

Returns an S3 object of class MPLN with results.

  • allBICvalues - A vector of BIC values for each cluster size.

  • BICmodelselected - An integer specifying the model selected by BIC.

  • BICmodelSelectedLabels - A vector of integers specifying cluster labels for the model selected. Only provided if the user supplies clusterRunOutput.

  • BICMessage - A character vector indicating if spurious clusters are detected. Otherwise, NA.

Author(s)

Anjali Silva, anjali@alumni.uoguelph.ca

References

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461-464.

Examples

trueMu1 <- c(6.5, 6, 6, 6, 6, 6)
trueMu2 <- c(2, 2.5, 2, 2, 2, 2)

trueSigma1 <- diag(6) * 2
trueSigma2 <- diag(6)

# Generating simulated data
sampleData <- MPLNClust::mplnDataGenerator(nObservations = 100,
                                           dimensionality = 6,
                                           mixingProportions = c(0.79, 0.21),
                                           mu = rbind(trueMu1, trueMu2),
                                           sigma = rbind(trueSigma1, trueSigma2),
                                           produceImage = "No")

# Clustering
mplnResults <- MPLNClust::mplnVariational(dataset = sampleData$dataset,
                                          membership = sampleData$trueMembership,
                                          gmin = 1,
                                          gmax = 2,
                                          initMethod = "kmeans",
                                          nInitIterations = 2,
                                          normalize = "Yes")

# Model selection
BICmodel <- MPLNClust::BICFunction(logLikelihood = mplnResults$logLikelihood,
                                   nParameters = mplnResults$numbParameters,
                                   nObservations = nrow(mplnResults$dataset),
                                   clusterRunOutput = mplnResults$allResults,
                                   gmin = mplnResults$gmin,
                                   gmax = mplnResults$gmax,
                                   parallel = FALSE)


anjalisilva/MPLNClust documentation built on Jan. 28, 2024, 11:02 a.m.