msal: Model-Based Clustering using a Mixture of SAL Distributions

Description Usage Arguments Details Value Author(s) References Examples

View source: R/msal.R

Description

Performs model-based clustering using a mixture of SAL distributions. The expectation-maximization (EM) algorithm is used for parameter estimation, the Aitken's acceleration criterion is used to determine convergence, both the BIC and ICL values are given for the considered mixtures.

Usage

1
2
msal(x, G, start = 1, max.it = 10000, eps = 0.01, print.it = F, print.warn = F, 
print.prmtrs = F)

Arguments

x

A n by p matrix where each row corresponds a p-dimensional observation.

G

The desired number of mixture components.

start

Specifies how to intialize the zig matrix. If start equals 1, k-means clustering is used. If start equals 2, a random start is used. If start is a vector of length n, then the zig matrix is constructed based from this vector.

max.it

The desired number of iterations for the EM algorithm.

eps

The desired difference between the asymptotic estimate of the log-likelihood and the current log-likelihood value.

print.it

If True, the iteration number of the EM algorithm is printed.

print.warn

If True, the observation number that the mean vector is closet too is given.

print.prmtrs

If True, the parameter set is printed on each iteration of the EM algorithm.

Details

The mixture of SAL distributions are fitted using an EM algorithm with a “Set-Back” procedure to deal with the issue of Infinite Log-Likelihood Values that arise when updating the mean vector (see Section 3.4.2 of Franczak et.al (2014) for details).

Value

The msal function outputs a list with the following components:

loglik

A vector giving the log-likelihood values from each iteration of the considered EM algorithm.

alpha

A matrix where each row specifies the direction of skewness in each variable for each mixture component.

sig

An array where each matrix specifies the covariance matrix for each mixture component.

mu

A matrix where each row gives the mean vector for each mixture component.

pi.g

A vector specifying the mixing components.

bic

An integer giving the Bayesian Information Criterion (BIC) for the fitted model.

icl

An integer giving the Integrated Completed Likelihood (ICL) for the fitted model.

cluster

A vector of length n giving the group label for each observation in the considered data set.

Author(s)

Brian C. Franczak [aut, cre], Ryan P. Browne [aut, ctb], Paul D. McNicholas [aut, ctb]

Maintainer: Brian C. Franczak <franczakb@macewan.ca>

References

Franczak et. al (2014). Mixtures of Shifted Asymmetric Laplace Distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(6), 1149-1157.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
## Clustering Simulated Data
alpha <- matrix(c(2,2,1,2),2,2)
sig <- array(NA,dim=c(2,2,2))
sig[,,1] <- diag(2)
sig[,,2] <- matrix(c(1,0.5,0.5,1),2,2)
mu <- matrix(c(0,0,-2,5),2,2)
pi.g <- rep(1/2,2)
x <- rmsal(n=500,p=2,alpha=alpha,sig=sig,mu=mu,pi.g=pi.g)

msal.ex1 <- msal(x=x[,-1],G=2)
table(x[,1],msal.ex1$cluster)

## Clustering the Old Faithful Geyser Data
data(faithful)
msal.ex2 <- msal(x=faithful,G=2)
plot(x=faithful,col=msal.ex2$cluster)

## Clustering the Yeast Data
data(yeast)
msal.ex3 <- msal(x=yeast[,-1],G=2)
table(yeast[,1],msal.ex3$cluster)

MixSAL documentation built on May 2, 2019, 7:04 a.m.