EmSkew is the main function: it fits the specified multivariate mixture model to the data via the EM algorithm. The available distributions (univariate and multivariate) are the normal, t, skew normal, and skew t distributions.
dat |
The dataset, an n by p numeric matrix, where n is the number of observations and p is the dimension of the data. |
g |
The number of components of the mixture model |
distr |
A three-letter string indicating the type of distribution to be fitted; the default value is "mvn", the multivariate normal distribution. See Details. |
ncov |
A small integer indicating the type of covariance structure; the default value is 3. See Details. |
clust |
A vector of integers specifying the initial partitions of the data; the default is NULL. |
init |
A list containing the initial parameters for the mixture model; the default value is NULL. See Details. |
itmax |
An integer specifying the maximum number of EM iterations to apply; the default value is 1000. |
epsilon |
A small number used to stop the EM algorithm loop when the relative difference between the log-likelihoods of successive iterations becomes sufficiently small; the default value is 1e-6. |
nkmeans |
An integer specifying the number of k-means partitions to be used to find the best initial values; the default value is 0. |
nrandom |
An integer to specify the number of random partitions to be used to find the best initial values; the default value is 10. |
nhclust |
A logical value to specify whether or not to use hierarchical cluster methods; the default is FALSE. If TRUE, the Complete Linkage method will be used. |
debug |
A logical value; if TRUE, the output is printed; if FALSE, the function runs silently. The default value is TRUE. |
initloop |
An integer specifying the number of initial loops used when searching for the best initial partitions. |
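The role of epsilon and itmax can be pictured with a small base-R sketch. The exact formula used inside EmSkew is not stated on this page, so the relative-change rule below and the helper name has_converged are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of the package): TRUE when the relative
# change in log-likelihood between two iterations is at most epsilon.
has_converged <- function(loglik_old, loglik_new, epsilon = 1e-6) {
  abs(loglik_new - loglik_old) <= epsilon * abs(loglik_new)
}

# Toy log-likelihood trace, one value per EM iteration
lk <- c(-2500.0, -2300.0, -2290.0, -2289.999)

# Compare each iteration with the one before it
converged <- has_converged(lk[-length(lk)], lk[-1])
converged
```

Under this reading, a loop would stop at the first TRUE, or after itmax iterations if no TRUE is ever reached.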
The distribution type is determined by the distr parameter, which may take any one of the following values: "mvn" for a multivariate normal distribution, "mvt" for a multivariate t-distribution, "msn" for a multivariate skew normal distribution, and "mst" for a multivariate skew t-distribution.
The covariance matrix type, represented by the ncov parameter, may be any one of the following: ncov=1 for a common variance, ncov=2 for a common diagonal variance, ncov=3 for a general variance, ncov=4 for a diagonal variance, and ncov=5 for sigma(h)*I(p) (a diagonal covariance whose diagonal elements are all equal).
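To make the five ncov options more concrete, the base-R sketch below builds one p by p by g array per structure. It reads "common" as "shared by every component", which is an interpretation of the wording above rather than something this page states explicitly, and the numeric values are arbitrary.

```r
p <- 2
g <- 3

# A general (non-diagonal) symmetric matrix used as a building block
S <- matrix(c(1, 0.2, 0.2, 1), p, p)

# ncov = 1: one general covariance matrix shared by all components
sigma1 <- array(S, c(p, p, g))

# ncov = 2: one diagonal covariance matrix shared by all components
sigma2 <- array(diag(c(1, 2)), c(p, p, g))

# ncov = 3: each component has its own general covariance matrix
sigma3 <- array(0, c(p, p, g))
for (h in 1:g) sigma3[, , h] <- h * S

# ncov = 4: each component has its own diagonal covariance matrix
sigma4 <- array(0, c(p, p, g))
for (h in 1:g) sigma4[, , h] <- diag(c(h, 2 * h))

# ncov = 5: sigma(h) * I(p), one scalar variance per component
s <- c(0.5, 1, 2)
sigma5 <- array(0, c(p, p, g))
for (h in 1:g) sigma5[, , h] <- s[h] * diag(p)
```

Each array has the same shape as the sigma element described for init and for the return value, so the structures differ only in which entries are constrained.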
The parameter init requires the following elements: pro, a numeric vector of the mixing proportions of the components; mu, a p by g matrix whose columns are the component means; sigma, a three-dimensional p by p by g array whose jth slice (p,p,j) is the covariance matrix of the jth component of the mixture model; dof, a vector of degrees of freedom, one per component; and delta, a p by g matrix whose columns are the skew parameter vectors.
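As a sketch of the shapes described above, the following base-R code builds an init list for p = 2 and g = 3 and checks its dimensions. The helper check_init is hypothetical and not part of the package; the requirement that pro sums to one is the usual property of mixing proportions, not a rule quoted from this page.

```r
p <- 2
g <- 3

init <- list(
  pro   = c(0.3, 0.3, 0.4),                       # mixing proportions
  mu    = cbind(c(4, -4), c(3.5, 4), c(0, 0)),    # p by g matrix of means
  sigma = array(diag(p), c(p, p, g)),             # p by p by g covariances
  dof   = c(3, 5, 5),                             # degrees of freedom
  delta = cbind(c(3, 3), c(1, 5), c(-3, 1))       # p by g skew parameters
)

# Hypothetical helper: stops with an error if any element has the wrong shape
check_init <- function(init, p, g) {
  stopifnot(
    length(init$pro) == g,
    abs(sum(init$pro) - 1) < 1e-8,
    all(dim(init$mu) == c(p, g)),
    all(dim(init$sigma) == c(p, p, g)),
    length(init$dof) == g,
    all(dim(init$delta) == c(p, g))
  )
  invisible(TRUE)
}

ok <- check_init(init, p, g)
```

Running such a check before passing the list as init = init can catch a transposed mu or delta early.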
Since we treat the list of pro, mu, sigma, dof, and delta as a common structure of parameters for our mixture models, all of them should be included in the initial parameter list init by default, even though some do not make sense in certain cases; for example, dof and delta are not applicable to a normal mixture model. In most cases, however, the user need only give the relevant parameters in the list.
When the parameter list init is given, the program ignores both the initial partition clust and the automatic partition methods such as nkmeans; only when neither init nor clust is available does the program use automatic approaches, such as the k-means partition method, to find the best initial values.
All three automatic approaches are used to find the best initial partition and initial values if required.
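For readers who want to control the starting partition themselves, a partition can be built by hand and supplied through clust. The sketch below uses stats::kmeans on toy data; how EmSkew runs k-means internally is not described on this page, so this is only a manual analogue, and the final call is left commented out.

```r
set.seed(1)

# Toy data: two well-separated groups of 50 points in two dimensions
dat <- rbind(
  matrix(rnorm(100, mean = -4), ncol = 2),
  matrix(rnorm(100, mean =  4), ncol = 2)
)
g <- 2

# k-means with several random starts; $cluster holds one label per row
km <- kmeans(dat, centers = g, nstart = 10)
clust <- km$cluster

# The labels could then be passed as the initial partition, e.g.
# obj <- EmSkew(dat, g, "mvn", 3, clust = clust)
```

Because clust is only consulted when init is absent, such a partition should not be combined with an init list.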
The return values include all potential parameters pro, mu, sigma, dof, and delta, but users should not use or interpret irrelevant values, for example dof and delta for normal mixture models.
error |
Error code, 0 = normal exit; 1 = did not converge within |
aic |
Akaike Information Criterion (AIC) |
bic |
Bayesian Information Criterion (BIC) |
ICL |
Integrated Completed Likelihood Criterion (ICL) |
pro |
A vector of mixing proportions. |
mu |
A numeric matrix with each column corresponding to the mean. |
sigma |
An array of dimension (p,p,g) whose first two dimensions hold the covariance matrix of each component. |
dof |
A vector of degrees of freedom for each component, see Details. |
delta |
A p by g matrix with each column corresponding to a skew parameter vector. |
clust |
A vector giving the final partition |
loglik |
The log likelihood at convergence |
lk |
A vector of log-likelihood values, one per EM iteration |
tau |
An n by g matrix of posterior probabilities for each data point |
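The link between tau and a hard partition can be illustrated in base R. A common convention is to assign each point to the component with the largest posterior probability; whether the returned clust is computed in exactly this way is an assumption here, and the tau matrix below is toy data rather than package output.

```r
# Toy posterior probabilities for n = 4 points and g = 3 components
tau <- rbind(
  c(0.90, 0.05, 0.05),
  c(0.10, 0.70, 0.20),
  c(0.20, 0.30, 0.50),
  c(0.60, 0.30, 0.10)
)

# Each row is a probability distribution over the components
stopifnot(all(abs(rowSums(tau) - 1) < 1e-8))

# Assign every point to its most probable component
hard <- apply(tau, 1, which.max)
hard
```

Rows of tau with no clearly dominant entry mark points whose assignment is uncertain.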
Biernacki C., Celeux G., and Govaert G. (2000). Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719-725.
McLachlan G.J. and Krishnan T. (2008). The EM Algorithm and Extensions (2nd ed.). New Jersey: Wiley.
McLachlan G.J. and Peel D. (2000). Finite Mixture Models. New York: Wiley.
# define the dimension of the dataset
n1=300;n2=300;n3=400;
nn<-c(n1,n2,n3)
p <- 2
ng <- 3
#define the parameters
sigma<-array(0,c(2,2,3))
for(h in 2:3) sigma[,,h]<-diag(2)
sigma[,,1]<-cbind( c(1,0.2),c(0.2,1))
mu <- cbind(c(4,-4),c(3.5,4),c( 0, 0))
#and other parameters if required for "mvt","msn","mst"
delta <- cbind(c(3,3),c(1,5),c(-3,1))
dof <- c(3,5,5)
pro <- c(0.3,0.3,0.4)
distr="mvn"
ncov=3
# generate a data set
set.seed(111) #random seed is reset
dat <- rdemmix(nn,p,ng,distr,mu,sigma)
# the following code can be used to get singular data (commented out)
# dat[1:300,2]<--4
# dat[300+1:300,1]<-2
## dat[601:1000,1]<-0
## dat[601:1000,2]<-0
#fit the data using KMEANS to get the initial partitions (10 trials)
obj <- EmSkew(dat,ng,distr,ncov,itmax=1000,epsilon=1e-5,nkmeans=10)
# alternatively, if we define initial values like
initobj<-list()
initobj$pro <- pro
initobj$mu <- mu
initobj$sigma<- sigma
initobj$dof <- dof
initobj$delta<- delta
# then we can fit the data from initial values
obj <- EmSkew(dat,ng,distr,ncov,init=initobj,itmax=1000,epsilon=1e-5)
# finally, if we know an initial partition such as
clust <- rep(1:ng,nn)
# then we can fit the data from given initial partition
obj <- EmSkew(dat,ng,distr,ncov,clust=clust,itmax=1000,epsilon=1e-5)
# plot the 2D contours
colnames(dat)<- paste("x",1:p,sep='')
# dev.new()
EmSkew.flow(dat,obj)