comix | R Documentation |
This function generates a sample from the posterior of COMIX.
comix(Y, C, prior = NULL, pmc = NULL, state = NULL, ncores = 2)
Y |
Matrix of the data. Each row represents an observation. |
C |
Vector of the group label of each observation. Labels must be integers starting from 1. |
prior |
A list giving the prior information. If unspecified, a default prior is used. The list includes the following parameters:
|
pmc |
A list giving the Population Monte Carlo (PMC) parameters:
|
state |
A list giving the initial cluster labels:
|
ncores |
The number of CPU cores to utilize in parallel. Defaults to 2. |
An object of class COMIX, a list of 4:
chain
, a named list:
t
: an nsave
x nrow(Y)
matrix with estimated cluster labels
for each saved step of the chain and each observation in the data Y
.
z
: a nsave
x nrow(Y)
matrix with estimated values of
the latent z_ij variable for the parameterization of the
multivariate skew normal distribution used in the sampler for each saved step of
the chain and each observation in the data Y
.
W
: an length(unique(C))
x K
x
nsave
: array storing the estimated sample- and cluster-specific weights for each
saved step of the chain.
xi
: an length(unique(C))
x (ncol(Y) x K)
x nsave
array storing the estimated sample- and cluster-specific
multivariate skew normal location parameters of the kernel for each saved step of the chain.
xi0
: an ncol(Y)
x K
x
nsave
: array storing the estimated cluster-specific
group location parameters for each saved step of the chain.
psi
: an ncol(Y)
x K
x nsave
array storing the estimated cluster-specific skew parameters of the kernels in
the parameterization of the
multivariate skew normal distribution used in the sampler
for each saved step of the chain.
G
: an ncol(Y)
x (ncol(Y) x K)
x nsave
array storing the estimated cluster-specific
multivariate skew normal scale matrix (in row format) of the kernel
used in the sampler for each saved step of the chain.
E
: an ncol(Y)
x (ncol(Y) x K)
x nsave
array storing the estimated covariance matrix
(in row format) of the multivariate normal distribution from which the
sample- and cluster-specific location parameters are drawn for each saved step
of the chain.
eta
: a nsave
x 1
matrix storing the
estimated Dirichlet Process Mixture concentration parameter for each saved step of the chain.
Sigma
: an ncol(Y)
x (ncol(Y) x K)
x nsave
array storing the estimated cluster-specific
multivariate skew normal scale matrix (in row format) of the kernel for each saved step of the chain.
alpha
: an ncol(Y)
x K
x nsave
array storing the estimated cluster-specific skew parameters of the
kernel's multivariate skew normal distribution
for each saved step of the chain.
data
, a named list that includes the matrix of the data Y
and C
the vector of the group label of each observation.
prior
and pmc
, the lists, as above, that were provided as inputs to
the function.
library(COMIX) # Number of observations for each sample (row) and cluster (column): njk <- matrix( c( 150, 300, 250, 200 ), nrow = 2, byrow = TRUE ) # Dimension of data: p <- 3 # Scale and skew parameters for first cluster: Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p) alpha1 <- rep(0, p) alpha1[1] <- -5 # location parameter for first cluster in first sample: xi11 <- rep(0, p) # location parameter for first cluster in second sample (aligned with first): xi21 <- rep(0, p) # Scale and skew parameters for second cluster: Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p) alpha2 <- rep(0, p) alpha2[2] <- 5 # location parameter for second cluster in first sample: xi12 <- rep(3, p) # location parameter for second cluster in second sample (misaligned with first): xi22 <- rep(4, p) # Sample data: set.seed(1) Y <- rbind( sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1), sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2), sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1), sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2) ) C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2])) prior <- list(zeta = 1, K = 10) pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation # Fit the model: res <- comix(Y, C, pmc = pmc, prior = prior) # Relabel to resolve potential label switching issues: res_relab <- relabelChain(res) # Generate calibrated data: cal <- calibrateNoDist(res_relab) # Compare raw and calibrated data: (see plot in vignette) # par(mfrow=c(1, 2)) # plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) ) # Get posterior estimates for the model parameters: res_summary <- summarizeChain(res_relab) # Check for instance, the cluster assignment labels: table(res_summary$t) # Indeed the same as colSums(njk) # Or examine the skewness parameter for the non-trivial clusters: res_summary$alpha[ , unique(res_summary$t)] # And compare those to cbind(alpha1, alpha2) # (see vignette for a more detailed example)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.