# clusterMix: Cluster Observations Based on Indicator MCMC Draws

In bayesm: Bayesian Inference for Marketing/Micro-Econometrics

## Description

`clusterMix` uses MCMC draws of indicator variables from a normal component mixture model to cluster observations based on a similarity matrix.

## Usage

```
clusterMix(zdraw, cutoff=0.9, SILENT=FALSE, nprint=BayesmConstant.nprint)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `zdraw` | R x nobs array of draws of indicators |
| `cutoff` | cutoff probability for similarity (def: `0.9`) |
| `SILENT` | logical flag for silent operation (def: `FALSE`) |
| `nprint` | print every nprint'th draw (def: `100`) |

## Details

Define a similarity matrix `Sim`, with `Sim[i,j]=1` if observations i and j are in the same component and 0 otherwise. Compute the posterior mean of `Sim`, written E[Sim], by averaging over the indicator draws.
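The definition above can be sketched in base R as follows. This is only an illustration, not the package's internal code: the helper names `z_to_sim` and `posterior_mean_sim` and the toy `zdraw` matrix are hypothetical.

```r
# Hypothetical helper (not part of bayesm): similarity matrix for one
# indicator vector z. Sim[i, j] = 1 when z[i] == z[j], and 0 otherwise.
z_to_sim <- function(z) {
  outer(z, z, "==") * 1
}

# Hypothetical helper: average the similarity matrices over all draws
# (the rows of zdraw) to estimate E[Sim].
posterior_mean_sim <- function(zdraw) {
  nobs <- ncol(zdraw)
  acc <- matrix(0, nobs, nobs)
  for (r in seq_len(nrow(zdraw))) {
    acc <- acc + z_to_sim(zdraw[r, ])
  }
  acc / nrow(zdraw)
}

# Toy input: 4 draws of indicators for 3 observations.
zdraw <- rbind(c(1, 1, 2),
               c(1, 1, 2),
               c(2, 2, 1),
               c(1, 2, 2))
Pmean <- posterior_mean_sim(zdraw)
```

Because `Sim` depends only on whether two observations share a label, the third draw (labels switched) contributes the same similarity matrix as the first two.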

Clustering is achieved by two means:

Method A: Find the indicator draw whose similarity matrix minimizes loss(E[Sim]-Sim(z)), where loss is absolute deviation.

Method B: Define a similarity matrix by setting each element to 1 if the corresponding element of E[Sim] > cutoff, and to 0 otherwise. Compute the clustering scheme associated with this "winsorized" similarity matrix.
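The two methods can be sketched in base R as below. This is a minimal illustration under stated assumptions rather than the routine's actual implementation: the function names `z_to_sim`, `mean_sim`, `method_a`, and `method_b` are hypothetical, the loss is taken as the sum of absolute deviations over all matrix elements, and Method B reads cluster labels off the thresholded matrix greedily, one column at a time.

```r
# Hypothetical helper: similarity matrix for one indicator vector.
z_to_sim <- function(z) outer(z, z, "==") * 1

# Hypothetical helper: posterior mean of Sim over the rows (draws) of zdraw.
mean_sim <- function(zdraw) {
  Pmean <- matrix(0, ncol(zdraw), ncol(zdraw))
  for (r in seq_len(nrow(zdraw))) Pmean <- Pmean + z_to_sim(zdraw[r, ])
  Pmean / nrow(zdraw)
}

# Method A sketch: return the draw whose similarity matrix is closest to
# E[Sim] in summed absolute deviation (first such draw on ties).
method_a <- function(zdraw) {
  Pmean <- mean_sim(zdraw)
  loss <- vapply(seq_len(nrow(zdraw)),
                 function(r) sum(abs(Pmean - z_to_sim(zdraw[r, ]))),
                 numeric(1))
  zdraw[which.min(loss), ]
}

# Method B sketch: threshold E[Sim] at the cutoff, then assign each
# still-unlabelled observation to the group of the first column it matches.
method_b <- function(zdraw, cutoff = 0.9) {
  Sim <- (mean_sim(zdraw) > cutoff) * 1
  z <- integer(ncol(zdraw))        # 0 means "not yet assigned"
  group <- 1L
  for (i in seq_len(ncol(zdraw))) {
    members <- which(z == 0L & Sim[, i] == 1)
    if (length(members) > 0) {
      z[members] <- group
      group <- group + 1L
    }
  }
  z
}

# Toy input: 4 draws of indicators for 3 observations.
zdraw <- rbind(c(1, 1, 2),
               c(1, 1, 2),
               c(2, 2, 1),
               c(1, 2, 2))
clustera <- method_a(zdraw)
clusterb <- method_b(zdraw, cutoff = 0.7)
```

Method A always returns a partition that was actually visited by the sampler, whereas Method B builds a partition from pairwise posterior probabilities and so depends on the choice of `cutoff`.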

## Value

A list containing:

| Component | Description |
|-----------|-------------|
| `clustera` | indicator function for clustering based on method A above |
| `clusterb` | indicator function for clustering based on method B above |

## Warning

This routine is a utility routine that does not check the input arguments for proper dimensions and type.
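Since the routine performs no checks itself, a caller may want to validate inputs first. The following is one possible pre-check, offered as a sketch only: `check_zdraw` is a hypothetical helper that is not part of bayesm, and the specific conditions (integer-valued indicators of at least 1, a cutoff in (0, 1]) are assumptions about what counts as sensible input.

```r
# Hypothetical pre-check (not part of bayesm). The conditions below are
# assumptions about reasonable input, not requirements stated by the package.
check_zdraw <- function(zdraw, cutoff = 0.9) {
  stopifnot(
    is.matrix(zdraw),                 # R x nobs array of draws
    is.numeric(zdraw),
    nrow(zdraw) >= 1, ncol(zdraw) >= 1,
    !anyNA(zdraw),
    all(zdraw == round(zdraw)),       # indicators are whole numbers
    all(zdraw >= 1),                  # component labels start at 1
    is.numeric(cutoff), length(cutoff) == 1,
    cutoff > 0, cutoff <= 1           # cutoff is a probability
  )
  invisible(TRUE)
}

# A well-formed matrix passes; a bare vector is rejected.
ok  <- check_zdraw(rbind(c(1, 2, 2), c(1, 1, 2)))
bad <- tryCatch(check_zdraw(c(1, 2, 2)), error = function(e) "rejected")
```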

## Author(s)

Peter Rossi, Anderson School, UCLA, perossichi@gmail.com.

## References

For further discussion, see Bayesian Statistics and Marketing by Rossi, Allenby, and McCulloch Chapter 3.
http://www.perossi.org/home/bsm-1

## See Also

`rnmixGibbs`

## Examples

```r
library(bayesm)

if(nchar(Sys.getenv("LONG_TEST")) != 0) {

  ## simulate data from mixture of normals
  n = 500
  pvec = c(.5,.5)
  mu1 = c(2,2)
  mu2 = c(-2,-2)
  Sigma1 = matrix(c(1,0.5,0.5,1), ncol=2)
  Sigma2 = matrix(c(1,0.5,0.5,1), ncol=2)
  ## each component is list(mean, inverse of the Cholesky root of Sigma)
  comps = NULL
  comps[[1]] = list(mu1, backsolve(chol(Sigma1),diag(2)))
  comps[[2]] = list(mu2, backsolve(chol(Sigma2),diag(2)))
  dm = rmixture(n, pvec, comps)

  ## run MCMC on normal mixture
  Data = list(y=dm$x)
  ncomp = 2
  Prior = list(ncomp=ncomp, a=c(rep(100,ncomp)))
  R = 2000
  Mcmc = list(R=R, keep=1)
  out = rnmixGibbs(Data=Data, Prior=Prior, Mcmc=Mcmc)

  ## find clusters, discarding the first 499 draws as burn-in
  begin = 500
  end = R
  outclusterMix = clusterMix(out$nmix$zdraw[begin:end,])

  ## check on clustering versus "truth"
  ## note: there could be switched labels
  table(outclusterMix$clustera, dm$z)
  table(outclusterMix$clusterb, dm$z)
}
```