# Bayesian agglomerative clustering for high-dimensional data with variable selection

### Description

The function clusters data stored in a matrix using an additive linear model with disappearing random effects.
The model has built-in spike-and-slab components that quantify the importance of variables for clustering; these can be extracted using the `imp` function.

### Usage


### Arguments

| Argument | Description |
|---|---|
| `x` | A numeric matrix, with clustering individuals in rows and variables in columns. |
| `rep.id` | A vector of positive integers with the same length as the number of rows of |
| `effect.family` | Distribution family of the disappearing random components. The choices are `"gaussian"` or `"alaplace"`, for the Gaussian or asymmetric Laplace family, respectively. |
| `var.select` | A logical value, |
| `transformed.par` | The transformed model parameters in a vector. The length of the vector depends on the chosen model and on whether variable selection is used. Apply the log transformation to the variance parameters, the identity to the mean, and the logit to the proportions. The function `loglikelihood` can be used to estimate them from the data. |
| `labels` | A vector of strings giving the labels of the clustering types. The length of the vector should match |
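As a small illustration of the transformations required for `transformed.par`, the base R sketch below moves hypothetical natural-scale values to the transformed scale (`log`, identity, `qlogis` for the logit) and back (`exp`, identity, `plogis`). The grouping of three variances, one mean and two proportions, and their order in the vector, are assumptions made for illustration only; the number and order of entries that `bclust` expects depend on the chosen model and on `var.select`.

```r
# Hypothetical natural-scale values, for illustration only.
# The layout (3 variances, 1 mean, 2 proportions) is an assumption,
# not the parameter order used by bclust.
variances   <- c(0.25, 0.5, 4)  # must be > 0
mean_par    <- 0.1              # any real number
proportions <- c(0.4, 0.2)      # must lie strictly in (0, 1)

# Forward: log for variances, identity for the mean, logit for proportions
tpar <- c(log(variances), mean_par, qlogis(proportions))

# Backward: exp, identity, and inverse logit (plogis)
back_var  <- exp(tpar[1:3])
back_mean <- tpar[4]
back_prop <- plogis(tpar[5:6])

print(round(tpar, 3))
```

Working on the transformed scale lets every entry range over the whole real line, which is convenient for numerical optimizers.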

### Details

The function calls internal `C` functions depending on the chosen model. The C stack of the system may overflow if you have a large dataset, so you may need to increase the stack size from your operating system's command line before starting `R`. If you use Linux, open a console, type `ulimit -s unlimited`, and then run `R` in the same console. Microsoft Windows users do not need to increase the stack size.
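To confirm which stack limit an `R` session actually received, you can query it from within `R` using base R's `Cstack_info()`. This check is a suggestion and not part of bclust. The `size` element is the stack size in bytes and may be `NA` when `R` has no finite value for it, which can happen after the limit is set to unlimited.

```r
# Query the C stack settings of the running R session (base R only).
info <- Cstack_info()
print(info)

# `size` is the stack limit in bytes; NA means R has no finite value for it.
stack_bytes <- info[["size"]]
if (is.na(stack_bytes)) {
  message("C stack size: unknown or unlimited")
} else {
  message(sprintf("C stack size: %.1f MB", stack_bytes / 2^20))
}
```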

We assume the following Bayesian linear model for clustering:

$$y_{vctr} = m + h_{vct} + d_v\, g_{vc}\, t_{vc} + e_{vctr},$$

where $y_{vctr}$ is the available data on variable $v$, cluster $c$, clustering type $t$, and replicate $r$; $h_{vct}$ is the between-type error; $t_{vc}$ is the disappearing random component, controlled by the Bernoulli variables $d_v$, with success probability $q$, and $g_{vc}$, with success probability $p$, so that the term $d_v g_{vc} t_{vc}$ vanishes unless both equal 1; and $e_{vctr}$ is the between-replicate error. The types inside a cluster share the same $t_{vc}$ but may have different $h_{vct}$.
For more details, see Vahid Partovi Nia and Anthony C. Davison (2012).
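To make the roles of the terms concrete, the sketch below simulates data from the model above. It assumes zero-mean Gaussian $h_{vct}$, $t_{vc}$ and $e_{vctr}$, draws $d_v$ and $g_{vc}$ independently, and uses hypothetical parameter values; it illustrates the generative structure only and is not the package's fitting code. Because the page uses $t$ both as the type index and as the random component, the code calls the component `t_eff`. The helper name `simulate_model` is hypothetical.

```r
# Hypothetical sketch: simulate y_{vctr} = m + h_{vct} + d_v * g_{vc} * t_{vc} + e_{vctr}
# assuming zero-mean Gaussian h, t, e and independent Bernoulli d, g.
# All parameter values are illustrative, not bclust defaults.
set.seed(1)

simulate_model <- function(n_var = 5, n_clust = 3, n_type = 2, n_rep = 2,
                           m = 0, sd_h = 0.3, sd_t = 2, sd_e = 0.5,
                           p = 0.5, q = 0.4) {
  d <- rbinom(n_var, 1, q)                                 # d_v ~ Bernoulli(q)
  g <- matrix(rbinom(n_var * n_clust, 1, p), n_var)        # g_vc ~ Bernoulli(p)
  t_eff <- matrix(rnorm(n_var * n_clust, 0, sd_t), n_var)  # t_vc, shared within cluster c
  rows <- list()
  ids <- integer(0)
  type_id <- 0L
  for (cl in seq_len(n_clust)) {
    for (ty in seq_len(n_type)) {
      type_id <- type_id + 1L
      h <- rnorm(n_var, 0, sd_h)          # h_vct, shared by all replicates of this type
      for (r in seq_len(n_rep)) {
        e <- rnorm(n_var, 0, sd_e)        # e_vctr, fresh for each replicate
        rows[[length(rows) + 1L]] <- m + h + d * g[, cl] * t_eff[, cl] + e
        ids <- c(ids, type_id)
      }
    }
  }
  list(x = do.call(rbind, rows),          # one row per (cluster, type, replicate)
       rep.id = ids,                      # same id for replicates of one type
       cluster = rep(seq_len(n_clust), each = n_type * n_rep))
}

sim <- simulate_model()
dim(sim$x)
table(sim$rep.id)
```

The returned `x` and `rep.id` are shaped like the corresponding arguments of `bclust`, with replicates of the same type sharing one integer id, as in the replicated example below.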

### Value

| Component | Description |
|---|---|
| `data` | The data matrix, reordered according to |
| `repno` | The number of replicates of the values of |
| `merge` | The merge matrix, in |
| `height` | A monotone vector giving the height of the constructed tree. |
| `logposterior` | The log posterior for each merge. |
| `clust.number` | The number of clusters for each merge. |
| `cut` | The height corresponding to the maximum of the log posterior along the agglomerative path. |
| `transformed.par` | The transformed values of the model parameters. The log transformation is applied to the variance parameters, the identity to the mean, and the logit to the proportions. |
| `labels` | The labels associated with each clustering type. |
| `effect.family` | The distribution family assigned to the disappearing random effect in the function arguments. |
| `var.select` | The variable selection setting chosen in the function arguments. |
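Since `logposterior` and `clust.number` are both reported for each merge, the number of clusters at the highest log posterior can be read off by indexing one with the position of the maximum of the other. The sketch below assumes the two vectors are aligned element by element; `best_partition` is a hypothetical helper, and `toy_fit` is a made-up stand-in, not real `bclust` output.

```r
# Hypothetical helper: report the partition with the highest log posterior.
# Assumes fit$logposterior and fit$clust.number are aligned per merge.
best_partition <- function(fit) {
  i <- which.max(fit$logposterior)  # first maximum if there are ties
  list(clusters = fit$clust.number[i], logposterior = fit$logposterior[i])
}

# Toy stand-in for a bclust result (made-up numbers, not real output)
toy_fit <- list(clust.number = c(5, 4, 3, 2, 1),
                logposterior = c(-120, -110, -104, -108, -130))
best_partition(toy_fit)
```

With a fitted object such as `gaelle.bclust` from the examples, the same call reports the point marked by the horizontal line drawn at `max(gaelle.bclust$logposterior)`.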

### References

Vahid Partovi Nia and Anthony C. Davison (2012). High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust. Journal of Statistical Software, 47(5), 1-22. URL http://www.jstatsoft.org/v47/i05/

### See Also

`loglikelihood`, `meancss`, `imp`.

### Examples

```
library(bclust)
data(gaelle)

# unreplicated clustering
gaelle.bclust <- bclust(x = gaelle,
  transformed.par = c(-1.84, -0.99, 1.63, 0.08, -0.16, -1.68))
par(mfrow = c(2, 1))
plot(as.dendrogram(gaelle.bclust))
abline(h = gaelle.bclust$cut)
plot(gaelle.bclust$clust.number, gaelle.bclust$logposterior,
  xlab = "Number of clusters", ylab = "Log posterior", type = "b")
abline(h = max(gaelle.bclust$logposterior))

# replicated clustering:
# the first 3 rows are replicates of ColWT,
# and each of the other 13 types has 4 replicates
gaelle.id <- rep(1:14, c(3, rep(4, 13)))
gaelle.lab <- c("ColWT", "d172", "d263", "isa2",
  "sex4", "dpe2", "mex1", "sex3", "pgm", "sex1",
  "WsWT", "tpt", "RLDWT", "ke103")
gaelle.bclust <- bclust(gaelle, rep.id = gaelle.id, labels = gaelle.lab,
  transformed.par = c(-1.84, -0.99, 1.63, 0.08, -0.16, -1.68))
plot(as.dendrogram(gaelle.bclust))
abline(h = gaelle.bclust$cut)
plot(gaelle.bclust$clust.number, gaelle.bclust$logposterior,
  xlab = "Number of clusters", ylab = "Log posterior", type = "b")
abline(h = max(gaelle.bclust$logposterior))
```