This function performs clustering on online datasets. The number of cells is data-driven and need not be chosen in advance by the user.
mydata |
a matrix where each row corresponds to an observation of length d. |
R |
a positive real value that should be larger than the maximum Euclidean distance of all the observations in |
coeff |
a positive real value, enforcing a large number of cells. The default, 2, should be convenient for most users. A larger value yields more cells in the clustering. |
K_max |
a positive integer indicating the maximum number of cells allowed for the clustering. |
scaling |
logical indicating whether the matrix |
var_ind |
logical indicating whether predicted centers of cells will be calculated sequentially. If |
N_iterations |
a positive integer indicating the number of iterations of the algorithm. |
plot_ind |
logical indicating whether clusters should be plotted. |
axis_ind |
numeric indicating which axes are to be plotted if d >= 2. The default is the first two coordinates of observations. |
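As a minimal sketch of preparing the R argument, the snippet below computes the largest Euclidean norm among the rows of a data matrix and adds a small margin so that the value is strictly larger than that maximum. The toy matrix, the names toy, row_norms and R_toy, and the 1% margin are assumptions made for illustration, not recommendations from the package; the sketch also assumes that the bound is measured from the origin, as in the example at the end of this page.

```r
# Toy data: 6 observations in dimension 3 (illustrative only).
set.seed(1)
toy <- matrix(rnorm(18), nrow = 6, ncol = 3)

# Euclidean norm of each observation (one value per row).
row_norms <- sqrt(rowSums(toy^2))

# Radius strictly larger than the largest norm; the 1% margin is an assumption.
R_toy <- 1.01 * max(row_norms)

# Check that every observation lies strictly inside the ball of radius R_toy.
all(row_norms < R_toy)
```

Using base R only keeps the computation independent of the package, so it can be run before the data is passed to the function.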
The PACBO algorithm is introduced and fully described in Le Li, Benjamin Guedj, Sebastien Loustau (2016), "PAC-Bayesian Online Clustering" (https://arxiv.org/abs/1602.00522). It relies on a PAC-Bayesian approach, allowing for a dynamic (i.e., time-dependent) estimation of the number of clusters, up to K_max clusters. It is implemented via an RJMCMC-flavored algorithm.
Returns a list including
predicted_centers |
a matrix of predicted centers of cells, where each row corresponds to a center. |
nb_of_clusters |
a positive integer giving the estimated number of cells for the dataset. |
labels |
labels for observations in |
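To show how the returned list might be inspected, the sketch below uses a hand-built stand-in carrying the three documented component names rather than a real call to the function. The object result_demo, its values, and the assumption that labels holds one integer label per observation are all hypothetical.

```r
# Hypothetical stand-in for the returned list (not real PACBO output).
result_demo <- list(
  predicted_centers = matrix(c(0, 0,
                               5, 5), nrow = 2, byrow = TRUE),
  nb_of_clusters = 2L,
  labels = c(1L, 1L, 2L, 2L, 2L)  # assumed: one integer label per observation
)

# Each row is a center; in this stand-in the row count equals nb_of_clusters.
nrow(result_demo$predicted_centers) == result_demo$nb_of_clusters

# Number of observations assigned to each cell.
cluster_sizes <- table(result_demo$labels)
cluster_sizes
```

The same accessors (`$predicted_centers`, `$nb_of_clusters`, `$labels`) would apply to an actual result, provided the components have the shapes assumed here.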
Le Li <le@iadvize.com>
Le Li, Benjamin Guedj and Sebastien Loustau (2016), PAC-Bayesian Online Clustering, arXiv preprint: https://arxiv.org/abs/1602.00522.
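## Generate 100 points in R^5 drawn from a mixture of 4 Gaussian clusters.
library(PACBO)  # also loads mnormt, which provides rmnorm()
set.seed(100)
Nb <- 4        # number of clusters
d <- 5         # dimension of each observation
n_obs <- 100   # number of observations (renamed from T, which masks the TRUE shorthand)
proportion <- rep(1/Nb, Nb)
Mean_vectors <- matrix(runif(d*Nb, min = -10, max = 10), nrow = Nb, ncol = d, byrow = TRUE)
mydata <- matrix(replicate(n_obs, rmnorm(1, mean = Mean_vectors[sample(1:Nb, 1, prob = proportion), ],
                                         varcov = diag(1, d))), nrow = n_obs, byrow = TRUE)
## R: largest Euclidean norm among the observations.
R <- max(sqrt(rowSums(mydata^2)))
## Run the algorithm.
result <- PACBO(mydata, R, plot_ind = TRUE)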
Loading required package: mnormt