Copyright 2019 Faustine Bousquet (faustine.bousquet@tabmo.io or faustine.bousquet@umontpellier.fr) from TabMo and IMAG (Institut Montpelliérain Alexander Grothendieck, University of Montpellier). The binomialMix package is available under the Apache2 license.
The binomialMix package provides a clustering method for longitudinal, non-Gaussian data. It uses an EM algorithm for generalized linear models (GLM).
You can install the binomialMix R package with either of the following R commands:
# install.packages("devtools")
devtools::install_git("https://gitlab.com/tabmo/binomialmix")
devtools::install_gitlab("tabmo/binomialMix")
You can also clone the git repository directly:
git clone https://gitlab.com/tabmo/binomialMix
Once you have cloned the git repository, run the following to install the binomialMix package:
devtools::install("/path/to/binomialMix/pkg") # edit the path
library(binomialMix)
data(adcampaign)
Of course, you can use your own data. It must have the following format:

- a data frame
- a factor column identifying the objects you want to cluster
- a target value
- a weight variable, since the data are binomial
- at least one column to use as an explanatory variable
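As an illustration of the format above, here is a minimal sketch that builds a toy data frame in base R. The column names mirror the adcampaign example used below, but the values are simulated and the campaign names are hypothetical. It also assumes that the target `ctr` is stored as a proportion (clicks divided by impressions), which this README does not state explicitly. The final `glm()` call is only a sanity check with a single weighted binomial GLM from base R. It is not the clustering method of the package.

```r
# Hypothetical toy data: names follow the adcampaign example, values are made up.
set.seed(1)
n <- 60
my_data <- data.frame(
  id          = factor(rep(c("camp_a", "camp_b", "camp_c"), each = n / 3)),  # objects to cluster
  timeSlot    = factor(sample(c("morning", "afternoon", "evening"), n, replace = TRUE)),
  day         = factor(sample(c("mon", "tue", "wed"), n, replace = TRUE)),
  impressions = sample(100:1000, n, replace = TRUE)  # weight: number of binomial trials
)

# Simulate clicks, then store the target as a proportion (assumption: ctr = clicks / impressions).
clicks <- rbinom(n, size = my_data$impressions, prob = 0.05)
my_data$ctr <- clicks / my_data$impressions

str(my_data)

# Sanity check only: one weighted binomial GLM on the whole data set (no clustering).
single_fit <- glm(ctr ~ timeSlot + day, family = binomial,
                  weights = impressions, data = my_data)
coef(single_fit)
```

A data frame shaped like `my_data` has the four ingredients listed above: an id factor, a target, a weight column, and explanatory variables.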
Run the clustering algorithm

Here, we want to cluster advertising campaigns. Each campaign (column "id") is composed of n_c observations from the whole dataset, so there are repeated measures for each id level. The explanatory variables could be day, timeSlot, or app_or_site. We try K=3 clusters.
model_formula<-"ctr~timeSlot+day"
weighted_variable<-"impressions"
nb_cluster<-3
df_tocluster<-adcampaign
col_id<-"id"
result_K3<-runEM(model_formula,
weighted_variable,
nb_cluster,
df_tocluster,
col_id)
Plot the evolution of the log-likelihood over the iterations
# Plot the log-likelihood (install ggplot2 first if needed):
# install.packages("ggplot2")
library(ggplot2)
qplot(seq_along(result_K3[[1]]), result_K3[[1]])
Matrix of estimated beta coefficients (values from the last iteration):
head(result_K3[[2]][[length(result_K3[[2]])]])
## [,1] [,2] [,3]
## [1,] -3.8126661 -5.2914380 -3.2418550
## [2,] -0.4134079 0.3794783 0.4115441
## [3,] -0.2975236 0.2407683 0.4076950
## [4,] -0.1948168 0.2122175 0.3753815
## [5,] -0.1590104 0.4028323 0.1885215
## [6,] -0.2160946 0.3545593 0.1872363
Vector of cluster proportions (values from the last iteration):
result_K3[[3]][[length(result_K3[[3]])]]
## [1] 0.1871000 0.7246125 0.0883000
Matrix of the probabilities that each campaign belongs to each cluster (values from the last iteration):
## Too large to print here
result_K3[[4]][[length(result_K3[[4]])]]
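A common next step is to turn these probabilities into one cluster label per campaign. The sketch below does this with a small mock matrix, since the real one is too large to print. It assumes one row per campaign and one column per cluster, an orientation this README does not confirm, so check `dim()` on your own result first.

```r
# Mock posterior matrix standing in for result_K3[[4]][[length(result_K3[[4]])]]
# (assumption: rows are campaigns, columns are clusters; values are made up).
post <- matrix(c(0.90, 0.05, 0.05,
                 0.10, 0.70, 0.20,
                 0.20, 0.30, 0.50),
               ncol = 3, byrow = TRUE)

# Assign each campaign to the cluster with the highest probability.
hard_cluster <- apply(post, 1, which.max)
hard_cluster

# Number of campaigns per cluster, keeping empty clusters in the table.
table(factor(hard_cluster, levels = seq_len(ncol(post))))
```

With a real fit, `post` would be replaced by the matrix extracted from the result object.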
BIC value (numeric):
paste0("BIC=",result_K3[[5]][[length(result_K3[[5]])]])
## [1] "BIC=387914.537681485"
ICL value (numeric):
paste0("ICL value=",result_K3[[6]][[length(result_K3[[6]])]])
## [1] "ICL value=387919.96962191"
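BIC and ICL are typically used to compare fits with different numbers of clusters. The sketch below shows one way to tabulate them. It uses mock result lists in place of real `runEM` outputs, with only positions 5 (BIC) and 6 (ICL) filled and made-up values. `last_value` and `mock_result` are hypothetical helpers, not part of binomialMix, and the "lower is better" rule is an assumption about the sign convention used by the package.

```r
# Hypothetical helper: last stored value of component `i` of a runEM-style result.
last_value <- function(result, i) {
  x <- result[[i]]
  x[[length(x)]]
}

# Hypothetical mock: a list laid out like the result above, values are made up.
mock_result <- function(bic, icl) {
  list(NULL, NULL, NULL, NULL, list(bic), list(icl), NULL)
}

results <- list(K2 = mock_result(410, 412),
                K3 = mock_result(388, 390),
                K4 = mock_result(395, 401))

criteria <- data.frame(
  K   = c(2, 3, 4),
  BIC = sapply(results, last_value, i = 5),
  ICL = sapply(results, last_value, i = 6)
)
criteria

# Assumption: smaller BIC indicates the preferred number of clusters.
criteria[which.min(criteria$BIC), ]
```

With the package installed, `results` could instead be built by calling `runEM` once per candidate K, with the same arguments as in the K=3 example above.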
Total number of EM iterations (numeric):
paste0("Number of EM iteration :",length(result_K3[[7]]))
## [1] "Number of EM iteration :10"
Matrix of the number of Fisher scoring iterations at each M step:
matrix(unlist(result_K3[[7]]),ncol=length(result_K3[[7]])-1)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 4 3 4 6 3 3 2 1 1
## [2,] 3 2 2 2 2 2 2 1 1
## [3,] 5 4 2 2 3 1 1 1 1
# nrow is equal to the number of clusters
# ncol is equal to the number of M steps (one per column)