bigdpclust
bigdpclust performs clustering of tall data using a Bayesian nonparametric Gaussian Dirichlet process mixture model.
You can install the development version of bigdpclust from GitHub with:
#install.packages("devtools")
devtools::install_github("borishejblum/bigdpclust")
bigdpclust depends on the weightedobs branch of the NPflow package, which can be installed with the following command:
devtools::install_github(repo = "borishejblum/NPflow", ref = "weightedobs")
library(ggplot2)
library(bigdpclust)
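# Simulate an unbalanced data set: a large cluster centred at (0, 0)
# and a much smaller cluster centred at (10, 10)
n1 <- 100000
n2 <- 100
mydata <- rbind(cbind(rnorm(n1), rnorm(n = n1)),
                cbind(rnorm(n2, mean = 10), rnorm(n = n2, mean = 10)))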
plot(mydata)
res <- bigdpclust(mydata, nclumps = 100,
                  Nmcmc = 1000, plotevery = 2000, burnin = 500)
table(res$cluster[1:n1])
#>
#> 2
#> 100000
table(res$cluster[n1 + 1:n2])
#>
#> 1
#> 100
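The two tables above can also be read as a single cross-tabulation of the simulated group against the assigned cluster. The following is a minimal sketch in base R only. The `truth` vector and the `purity` summary are illustrative names that are not part of bigdpclust, and `cluster` is a stand-in built to mirror the output printed above, so that the snippet runs without fitting the model.

```r
# Sizes of the two simulated groups, as in the example above
n1 <- 100000
n2 <- 100

# True group of each row of mydata: the first n1 rows are the large
# cluster, the last n2 rows are the small one
truth <- rep(c("large", "small"), times = c(n1, n2))

# Stand-in for res$cluster (assumption: mirrors the labels printed above).
# With a fitted model, use `cluster <- res$cluster` instead.
cluster <- rep(c(2L, 1L), times = c(n1, n2))

# Rows are the simulated groups, columns are the assigned cluster labels
tab <- table(truth = truth, cluster = cluster)
print(tab)

# Share of observations that fall in the dominant cluster of their group
purity <- sum(apply(tab, 1, max)) / sum(tab)
print(purity)
```

Cluster labels are arbitrary, so a run can be judged by whether each row of the table concentrates in one column rather than by the label values themselves.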
– Boris Hejblum & Paul Kirk