bigdpclust

bigdpclust clusters tall data (many observations, comparatively few variables) using a Bayesian nonparametric Dirichlet process mixture of Gaussians.
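
For readers unfamiliar with the model, a Dirichlet process mixture of Gaussians is usually written as below. This is the generic textbook formulation, not something taken from the package source: the symbols $\alpha$ (concentration parameter) and $G_0$ (base measure) are standard notation, and the specific priors used by bigdpclust are not stated in this README.

```latex
\begin{aligned}
x_i \mid \mu_i, \Sigma_i &\sim \mathcal{N}(\mu_i, \Sigma_i), \quad i = 1, \dots, n \\
(\mu_i, \Sigma_i) \mid G &\sim G \\
G &\sim \mathrm{DP}(\alpha, G_0)
\end{aligned}
```

Because a draw $G$ from a Dirichlet process is almost surely discrete, several observations share the same $(\mu, \Sigma)$. Those shared values define the clusters, so the number of clusters does not have to be fixed in advance.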

Installation

You can install the development version of bigdpclust from GitHub with:

# install.packages("devtools")
devtools::install_github("borishejblum/bigdpclust")

bigdpclust depends on the weightedobs branch of the NPflow package, which can be installed with:

devtools::install_github(repo = "borishejblum/NPflow", ref = "weightedobs")
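
After running the two install commands, it can be useful to confirm that both packages are actually available before loading them. The following is a minimal sketch using only base R; the variable names `pkgs` and `installed` are illustrative and not part of either package.

```r
# Packages this README asks you to install from GitHub
pkgs <- c("NPflow", "bigdpclust")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise;
# quietly = TRUE suppresses the message for a missing package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# List anything that still needs to be installed
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

A `FALSE` entry for NPflow most likely means the weightedobs branch still has to be installed with the command above.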

Example

library(ggplot2)
library(bigdpclust)

# Simulate a large group around (0, 0) and a small group around (10, 10)
n1 <- 100000
n2 <- 100
mydata <- rbind(cbind(rnorm(n1), rnorm(n1)),
                cbind(rnorm(n2, mean = 10), rnorm(n2, mean = 10)))
plot(mydata)


# Run the sampler for 1000 MCMC iterations with a burn-in of 500
res <- bigdpclust(mydata, nclumps = 100,
                  Nmcmc = 1000, plotevery = 2000, burnin = 500)
# Cluster labels assigned to the first n1 points (the large group)
table(res$cluster[1:n1])
#> 
#>      2 
#> 100000
# Cluster labels assigned to the last n2 points (the small group)
table(res$cluster[n1 + (1:n2)])
#> 
#>   1 
#> 100
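
The two `table()` calls above inspect each simulated group separately. A single cross-tabulation against the known simulation labels shows the same result at a glance. This is a minimal base R sketch: `truth`, `cluster`, and `confusion` are illustrative names, and `cluster` is a hand-built stand-in that mirrors the output shown above so the snippet runs without fitting the model. In practice, use `res$cluster` instead.

```r
n1 <- 100000
n2 <- 100

# Known group membership from the simulation: the first n1 rows, then n2 rows
truth <- rep(c("large", "small"), times = c(n1, n2))

# Stand-in for res$cluster, copied from the output above (assumption:
# label 2 for the large group, label 1 for the small group)
cluster <- rep(c(2L, 1L), times = c(n1, n2))

# Rows: simulated group; columns: inferred cluster label
confusion <- table(truth = truth, cluster = cluster)
print(confusion)
```

Cluster labels are arbitrary, so what matters is that each row has all of its counts in a single column and that the two rows use different columns.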

– Boris Hejblum & Paul Kirk



borishejblum/bigdpclust documentation built on Dec. 18, 2019, 3:39 a.m.