bigdpclust: Gaussian Dirichlet Process Mixture Clustering For Tall Data


View source: R/bigdpclust.R

Description

Gaussian Dirichlet Process Mixture Clustering For Tall Data

Usage

bigdpclust(
  data,
  coresets = NULL,
  clumping_fn = stats::kmeans,
  nclumps = min(500, nrow(data)/10),
  hyperG0 = NULL,
  Ninit = 50,
  Nmcmc = 1000,
  burnin = Nmcmc/5,
  thin = 2,
  loss_fn = "MBinderN",
  diagVar = FALSE,
  plotevery_nit = Nmcmc/10,
  doPlot = FALSE,
  verbose = FALSE
)
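Several defaults in the Usage above are expressions of the data size or of Nmcmc. The sketch below simply evaluates those formulas for a given number of rows; the helper name resolve_defaults is hypothetical and not part of bigdpclust. Note that nrow(data)/10 is not necessarily an integer for small data sets; how bigdpclust handles a fractional nclumps is not documented here.

```r
# Evaluate the data-dependent default values shown in Usage for a data set
# with n rows. The formulas are copied from Usage; the helper itself is
# hypothetical and not part of bigdpclust.
resolve_defaults <- function(n, Nmcmc = 1000) {
  list(
    nclumps       = min(500, n / 10),  # capped at 500 clumps
    burnin        = Nmcmc / 5,         # one fifth of the MCMC iterations
    plotevery_nit = Nmcmc / 10         # plot interval when doPlot = TRUE
  )
}

# The Examples data have n1 + n2 = 50000 + 500 = 50500 rows:
str(resolve_defaults(n = 50500))
# A smaller data set falls below the cap of 500 clumps:
resolve_defaults(n = 2000)$nclumps  # 200
```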

Arguments

data

numeric n by p matrix with n observations in rows and p dimensions (variables) in columns.

coresets

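an optional list with 3 components, such as the "cluster", "centers" and "size" elements returned by stats::kmeans (see the commented-out line in Examples). Default is NULL.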
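The commented-out line in Examples suggests that coresets can be built from a stats::kmeans fit by keeping its "cluster", "centers" and "size" elements. A minimal sketch, assuming that is the expected layout (the exact requirements of bigdpclust are not spelled out on this page), with smaller sample sizes than in Examples so it runs quickly:

```r
# Sketch: summarise a data matrix into 100 k-means clumps ("coresets").
# Assumption: bigdpclust expects the components "cluster", "centers" and
# "size", mirroring the commented-out line in the Examples section.
set.seed(1)
n1 <- 5000
n2 <- 50
mydata <- rbind(cbind(rnorm(n1), rnorm(n1)),
                cbind(rnorm(n2, mean = 10), rnorm(n2, mean = 10)))

km <- stats::kmeans(mydata, centers = 100, iter.max = 50)
coresets <- km[c("cluster", "centers", "size")]

str(coresets, max.level = 1)
# cluster: one clump label per observation (length nrow(mydata))
# centers: 100 x 2 matrix of clump means
# size:    number of observations falling in each clump
```

Such a list would then be passed as bigdpclust(mydata, coresets = coresets).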

burnin

an integer giving the number of initial MCMC iterations discarded as burn-in. Default is Nmcmc/5.
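Burn-in and thinning together determine how many posterior draws are kept. The sketch below assumes the conventional scheme (drop the first burnin iterations, then keep every thin-th one); the helper kept_iterations is hypothetical and bigdpclust's internal indexing may differ.

```r
# Hypothetical helper: which MCMC iterations survive burn-in and thinning?
# Assumption: the first `burnin` iterations are discarded and every `thin`-th
# remaining iteration is kept; bigdpclust's own indexing may differ.
kept_iterations <- function(Nmcmc = 1000, burnin = Nmcmc / 5, thin = 2) {
  stopifnot(burnin < Nmcmc, thin >= 1)
  seq(from = burnin + thin, to = Nmcmc, by = thin)
}

kept <- kept_iterations()
length(kept)  # 400 retained draws with the default settings
head(kept)    # 202 204 206 208 210 212
```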

diagVar

logical flag indicating whether the covariance matrix of each cluster is constrained to be diagonal or left as an unconstrained full matrix. Default is FALSE (unconstrained covariance).
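To make the diagVar choice concrete, the sketch below compares a full covariance estimate with its diagonal restriction on simulated correlated data. It is an illustration of the two covariance structures only; it does not call bigdpclust or reproduce its estimation procedure.

```r
# Illustration only: "full" vs "diagonal" covariance for a single cluster.
set.seed(42)
n <- 1000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # correlation between x1 and x2 is 0.8
cluster_data <- cbind(x1, x2)

full_cov <- stats::cov(cluster_data)  # unconstrained: off-diagonal terms estimated
diag_cov <- diag(diag(full_cov))      # diagonal: off-diagonal terms forced to 0

full_cov
diag_cov
```

A diagonal model cannot represent the tilted, elongated shape of such a cluster, whereas the full model can, at the cost of estimating more parameters per cluster.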

plotevery_nit

an integer giving the interval, in MCMC iterations, between successive plots when doPlot is TRUE. Default is Nmcmc/10.

doPlot

logical flag indicating whether to plot MCMC iterations. Default is FALSE.

verbose

logical flag indicating whether partition information is printed as a message at each MCMC iteration. Default is FALSE.

Author(s)

Boris Hejblum, Paul Kirk

Examples

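library(bigdpclust)
set.seed(1)  # for reproducibility

# simulate two well-separated bivariate Gaussian groups of very unequal sizes
n1 <- 50000
n2 <- 500
mydata <- rbind(cbind(rnorm(n1), rnorm(n = n1)),
                cbind(rnorm(n2, mean = 10), rnorm(n = n2, mean = 10)))
#plot(mydata)
# coresets can also be precomputed and passed via the coresets argument:
#coresets <- stats::kmeans(mydata, centers = 100)[c("cluster", "centers", "size")]

res <- bigdpclust(mydata, nclumps = 200)
# estimated cluster labels for the large group, then for the small group
table(res$cluster[1:n1])
table(res$cluster[n1 + 1:n2])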
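Because the two tables above only make sense relative to the simulated group membership, a single cross-tabulation of true groups against estimated labels can be easier to read. In the sketch below, stats::kmeans labels stand in for res$cluster (an assumption for illustration; it does not run bigdpclust), and smaller group sizes keep it fast.

```r
# Stand-in illustration: stats::kmeans labels play the role of res$cluster,
# because this sketch does not run bigdpclust itself.
set.seed(123)
n1 <- 5000
n2 <- 50
mydata <- rbind(cbind(rnorm(n1), rnorm(n1)),
                cbind(rnorm(n2, mean = 10), rnorm(n2, mean = 10)))
truth <- rep(c("group1", "group2"), times = c(n1, n2))

est <- stats::kmeans(mydata, centers = 2, nstart = 5)$cluster

# Rows: simulated groups; columns: estimated cluster labels.
# Perfect recovery puts each row's counts in a single column.
confusion <- table(truth, est)
confusion
```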

borishejblum/bigdpclust documentation built on Dec. 18, 2019, 3:39 a.m.