dmm.cluster.RModel: Use a Dirichlet Mixture Model on data to get cluster labels...

View source: R/dmm_cluster.R

Description

Use a Dirichlet Mixture Model on data to get cluster labels and cluster parameter values.

Usage

## S3 method for class 'RModel'
dmm.cluster(model, Xdata, alpha = 1, m_prior = 3,
  m_post = 3, iters = 5000, burnin = 200, shuffled = TRUE)

Arguments

model

An object returned by dmm.model().

Xdata

A 1D array of length N (univariate case) or a 2D array of size N-by-d (multivariate case), where d is the dimensionality of the data and N is the number of observations.

alpha

A float. The concentration parameter. Default is 1.0.

m_prior

An integer. Optional parameter, used only in the non-conjugate case. Default is 3.

m_post

An integer. Optional parameter, used only in the non-conjugate case. Default is 3.

iters

An integer. Number of iterations. Default is 5000.

burnin

An integer. Amount of burn-in. Default is 200.

shuffled

A logical. Whether or not to shuffle the data. Default is TRUE.
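
As a minimal sketch of the two input shapes described for Xdata, the following base-R snippet builds a univariate vector and a multivariate matrix, following the N-by-d layout given in the argument description. The variable names (x_uni, x_multi) and the simulated data are illustrative assumptions. The commented call presumes a model object already returned by dmm.model(), whose arguments are not covered on this page.

```r
set.seed(1)  # reproducible simulated data

N <- 100  # number of observations
d <- 2    # dimensionality in the multivariate case

# Univariate case: a 1D numeric vector of length N.
x_uni <- c(rnorm(N / 2, mean = -3), rnorm(N / 2, mean = 3))

# Multivariate case: an N-by-d numeric matrix, one observation per row.
x_multi <- rbind(
  matrix(rnorm(N / 2 * d, mean = -3), ncol = d),
  matrix(rnorm(N / 2 * d, mean = 3), ncol = d)
)

# Hypothetical call, assuming `model` was returned by dmm.model():
# states <- dmm.cluster(model, x_multi, alpha = 1, iters = 5000,
#                       burnin = 200, shuffled = TRUE)
```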

Details

Performs iters iterations of Algorithm 2 (conjugate case) or Algorithm 8 (non-conjugate case) from Neal (2000) to generate possible clusterings of the data in Xdata, using the model in model with concentration parameter alpha. In the 1D case, Xdata is assumed to be a 1D array of floats. In the 2D case, Xdata is assumed to be a dxN array of floats, where the data is d-dimensional and N is the number of data points. Returns a list of states containing every state sampled after the burn-in iterations; the default burn-in is 200. By default, this list is shuffled so that it can be used to approximate i.i.d. draws from the posterior.
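
The burn-in and shuffling steps described above can be illustrated with a small base-R sketch. This is not the package's internal code: the chain here is a stand-in list of iteration indices, and it assumes the burn-in iterations are counted within iters, which this page does not state explicitly.

```r
set.seed(42)

iters  <- 10  # small values for illustration only
burnin <- 3

# Stand-in for the chain of sampler states: each "state" is just its
# iteration index here.
chain <- as.list(seq_len(iters))

# Discard the burn-in iterations and keep the rest.
kept <- chain[(burnin + 1):iters]

# Shuffle the retained states so their order no longer follows the chain.
shuffled <- sample(kept)
```

Shuffling only changes the order of the retained states; the set of states is the same as in kept.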

A single state from the returned list of states has the fields data and clusters. data is a data.frame consisting of the Xdata points and their cluster labels. clusters is a data.table (if the user has the data.table package loaded) or a list.

If clusters is a data.table, each row refers to a cluster. The columns are the cluster label, the population, and the remaining columns hold the parameters.

If clusters is a list, each element of the list refers to a cluster: clusters[[i]] is a list containing the above information for cluster i, with the fields cluster, population, and params. For example, clusters[[1]]$population is the population of cluster 1. The params field (clusters[[i]]$params) is itself a list of the cluster's parameters.
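
To make the list form concrete, here is a hand-built mock of a single state and a few ways to read it. The mock is not output from dmm.cluster(): the column names in data and the parameter names (mu, sigma) are hypothetical, since the actual names depend on the model.

```r
# Hand-built mock of one state in the list form described above.
# Column names and parameter names (mu, sigma) are hypothetical.
state <- list(
  data = data.frame(x = c(-3.1, -2.8, 3.2, 2.9, 3.0),
                    cluster = c(1, 1, 2, 2, 2)),
  clusters = list(
    list(cluster = 1, population = 2, params = list(mu = -2.95, sigma = 0.2)),
    list(cluster = 2, population = 3, params = list(mu = 3.03, sigma = 0.15))
  )
)

# Population of cluster 1.
state$clusters[[1]]$population

# Populations of all clusters as a numeric vector.
pops <- sapply(state$clusters, function(cl) cl$population)

# Parameters of cluster 2, as a named list.
state$clusters[[2]]$params
```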

To see a formatted summary of all the clusters in a given state use the dmm.summarize(clusters) function.

To see a plot of the labeled data in a given state, use the dmm.plot(data) function.

Value

A list of states (i.e. state = states[[i]]). A state is itself a list. A state has two fields: data and clusters.

data is a data.frame of the Xdata data points and their cluster labels. clusters is either a list or a data.table (if the data.table package is loaded by the user). It contains (1) the cluster labels, (2) the number of data points (i.e. the population) of each cluster, and (3) all of the parameters for each cluster.
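
Because the return value is a list of states, quantities such as the number of clusters can be tabulated across states. The sketch below does this on a mock list, assuming clusters is in its list form; make_state is a hypothetical helper used only to build the mock. With the data.table form, the row count of clusters would presumably play the role of length().

```r
# Hypothetical helper that builds a mock state from cluster populations.
make_state <- function(pops) {
  list(
    data = NULL,  # omitted in this mock
    clusters = lapply(seq_along(pops), function(i) {
      list(cluster = i, population = pops[i], params = list())
    })
  )
}

# Mock list of three states.
states <- list(
  make_state(c(60, 40)),
  make_state(c(55, 40, 5)),
  make_state(c(58, 42))
)

# Number of clusters in each sampled state.
k_per_state <- sapply(states, function(s) length(s$clusters))

# Relative frequency of each cluster count across the sampled states.
k_freq <- table(k_per_state) / length(states)
```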


nsdumont/jDirichletMixtureModels documentation built on May 23, 2019, 2:51 p.m.