imbc: A function that combines model-based clustering as well as...


Description

A function that combines model-based clustering with your own input to cluster your data

Usage

imbc(data, n = 1, G = NULL, query = "minimax",
  distanceMethod = "euclidean", iterationMax = 500)

Arguments

data

the data you wish to use (must be continuous)

n

the number of points you wish to be queried on at once

G

the number of clusters. The default, NULL, lets Mclust identify the number of clusters

query

how you wish to be queried. This must be one of "minimax", "maxWithinClust" or "minBetweenClust", with a default of "minimax".

distanceMethod

the method used to find distances between points. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski", with a default of "euclidean".

iterationMax

the maximum number of iterations allowed while alternating mstep and estep until convergence
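
The six values accepted by distanceMethod are the same names that base R's stats::dist accepts for its method argument. Assuming imbc passes the value through to dist (this page does not confirm it), the sketch below shows how the choice changes the distance between two points. The toy matrix pts is made up for illustration.

```r
# Toy data: three points in two dimensions (illustrative values only).
pts <- matrix(c(0, 0,
                3, 4,
                6, 8), ncol = 2, byrow = TRUE)

# The six names listed for distanceMethod are also methods of stats::dist.
methods <- c("euclidean", "maximum", "manhattan",
             "canberra", "binary", "minkowski")

# Distance between the first two points under each method.
# Assumption: imbc hands distanceMethod to dist(); not confirmed by the docs.
d12 <- sapply(methods, function(m) as.matrix(dist(pts, method = m))[1, 2])
print(d12)
```

Note that "minkowski" uses dist's default power of 2 here, so it matches "euclidean" unless a different power is supplied.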

Value

An object providing the optimal (according to BIC) mixture model estimation.

The details of the output components are as follows:

call

The matched call

data

The input data matrix

modelName

A character string denoting the model at which the optimal BIC occurs

n

The number of observations in the data

d

The dimension of the data

G

The optimal number of mixture components

BIC

All BIC values

bic

Optimal BIC value

loglik

The log-likelihood corresponding to the optimal BIC

df

The number of estimated parameters

hypvol

The hypervolume parameter for the noise component if required, otherwise set to NULL (see hypvol).

parameters

A list with the following components:

pro

A vector whose kth component is the mixing proportion for the kth component of the mixture model. If missing, equal proportions are assumed

mean

The mean for each component. If there is more than one component, this is a matrix whose kth column is the mean of the kth component of the mixture model.

variance

A list of variance parameters for the model. The components of this list depend on the model specification. See the help file for mclustVariance for details.

z

A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class

classification

The classification corresponding to z, i.e. map(z).

uncertainty

The uncertainty associated with the classification.
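
The relationship between z, classification and uncertainty can be sketched in base R. In mclust, map(z) assigns each observation to the component with the largest membership probability, and the uncertainty is one minus that largest probability. The sketch assumes imbc follows the same definitions, since its return value mirrors Mclust's. The matrix z below is made up for illustration and is not imbc output.

```r
# Illustrative 4 x 2 membership matrix; each row sums to 1.
z <- matrix(c(0.90, 0.10,
              0.20, 0.80,
              0.55, 0.45,
              0.05, 0.95), ncol = 2, byrow = TRUE)

# Hard classification: index of the most probable component in each row
# (a base R stand-in for mclust's map(z)).
classification <- max.col(z, ties.method = "first")

# Uncertainty: one minus the largest membership probability in each row.
uncertainty <- 1 - apply(z, 1, max)

print(classification)
print(round(uncertainty, 2))
```

Observations whose largest probability is close to 1/G have the highest uncertainty; in this toy matrix that is the third row.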

Examples

#Load data
library(mclust)
data(banknote)

#Create new dataset with only continuous variables
bankdata <- banknote[,2:7]

#Run imbc while querying user on only 1 data point at a time
#Use default querying algorithm (minimax)
output <- imbc(bankdata)

#Query two points at once, using minimum between-cluster distance as the query method and specifying 2 clusters
output2 <- imbc(bankdata, n = 2, G = 2, query = "minBetweenClust")

#gives vector of classification of each row
output2$classification

#classification probability matrix
#output2$z
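
A common follow-up is to cross-tabulate the classification vector against known labels. The example above drops the first column of banknote, which holds the known Status of each note, so that column could serve as a reference. The sketch below uses short stand-in vectors instead of real imbc output, so it runs without the package; the names classification and status are hypothetical.

```r
# Stand-in vectors for illustration only; these are not real imbc results.
classification <- c(1, 1, 2, 2, 2, 1)
status <- factor(c("genuine", "genuine", "counterfeit",
                   "counterfeit", "counterfeit", "counterfeit"))

# Rows: known labels; columns: assigned clusters.
tab <- table(status, classification)
print(tab)
```

With actual results, the equivalent call would be along the lines of table(banknote$Status, output2$classification). Cluster numbers are arbitrary, so agreement shows up as one dominant cell per row rather than a fixed diagonal.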

lsheremet/imbc documentation built on May 20, 2019, 7:01 p.m.