bosclust: Function to perform a clustering


View source: R/bosclust.R

Description

This function clusters ordinal data using the multiple latent block model (see references for further details). It lets the user define D groups of variables that have different numbers of levels. The BOS distribution is used, and parameter inference is performed with the SEM-Gibbs algorithm.

Usage

bosclust(x, idx_list=c(1), kr, init, nbSEM, nbSEMburn, 
        nbindmini, m=0, percentRandomB=0)

Arguments

x

Matrix of ordinal data of dimension N*Jtot. Features with the same number of levels must be placed side by side. Missing values should be coded as NA.

idx_list

Vector of length D. This argument is useful when variables have different numbers of levels. Element d gives the index of the first column of x whose variables have m[d] levels.

kr

Number of row clusters.

m

Vector of length D. The d^th element gives the number of levels of the d^th group of variables.

nbSEM

Number of SEM-Gibbs iterations performed to estimate the parameters.

nbSEMburn

Number of SEM-Gibbs burn-in iterations for estimating parameters. This value must be less than nbSEM.

nbindmini

Minimum number of cells belonging to a block.

init

String that indicates the kind of initialisation. Must be one of "kmeans", "random" or "randomBurnin".

percentRandomB

Vector of length 1. Indicates the percentage of resampling when init is equal to "randomBurnin".
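
The layout that x, m and idx_list are expected to share can be sketched in base R. This is only an illustration under stated assumptions: the toy data, the use of 99 as a missing-value marker, and the helper names levels_per_col, ord and lv_sorted are hypothetical, and idx_list is taken to be 1-based, as the default idx_list=c(1) suggests.

```r
# Hypothetical toy data: 6 observations, 5 ordinal variables, with a missing
# answer stored as 99 (an assumed convention, not one used by the package).
set.seed(1)
levels_per_col <- c(3, 5, 3, 5, 5)  # assumed known number of levels per column
x <- sapply(levels_per_col, function(k) sample(seq_len(k), 6, replace = TRUE))
x[2, 4] <- 99

# Recode the missing-value marker as NA.
x[x == 99] <- NA

# Reorder columns so that variables with the same number of levels are adjacent.
ord <- order(levels_per_col)
x <- x[, ord]
lv_sorted <- levels_per_col[ord]

# m holds one entry per group of variables;
# idx_list holds the (assumed 1-based) column where each group starts.
m <- unique(lv_sorted)
idx_list <- match(m, lv_sorted)
```

With this toy input, the two 3-level columns come first and the three 5-level columns follow, so m and idx_list each have length D = 2.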

Value

@V

Matrix of dimension N*kr such that V[i,g]=1 if i belongs to cluster g.

@zr

Vector of length N containing the resulting row partition, that is, the cluster label of each row.

@pi

Vector of length kr. This corresponds to the row mixing proportions.

@m

Vector of length D. The d^th element represents the number of levels of the d^th group of variables.

@icl

ICL value for clustering.

@name

Name of the result.

@params

List of length D. The d^th item stores the resulting position and precision parameters mu and pi.

@paramschain

List of length nbSEMburn. The parameters of the blocks are stored for each iteration of the SEM-Gibbs algorithm.

@xhat

List of length D. The d^th item represents the dataset of the d^th group of variables, with missing values completed.

@zrchain

Matrix of dimension nbSEM*N. Row i represents the row cluster partitions at iteration i.

@pichain

List of length nbSEM. Item i is a vector of length kr that contains the row mixing proportions at iteration i.
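
The relation between @zr and @V, and one way to summarise @zrchain after burn-in, can be sketched in base R on hypothetical values. The labels are assumed to be coded 1..kr, and the majority vote below is only one possible summary that ignores label switching; it is not necessarily how the package computes @zr.

```r
# Hypothetical partition of N = 4 rows into kr = 3 clusters (labels assumed 1..kr).
zr <- c(1L, 3L, 2L, 1L)
kr <- 3L

# Indicator matrix: V[i, g] is 1 when row i belongs to cluster g, 0 otherwise.
V <- outer(zr, seq_len(kr), "==") * 1L

# Hypothetical zrchain: nbSEM = 6 iterations (rows), N = 3 observations (columns).
nbSEM <- 6L
nbSEMburn <- 2L
zrchain <- matrix(c(1, 1, 2,
                    2, 1, 2,
                    1, 2, 2,
                    1, 2, 1,
                    1, 2, 2,
                    2, 2, 2), nrow = nbSEM, byrow = TRUE)

# Keep the iterations after burn-in and take the most frequent label
# of each observation across those iterations.
keep <- (nbSEMburn + 1L):nbSEM
mode_label <- function(z) as.integer(names(which.max(table(z))))
zr_mode <- apply(zrchain[keep, , drop = FALSE], 2, mode_label)
```

Each row of V sums to 1, and max.col(V, ties.method = "first") recovers zr from V.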

Author(s)

Margot Selosse, Julien Jacques, Christophe Biernacki.

Examples

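library(ordinalClust)
data("dataqol")
set.seed(5)

# Load the ordinal data (columns 2 to 29 of dataqol)
M <- as.matrix(dataqol[, 2:29])

# Number of levels of the ordinal variables
m <- 4

# Number of row clusters
krow <- 4

# SEM-Gibbs settings
nbSEM <- 50
nbSEMburn <- 40
nbindmini <- 2
init <- "random"

object <- bosclust(x = M, kr = krow, m = m, nbSEM = nbSEM,
                   nbSEMburn = nbSEMburn, nbindmini = nbindmini, init = init)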

ordinalClust documentation built on Jan. 13, 2021, 8:43 a.m.