bosclassif: Function to perform a classification


View source: R/bosclassif.R

Description

This function fits a classification model to a dataset with ordinal features and a label variable taking values in (1,2,...,kr). Two classification models are available. The first model (selected with the argument kc=0) is a multivariate BOS model with the assumption that, conditional on the class of the observations, the features are independent. The second model is a parsimonious version of the first. Parsimony is introduced by grouping the features into clusters (as in co-clustering) and assuming that the features of a cluster share a common distribution.
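
The difference between the two models can be sketched in symbols. The notation below is an assumption made for illustration and does not come from the package: x_{ij} is feature j of observation i, \theta denotes the BOS parameters of a block, w_{jh} equals 1 when feature j belongs to column cluster h, and a single group of variables with J features and H column clusters is considered.

```latex
% kc = 0: features are independent given the class g,
% each feature j has its own parameters \theta_{gj}
p(x_i \mid y_i = g) = \prod_{j=1}^{J} p(x_{ij};\, \theta_{gj})

% kc > 0: features in the same column cluster h share parameters \theta_{gh}
p(x_i \mid y_i = g) = \prod_{j=1}^{J} \prod_{h=1}^{H} p(x_{ij};\, \theta_{gh})^{w_{jh}}
```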

Usage

bosclassif(x, y, idx_list=c(1), kr, kc=0, init, nbSEM, nbSEMburn, 
          nbindmini, m=0, percentRandomB=0) 

Arguments

x

Matrix of ordinal data of dimension N*Jtot. Features with the same number of levels must be placed side by side. Missing values should be coded as NA.

y

Vector of length N giving the class of each row of x. Classes must be labeled with the numbers (1,2,...,kr).

idx_list

Vector of length D. This argument is useful when variables have different numbers of levels. Element d indicates the column of x at which the variables with m[d] levels begin.

kr

Number of row classes.

kc

Vector of length D. The d^th element indicates the number of column clusters for the d^th group of variables. Set to 0 to use the classical multivariate BOS model.

m

Vector of length D. The d^th element defines the number of levels of the d^th group of variables.

nbSEM

Number of SEM-Gibbs iterations run to estimate the parameters.

nbSEMburn

Number of SEM-Gibbs burn-in iterations for estimating the parameters. This parameter must be less than nbSEM.

nbindmini

Minimum number of cells belonging to a block.

init

String that indicates the kind of initialisation. Must be one of the following strings: "kmeans", "random" or "randomBurnin".

percentRandomB

Vector of length 1. Indicates the percentage of resampling when init is equal to "randomBurnin".
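
As an illustration of how idx_list and m relate to the column layout of x, here is a minimal base-R sketch. The toy matrix and the helper vector levels_per_col are hypothetical and are not part of the package; the sketch only assumes that columns with the same number of levels are already adjacent, as required for x.

```r
# Hypothetical toy data: 3 columns with 3 levels, then 2 columns with 5 levels.
set.seed(1)
x <- cbind(
  matrix(sample(1:3, 30, replace = TRUE), nrow = 10),
  matrix(sample(1:5, 20, replace = TRUE), nrow = 10)
)

# Assumed number of levels of each column of x, in column order.
levels_per_col <- c(3, 3, 3, 5, 5)

# Run-length encoding yields one run per group of adjacent columns.
runs <- rle(levels_per_col)

# Number of levels of each group of variables.
m <- runs$values

# Index of the first column of each group in x.
idx_list <- cumsum(c(1, head(runs$lengths, -1)))
```

With this layout, m has one entry per group of variables and idx_list marks where each group starts, so both vectors have length D = 2.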

Value

Returns an object with the following slots:

@zr

Vector of length N giving the resulting row partition.

@zc

List of length D. The d^th item is a vector of length J[d] representing the column partitions for the group of variables d.

@J

Vector of length D. The d^th item represents the number of columns for d^th group of variables.

@W

List of length D. Item d is a matrix of dimension J[d]*kc[d] such that W[j,h]=1 if column j belongs to cluster h.

@V

Matrix of dimension N*kr such that V[i,g]=1 if i belongs to cluster g.

@icl

ICL value for co-clustering.

@kr

Number of row classes.

@name

Name of the result.

@number_distrib

Number of groups of variables.

@pi

Vector of length kr. Row mixing proportions.

@rho

List of length D. The d^th item represents the column mixing proportion for the d^th group of variables.

@dlist

List of length D. The d^th item contains the column indexes of the d^th group of variables.

@kc

Vector of length D. The d^th element represents the number of column clusters for the d^th group of variables.

@m

Vector of length D. The d^th element represents the number of levels of the d^th group of variables.

@nbSEM

Number of SEM-Gibbs iterations.

@params

List of length D. The d^th item contains the block parameters for the d^th group of variables.

@xhat

List of length D. The d^th item represents the dataset of the d^th group of variables, with missing values completed.
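
The relation between a partition vector such as @zr and its indicator matrix @V can be illustrated in base R. The values below are hypothetical and are not produced by bosclassif; the sketch only shows the encoding described above, where V[i,g]=1 if i belongs to cluster g.

```r
# Hypothetical row partition for N = 6 observations and kr = 3 classes.
zr <- c(1, 3, 2, 2, 1, 3)
kr <- 3

# Indicator matrix: V[i, g] = 1 if observation i belongs to class g, else 0.
V <- outer(zr, seq_len(kr), "==") * 1

# Recover the partition from the indicator matrix.
zr_back <- apply(V, 1, which.max)
```

The same encoding links the column partitions in @zc to the matrices in @W, one group of variables at a time.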

Author(s)

Margot Selosse, Julien Jacques, Christophe Biernacki.

Examples

# loading the real dataset
data("dataqol.classif")

set.seed(5)

# loading the ordinal data
M <- as.matrix(dataqol.classif[,2:29])


# creating the classes values
y <- as.vector(dataqol.classif$death)


# splitting the data into training and validation sets
nb.sample <- ceiling(nrow(M)*2/3)
sample.train <- sample(1:nrow(M), nb.sample, replace=FALSE)

M.train <- M[sample.train,]
M.validation <- M[-sample.train,]
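# replacing the zeros in the validation set with randomly drawn levels
nb.missing.validation <- length(which(M.validation==0))
# number of levels of the ordinal features
m <- c(4)
M.validation[which(M.validation==0)] <- sample(1:m, nb.missing.validation,replace=TRUE)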


y.train <- y[sample.train]
y.validation <- y[-sample.train]



# configuration for SEM algorithm
nbSEM <- 50
nbSEMburn <- 40
nbindmini <- 1
init <- "kmeans"

# number of classes to predict
kr <- 2
# number of column clusters
kcol <- 1


res <- bosclassif(x=M.train,y=y.train,kr=kr,kc=kcol,m=m,
                  nbSEM=nbSEM,nbSEMburn=nbSEMburn,
                  nbindmini=nbindmini,init=init)

predictions <- predict(res, M.validation)
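
Once predicted labels are available as a plain vector, they can be compared with the held-out classes. The sketch below uses hypothetical vectors pred and truth in place of real output, because the accessor for the predicted labels is not documented on this page; it illustrates one possible evaluation step and is not part of the package example.

```r
# Hypothetical predicted and true labels, coded 1..kr as bosclassif expects.
pred  <- c(1, 2, 2, 1, 1, 2)
truth <- c(1, 2, 1, 1, 2, 2)
kr <- 2

# Confusion matrix; fixing the factor levels keeps it kr x kr
# even if a class is never predicted.
conf_matrix <- table(
  predicted = factor(pred, levels = seq_len(kr)),
  actual    = factor(truth, levels = seq_len(kr))
)

# Proportion of correctly classified observations.
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
```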

ordinalClust documentation built on Jan. 13, 2021, 8:43 a.m.