bosclassif: Function to perform a classification
In ordinalClust: Ordinal Data Clustering, Co-Clustering and Classification


View source: R/bosclassif.R

This function runs a classification algorithm on a dataset with ordinal features and a label variable taking values in {1, 2, ..., kr}. Two classification models are available. The first model, selected with kc=0, is a multivariate BOS model that assumes the features are independent conditional on the class of each observation. The second model is a parsimonious version of the first: the features are grouped into clusters (as in co-clustering), and the features within a cluster are assumed to share a common distribution.
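To make the parsimony argument concrete, the sketch below counts how many distinct BOS distributions each model has to estimate. It assumes (our assumption, not stated on this page) that each class-by-feature cell in the first model, and each class-by-feature-cluster block in the second, carries its own BOS parameter pair. The helper `count_bos_blocks` is hypothetical and is not part of ordinalClust.

```r
# Hypothetical helper (not part of ordinalClust): number of distinct BOS
# distributions to estimate, assuming one parameter pair per block.
count_bos_blocks <- function(kr, J, kc = 0) {
  if (kc == 0) {
    # Classical model: every feature has its own distribution in every class
    kr * J
  } else {
    # Parsimonious model: features in the same column cluster share one
    kr * kc
  }
}

# 2 classes and 28 ordinal features, as in the dataqol.classif example below
count_bos_blocks(kr = 2, J = 28)          # 56 blocks
count_bos_blocks(kr = 2, J = 28, kc = 3)  # 6 blocks
```

With many features and few column clusters, the second model estimates far fewer distributions, which is where its parsimony comes from.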

bosclassif(x, y, idx_list=c(1), kr, kc=0, init, nbSEM, nbSEMburn, nbindmini, m=0, percentRandomB=0)

`x`	Matrix of ordinal data of dimension N*Jtot. Features with the same number of levels must be placed side by side. Missing values must be coded as NA.
`y`	Vector of length N giving the class of each row of x. Classes must be labeled 1, 2, ..., kr.
`idx_list`	Vector of length D, needed when the variables have different numbers of levels. Element d is the index of the first column of x holding the variables with m[d] levels.
`kr`	Number of row classes.
`kc`	Vector of length D. The d^th element gives the number of column clusters for the d^th group of variables. Set to 0 to use the classical multivariate BOS model.
`m`	Vector of length D. The d^th element gives the number of levels of the d^th group of variables.
`nbSEM`	Number of SEM-Gibbs iterations performed to estimate the parameters.
`nbSEMburn`	Number of SEM-Gibbs burn-in iterations. Must be smaller than nbSEM.
`nbindmini`	Minimum number of cells belonging to a block.
`init`	String that indicates the kind of initialisation. Must be one of the following strings: "kmeans", "random" or "randomBurnin".
`percentRandomB`	Vector of length 1. Indicates the percentage of resampling when init is equal to "randomBurnin".
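The relation between `idx_list`, `m` and the columns of `x` can be sketched in base R. Assuming `idx_list` is strictly increasing and starts at 1, group d spans the columns from idx_list[d] up to the column just before idx_list[d+1] (or up to the last column of x for the final group). The function `column_groups` is hypothetical and only illustrates this layout; it is not an ordinalClust function.

```r
# Hypothetical helper: split columns 1..Jtot into the D groups defined by idx_list.
column_groups <- function(Jtot, idx_list) {
  stopifnot(idx_list[1] == 1,
            !is.unsorted(idx_list, strictly = TRUE),
            max(idx_list) <= Jtot)
  ends <- c(idx_list[-1] - 1, Jtot)  # last column of each group
  lapply(seq_along(idx_list), function(d) idx_list[d]:ends[d])
}

# 10 features: columns 1-6 have m[1] = 4 levels, columns 7-10 have m[2] = 5 levels,
# so one would pass idx_list = c(1, 7) together with m = c(4, 5)
groups <- column_groups(Jtot = 10, idx_list = c(1, 7))
groups[[1]]  # columns 1 to 6
groups[[2]]  # columns 7 to 10
```

With the default idx_list = c(1), all columns form a single group, which matches the example at the end of this page where every feature has 4 levels.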

Returns an S4 object with the following slots:

`@zr`	Vector of length N with resulting row partitions.
`@zc`	List of length D. The d^th item is a vector of length J[d] representing the column partitions for the group of variables d.
`@J`	Vector of length D. The d^th item represents the number of columns for d^th group of variables.
`@W`	List of length D. Item d is a matrix of dimension J[d]*kc[d] such that W[j,h]=1 if column j belongs to column cluster h.
`@V`	Matrix of dimension N*kr such that V[i,g]=1 if row i belongs to class g.
`@icl`	ICL value for co-clustering.
`@kr`	Number of row classes.
`@name`	Name of the result.
`@number_distrib`	Number of groups of variables.
`@pi`	Vector of length kr. Row mixing proportions.
`@rho`	List of length D. The d^th item gives the column mixing proportions for the d^th group of variables.
`@dlist`	List of length D. The d^th item gives the column indices of the d^th group of variables.
`@kc`	Vector of length D. The d^th element gives the number of column clusters for the d^th group of variables.
`@m`	Vector of length D. The d^th element gives the number of levels of the d^th group of variables.
`@nbSEM`	Number of SEM-Gibbs iterations.
`@params`	List of length D. The d^th item contains the block parameters for the d^th group of variables.
`@xhat`	List of length D. The d^th item represents the dataset of the d^th group of variables, with missing values completed.
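The slots `@V` and `@W` are indicator (one-hot) versions of the partitions stored in `@zr` and `@zc`. The base-R sketch below rebuilds such a matrix from a label vector and checks that the partition can be recovered from it. `to_indicator` is a hypothetical helper, and we assume labels are integers in 1..k, as required for `y` on this page.

```r
# Hypothetical helper: length(z) x k indicator matrix with M[i, g] = 1 if z[i] == g.
to_indicator <- function(z, k) {
  stopifnot(all(z %in% seq_len(k)))
  M <- matrix(0L, nrow = length(z), ncol = k)
  M[cbind(seq_along(z), z)] <- 1L  # one 1 per row, in the column of its label
  M
}

zr <- c(1, 2, 2, 1)
V <- to_indicator(zr, k = 2)
rowSums(V)        # every row sums to 1
max.col(V) == zr  # the partition is recovered from the indicators
```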

Margot Selosse, Julien Jacques, Christophe Biernacki.

# loading the real dataset
data("dataqol.classif")

set.seed(5)

# loading the ordinal data
M <- as.matrix(dataqol.classif[,2:29])


# creating the classes values
y <- as.vector(dataqol.classif$death)


# sampling datasets for training and to predict
nb.sample <- ceiling(nrow(M)*2/3)
sample.train <- sample(1:nrow(M), nb.sample, replace=FALSE)

M.train <- M[sample.train,]
M.validation <- M[-sample.train,]
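# in this example, values coded 0 in the validation set are treated as missing
# and replaced by levels drawn uniformly at random from 1..m
nb.missing.validation <- length(which(M.validation==0))
m <- c(4)  # all 28 features have 4 levels, so a single group with idx_list = c(1)
M.validation[which(M.validation==0)] <- sample(1:m, nb.missing.validation, replace=TRUE)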


y.train <- y[sample.train]
y.validation <- y[-sample.train]



# configuration of the SEM-Gibbs algorithm
nbSEM <- 50
nbSEMburn <- 40
nbindmini <- 1
init <- "kmeans"

# number of classes to predict
kr <- 2
# number of column clusters (a single value here; several could be compared by cross-validation)
kcol <- 1


res <- bosclassif(x=M.train,y=y.train,kr=kr,kc=kcol,m=m,
                  nbSEM=nbSEM,nbSEMburn=nbSEMburn,
                  nbindmini=nbindmini,init=init)

predictions <- predict(res, M.validation)
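Once the predicted labels for the validation rows have been extracted from `predictions` (the slot that holds them is not documented on this page, so check the ordinalClust documentation of `predict`), accuracy and a confusion table can be computed with base R. The vectors below are made-up toy labels for illustration, not results obtained on dataqol.classif.

```r
# Toy labels for illustration only (not output of bosclassif or predict)
y_true <- c(1, 1, 2, 2, 1, 2)
y_pred <- c(1, 2, 2, 2, 1, 1)

accuracy <- mean(y_pred == y_true)  # proportion of correctly classified rows
confusion <- table(observed = y_true, predicted = y_pred)
accuracy   # 4 of 6 rows correct
confusion  # rows: observed class, columns: predicted class
```

In the example above, `y.validation` would play the role of `y_true`.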