RMixtComp-package: RMixtComp

RMixtComp

Description

MixtComp (Mixture Composer, https://github.com/modal-inria/MixtComp) is a model-based clustering package for mixed data. It uses mixture models (McLachlan and Peel, 2000) fitted with a SEM algorithm (Celeux et al., 1995) to cluster the data.

It has been engineered for easy and quick integration of new univariate models, under the conditional independence assumption.

Five basic models (Gaussian, Multinomial, Poisson, Weibull, NegativeBinomial) are implemented, as well as two advanced models: Func_CS for functional data (Same et al., 2011) and Rank_ISR for ranking data (Jacques and Biernacki, 2014).

MixtComp has the ability to natively manage missing data (completely or by interval).
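As a rough sketch of this encoding, a numeric variable with missing entries could look like the following (the values are illustrative; the authoritative syntax is documented in vignette("dataFormat")):

```r
# Sketch of the missing-data encoding for a numeric variable (values are
# illustrative; see vignette("dataFormat") for the authoritative syntax):
x <- c(
    "12.5",      # fully observed value
    "?",         # completely missing value
    "[2.3:5.1]", # value only known to lie in an interval
    "[-inf:7]"   # value only bounded above
)
```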

Details

The main functions are mixtCompLearn for clustering and mixtCompPredict for predicting the clusters of new samples with a model learnt by mixtCompLearn. createAlgo provides default values for the required algorithmic parameters.
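For instance, the algorithmic parameters could be set up as follows (a sketch assuming the parameter names shown in the Examples section below; overriding values other than the two shown works the same way):

```r
library(RMixtComp)

# createAlgo returns the full list of algorithmic parameters filled with
# default values; individual parameters can be overridden by name.
algo <- createAlgo(nbBurnInIter = 100, nbIter = 100)
str(algo)
```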

Read the help page of mixtCompLearn for the available models and data formats. A summary of this information can be accessed with the function availableModels.

All utility functions (getters, graphical) are in the RMixtCompUtilities-package package.

In order to have an overview of the output, you can use the print.MixtCompLearn, summary.MixtCompLearn and plot.MixtCompLearn functions.

Getters are available to easily access some results (see mixtCompLearn for the output format): getBIC, getICL, getCompletedData, getParam, getProportion, getTik, getEmpiricTik, getPartition, getType, getModel, getVarNames.
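A minimal sketch of the getters, assuming resLearn is a fitted object as produced in the Examples section below ("var1" is a hypothetical variable name):

```r
# Extract results from a fitted object without navigating the nested
# output list (resLearn as in the Examples section; "var1" is a
# hypothetical variable name):
getBIC(resLearn)                    # BIC of the retained model
partition <- getPartition(resLearn) # estimated class of each sample
tik <- getTik(resLearn)             # class-membership probabilities
getParam(resLearn, "var1")          # estimated parameters of one variable
```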

You can compute discriminative powers and similarities with functions: computeDiscrimPowerClass, computeDiscrimPowerVar, computeSimilarityClass, computeSimilarityVar.
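A sketch of these functions, again assuming resLearn is a fitted object as in the Examples section below:

```r
# Discriminative powers (one value per variable or per class) and
# similarity matrices between variables or between classes:
computeDiscrimPowerVar(resLearn)             # one value per variable
computeDiscrimPowerClass(resLearn)           # one value per class
simVar <- computeSimilarityVar(resLearn)     # variable-by-variable matrix
simClass <- computeSimilarityClass(resLearn) # class-by-class matrix
```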

Graphics functions are plot.MixtComp, plot.MixtCompLearn, heatmapClass, heatmapTikSorted, heatmapVar, histMisclassif, plotConvergence, plotDataBoxplot, plotDataCI, plotDiscrimClass, plotDiscrimVar, plotProportion, plotCrit.
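For example, a few of the dedicated graphics could be drawn as follows (a sketch assuming resLearn as in the Examples section below):

```r
plotProportion(resLearn)   # estimated mixture proportions
plotDiscrimVar(resLearn)   # discriminative power of each variable
heatmapTikSorted(resLearn) # heatmap of sorted class-membership probabilities
plotCrit(resLearn)         # criteria across the tested numbers of classes
```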

Datasets with running examples are provided: titanic, CanadianWeather, prostate, simData.

Documentation about input and output format is available: vignette("dataFormat") and vignette("mixtCompObject").

MixtComp examples: vignette("MixtComp") or online https://github.com/vandaele/mixtcomp-notebook.

Using ClusVis with RMixtComp: vignette("dataFormat").

References

C. Biernacki (2015). MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data. MISSDATA 2015, Rennes, France. hal-01253393.

G. McLachlan, D. Peel (2000). Finite Mixture Models. Wiley Series in Probability and Statistics, 1st edition. John Wiley & Sons. doi:10.1002/0471721182.

G. Celeux, D. Chauveau, J. Diebolt (1995). On Stochastic Versions of the EM Algorithm. Research Report RR-2514, INRIA. inria-00074164.

A. Samé, F. Chamroukhi, G. Govaert, P. Aknin (2011). Model-based clustering and segmentation of time series with changes in regime. Advances in Data Analysis and Classification, 5, 301-321. doi:10.1007/s11634-011-0096-5.

J. Jacques, C. Biernacki (2014). Model-based clustering for multivariate partial ranking data. Journal of Statistical Planning and Inference, 149. doi:10.1016/j.jspi.2014.02.011.

See Also

mixtCompLearn, availableModels, RMixtCompUtilities-package, RMixtCompIO-package. Other clustering packages: Rmixmod.

Examples

data(simData)

# define the algorithm's parameters; the createAlgo function returns
# the same list filled with default values
algo <- list(
    nbBurnInIter = 50,
    nbIter = 50,
    nbGibbsBurnInIter = 50,
    nbGibbsIter = 50,
    nInitPerClass = 20,
    nSemTry = 20,
    confidenceLevel = 0.95
)

# run RMixtComp for learning using only 3 variables
resLearn <- mixtCompLearn(simData$dataLearn$matrix, simData$model$unsupervised[1:3], algo,
    nClass = 1:2, nRun = 2, nCore = 1
)

summary(resLearn)
plot(resLearn)

# run RMixtComp for predicting
resPred <- mixtCompPredict(
    simData$dataPredict$matrix, simData$model$unsupervised[1:3], algo,
    resLearn, nCore = 1
)

partitionPred <- getPartition(resPred)
print(resPred)


RMixtComp documentation built on July 9, 2023, 6:06 p.m.