README.md

mixR: An R package for finite mixture modeling for both raw and binned data

Why mixR?

The R programming language provides a rich collection of packages for building and analyzing finite mixture models, which are widely used in unsupervised learning tasks such as model-based clustering and density estimation. For example:

- mclust can be used to build Gaussian mixture models with different covariance structures
- mixtools implements parametric and non-parametric mixture models as well as mixtures of Gaussian regressions
- flexmix provides a general framework for finite mixtures of regression models
- mixdist fits mixture models for grouped and conditional data (also called binned data)

To our knowledge, almost all R packages for finite mixture models are designed to take raw data as the modeling input; mixdist is the exception. However, the popular model selection methods based on information criteria or the bootstrap likelihood ratio test (McLachlan, 1987; Feng & McCulloch, 1996; Yu & Harvill, 2019) are not implemented in mixdist.

mixR is a package that aims to bridge this gap and to unify the interface for finite mixture modeling for both raw and binned data.

Installation

For the stable version (pre-compiled for Windows and OS X), please install from CRAN:

install.packages('mixR')

To get the latest development version from GitHub:

# install.packages('devtools')
devtools::install_github('garybaylor/mixR')

Examples

library(mixR)

# generate data from a Normal mixture model
set.seed(102)
x1 = rmixnormal(1000, c(0.3, 0.7), c(-2, 3), c(2, 1))

# fit a Normal mixture model
mod1 = mixfit(x1, ncomp = 2)

# plot the fitted model
plot(mod1)

# fit a Normal mixture model (equal variance)
mod1_ev = mixfit(x1, ncomp = 2, ev = TRUE)

# generate data from a Weibull mixture model
x2 = rmixweibull(1000, c(0.4, 0.6), c(0.6, 1.3), c(0.1, 0.1))

# fit a Weibull mixture model
mod2_weibull = mixfit(x2, family = 'weibull', ncomp = 2)

# binned data: each row gives a bin's lower bound, upper bound and frequency
head(Stamp2)
##     lower  upper freq
## 1  0.0595 0.0605    1
## 5  0.0635 0.0645    2
## 6  0.0645 0.0655    1
## 7  0.0655 0.0665    1
## 9  0.0675 0.0685    1
## 10 0.0685 0.0695    7

# fit a Weibull mixture model to the binned data
mod_binned = mixfit(Stamp2, ncomp = 7, family = 'weibull')
plot(mod_binned)

# data binned from numeric data
x1_binned = bin(x1, seq(min(x1), max(x1), length.out = 30))
mod1_binned = mixfit(x1_binned, ncomp = 2)
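
To make the binned format concrete, the following base-R sketch builds a three-column matrix of lower bounds, upper bounds and frequencies, similar in layout to the Stamp2 rows shown above. It is only an illustration of the idea and not mixR's bin() implementation; the helper name bin_sketch and the simulated sample are assumptions made for this example.

```r
# Hypothetical helper: bin a numeric vector into (lower, upper, freq) rows.
# Illustrative only; mixR's own bin() may differ in its details.
bin_sketch <- function(x, brks) {
  # include.lowest = TRUE keeps the smallest value in the first bin
  freq <- as.vector(table(cut(x, brks, include.lowest = TRUE)))
  cbind(lower = brks[-length(brks)], upper = brks[-1], freq = freq)
}

set.seed(1)
x <- rnorm(200)                                # simulated raw sample
brks <- seq(min(x), max(x), length.out = 11)   # 10 equal-width bins
binned <- bin_sketch(x, brks)
head(binned)
```

Each row describes one bin, so the frequencies add up to the number of raw observations.
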

# select the best number of components g for a Normal mixture model
s_normal = select(x2, ncomp = 2:6)

# select the best number of components g for a Weibull mixture model
s_weibull = select(x2, ncomp = 2:6, family = 'weibull')

plot(s_weibull)
plot(s_normal)
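
Model selection by an information criterion can be sketched in base R as follows. The example assumes the criterion is BIC in the form k log(n) - 2 logLik, where smaller is better, and counts 3g - 1 free parameters for a g-component normal mixture with unequal variances. For brevity, the two-component log-likelihood is evaluated at the generating parameters as a stand-in for the maximum likelihood estimates that a fitting routine would supply. mix_loglik and mix_bic are hypothetical helper names, not mixR functions.

```r
# Log-likelihood of a univariate normal mixture at given parameters
mix_loglik <- function(x, pi, mu, sd) {
  dens <- numeric(length(x))
  for (j in seq_along(pi)) dens <- dens + pi[j] * dnorm(x, mu[j], sd[j])
  sum(log(dens))
}

# BIC = k * log(n) - 2 * logLik, with k = 3g - 1 free parameters
# (g - 1 proportions, g means, g standard deviations)
mix_bic <- function(loglik, g, n) (3 * g - 1) * log(n) - 2 * loglik

set.seed(1)
n <- 500
z <- rbinom(n, 1, 0.7)                                 # component labels
x <- ifelse(z == 1, rnorm(n, 3, 1), rnorm(n, -2, 2))   # two-component sample

ll1 <- mix_loglik(x, 1, mean(x), sd(x))                # one component
ll2 <- mix_loglik(x, c(0.3, 0.7), c(-2, 3), c(2, 1))   # two components
bic <- c(g1 = mix_bic(ll1, 1, n), g2 = mix_bic(ll2, 2, n))
names(bic)[which.min(bic)]                             # candidate with the lowest BIC
```

The penalty term k log(n) grows with the number of components, so a larger g is preferred only when it improves the log-likelihood by more than the added penalty.
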

# bootstrap likelihood ratio test for x1 (g = 2 vs g = 3)
b1 = bs.test(x1, ncomp = c(2, 3))
plot(b1, main = 'Bootstrap LRT for Normal Mixture Models (g = 2 vs g = 3)')
b1$pvalue

# bootstrap likelihood ratio test for x2 (g = 2 vs g = 4)
b2 = bs.test(x2, ncomp = c(2, 4))
plot(b2, main = 'Bootstrap LRT for Normal Mixture Models (g = 2 vs g = 4)')
b2$pvalue
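
The idea behind a bootstrap likelihood ratio test can be sketched in base R. This is a simplified illustration under stated assumptions, not the code behind bs.test(): it tests g = 1 against g = 2 for normal mixtures with a common variance, uses a small hand-written EM loop, and takes the p-value as the share of simulated statistics at least as large as the observed one. The helper names (loglik_g1, loglik_g2, lrt) and the choice of B = 50 replicates are hypothetical.

```r
# Maximized log-likelihood of a single normal (g = 1)
loglik_g1 <- function(x) {
  s <- sqrt(mean((x - mean(x))^2))
  sum(dnorm(x, mean(x), s, log = TRUE))
}

# Log-likelihood after EM for a two-component normal mixture, common sd (g = 2)
loglik_g2 <- function(x, iters = 100) {
  m <- median(x)
  p <- 0.5
  mu <- c(mean(x[x <= m]), mean(x[x > m]))   # crude start: split at the median
  s <- sd(x)
  for (i in seq_len(iters)) {
    d1 <- p * dnorm(x, mu[1], s)
    d2 <- (1 - p) * dnorm(x, mu[2], s)
    r <- d1 / (d1 + d2)                      # E-step: responsibilities
    p <- mean(r)                             # M-step: proportion, means, sd
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
    s <- sqrt(mean(r * (x - mu[1])^2 + (1 - r) * (x - mu[2])^2))
  }
  sum(log(p * dnorm(x, mu[1], s) + (1 - p) * dnorm(x, mu[2], s)))
}

# Likelihood ratio statistic for g = 1 vs g = 2
lrt <- function(x) 2 * (loglik_g2(x) - loglik_g1(x))

set.seed(1)
x <- c(rnorm(100, -2), rnorm(100, 2))        # simulated bimodal sample
w0 <- lrt(x)                                 # observed statistic

# Simulate from the fitted null model (g = 1) and recompute the statistic
B <- 50
s0 <- sqrt(mean((x - mean(x))^2))
w1 <- replicate(B, lrt(rnorm(length(x), mean(x), s0)))
pvalue <- mean(w1 >= w0)
pvalue
```

A small p-value indicates that the observed improvement in fit from the extra component is larger than what the null model typically produces by chance.
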

For more examples, please check the vignette An Introduction to mixR.

Contributor Code of Conduct

Everyone is welcome to contribute to the project by reporting issues, posting feature requests, updating documentation, submitting pull requests, or contacting the project maintainer directly. To maintain a friendly atmosphere and to collaborate in a fun and productive way, we expect contributors to abide by the Contributor Code of Conduct.

Citation

Yu, Y. (2022). mixR: An R package for Finite Mixture Modeling for Both Raw and Binned Data. Journal of Open Source Software, 7(69), 4031, https://doi.org/10.21105/joss.04031

BibTeX information

@article{Yu2022,
  doi = {10.21105/joss.04031},
  url = {https://doi.org/10.21105/joss.04031},
  year = {2022},
  publisher = {The Open Journal},
  volume = {7},
  number = {69},
  pages = {4031},
  author = {Youjiao Yu},
  title = {mixR: An R package for Finite Mixture Modeling for Both Raw and Binned Data},
  journal = {Journal of Open Source Software}
}


GaryBAYLOR/mixR documentation built on Oct. 14, 2024, 11:34 p.m.