gaussianMixture: Multivariate Gaussian Mixture Model (GMM)
In danzafar/tidyspark: A Tidy Interface to Spark


Fits a multivariate Gaussian mixture model to a spark_tbl, similarly to mvnormalmixEM() from the mixtools package. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write_ml/read_ml to save/load fitted models.

ml_gaussian_mixture(data, formula, k = 2, maxIter = 100, tol = 0.01)

## S4 method for signature 'GaussianMixtureModel'
summary(object)

## S4 method for signature 'GaussianMixtureModel,character'
write_ml(object, path, overwrite = FALSE)

`data`	a spark_tbl for training.
`formula`	a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Because the model is unsupervised, the formula in ml_gaussian_mixture has no response variable, e.g. ~ V1 + V2.
`k`	number of independent Gaussians in the mixture model.
`maxIter`	maximum number of iterations.
`tol`	the convergence tolerance.
`object`	a fitted Gaussian mixture model.
`path`	the directory where the model is saved.
`overwrite`	whether to overwrite the output path if it already exists. The default, FALSE, throws an exception if the path exists.
`...`	additional arguments passed to the method.
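Spark fits this model with expectation-maximization (EM). The base-R sketch below is a hypothetical, simplified illustration of how k, maxIter and tol interact in such a fit. It is not tidyspark's or Spark's implementation, and the helper names dmvn() and gmm_em() are made up for this example. It assumes that tol bounds the change in log-likelihood between successive iterations, and it uses a random-row initialisation that may differ from Spark's.

```r
# Hypothetical helper: multivariate normal density at every row of matrix x.
dmvn <- function(x, mu, sigma) {
  d <- ncol(x)
  centered <- sweep(x, 2, mu)                      # subtract mu from each row
  quad <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * quad) / sqrt((2 * pi)^d * det(sigma))
}

# Hypothetical EM fit of a k-component Gaussian mixture (illustration only).
gmm_em <- function(x, k = 2, maxIter = 100, tol = 0.01) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  # Initialise: k random rows as means, the pooled covariance, equal weights.
  start <- x[sample(n, k), , drop = FALSE]
  mu <- lapply(seq_len(k), function(j) start[j, ])
  sigma <- replicate(k, cov(x), simplify = FALSE)
  lambda <- rep(1 / k, k)

  # E-step: log-likelihood and posterior membership probabilities
  # under the current lambda, mu and sigma.
  e_step <- function() {
    dens <- sapply(seq_len(k), function(j) lambda[j] * dmvn(x, mu[[j]], sigma[[j]]))
    total <- rowSums(dens)
    list(loglik = sum(log(total)), post = dens / total)  # rows of post sum to 1
  }

  e <- e_step()
  iter <- 0
  while (iter < maxIter) {
    iter <- iter + 1
    # M-step: re-estimate weights, means and covariances from the posteriors.
    nk <- colSums(e$post)
    lambda <- nk / n
    for (j in seq_len(k)) {
      mu[[j]] <- colSums(e$post[, j] * x) / nk[j]
      centered <- sweep(x, 2, mu[[j]])
      # A small ridge keeps each covariance matrix invertible.
      sigma[[j]] <- crossprod(centered * e$post[, j], centered) / nk[j] + diag(1e-6, d)
    }
    loglik_prev <- e$loglik
    e <- e_step()
    # Assumption: converged once the log-likelihood changes by less than tol.
    if (abs(e$loglik - loglik_prev) < tol) break
  }
  list(lambda = lambda, mu = mu, sigma = sigma,
       loglik = e$loglik, posterior = e$post, iterations = iter)
}
```

Calling gmm_em(x, k = 2) on a numeric matrix x returns a list whose fields loosely mirror those listed under Value. Raising maxIter or lowering tol trades run time for a tighter fit.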

ml_gaussian_mixture returns a fitted multivariate Gaussian mixture model.

summary returns a summary of the fitted model as a list containing the mixing weights (lambda), the component means (mu), the component covariance matrices (sigma), the log-likelihood (loglik), and the posterior probabilities of component membership (posterior).
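The summary fields are related in the usual mixture-model way. The base-R sketch below shows how the posterior for a point and the log-likelihood of a data set follow from lambda, mu and sigma. It assumes mu is a list of mean vectors and sigma a list of covariance matrices; the helper names and the parameter values are hypothetical and not output from a fitted model.

```r
# Hypothetical helper: N(mu, sigma) density at a single point p.
dmvn_point <- function(p, mu, sigma) {
  diff <- p - mu
  quad <- drop(crossprod(diff, solve(sigma, diff)))
  exp(-0.5 * quad) / sqrt((2 * pi)^length(p) * det(sigma))
}

# lambda_j * N(p | mu_j, sigma_j) for every component j.
weighted_densities <- function(p, fit) {
  vapply(seq_along(fit$lambda), function(j)
    fit$lambda[j] * dmvn_point(p, fit$mu[[j]], fit$sigma[[j]]), numeric(1))
}

# Posterior: each component's share of the mixture density at p.
posterior_at <- function(p, fit) {
  w <- weighted_densities(p, fit)
  w / sum(w)
}

# Log-likelihood: sum over rows of x of the log mixture density.
loglik_of <- function(x, fit) {
  sum(apply(x, 1, function(p) log(sum(weighted_densities(p, fit)))))
}

# Made-up parameters shaped like the lambda, mu and sigma summary fields.
fit <- list(lambda = c(0.4, 0.6),
            mu = list(c(0, 0), c(3, 4)),
            sigma = list(diag(2), diag(2)))
posterior_at(c(3, 4), fit)  # nearly all of the probability falls on component 2
```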

summary(GaussianMixtureModel) since 2.1.0

write_ml(GaussianMixtureModel, character) since 2.1.0

mixtools: https://cran.r-project.org/package=mixtools

## Not run: 
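library(tidyspark)
spark_session()
library(mvtnorm)

# Simulate two clusters of 2-D points around (0, 0) and (3, 4)
set.seed(100)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)

# Copy the local data.frame (columns V1, V2) to Spark
df <- spark_tbl(as.data.frame(data))

# Fit a two-component mixture on both columns; the formula has no response
model <- ml_gaussian_mixture(df, ~ V1 + V2, k = 2)
summary(model)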

## End(Not run)