gaussianMixture: Multivariate Gaussian Mixture Model (GMM)


Description

Fits a multivariate Gaussian mixture model against a spark_tbl, similar to mvnormalmixEM() from the R package mixtools. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write_ml/read_ml to save and load fitted models.

Usage

ml_gaussian_mixture(data, formula, k = 2, maxIter = 100, tol = 0.01)

## S4 method for signature 'GaussianMixtureModel'
summary(object)

## S4 method for signature 'GaussianMixtureModel,character'
write_ml(object, path, overwrite = FALSE)

Arguments

data

a spark_tbl for training.

formula

a symbolic description of the model to be fitted. Only a few formula operators are currently supported: '~', '.', ':', '+', and '-'. Note that the formula passed to ml_gaussian_mixture has no response variable, so the left-hand side of '~' is empty.

k

number of independent Gaussians in the mixture model.

maxIter

maximum number of iterations.

tol

the convergence tolerance.

object

a fitted gaussian mixture model.

path

the directory where the model is saved.

overwrite

whether to overwrite the output path if it already exists. The default is FALSE, which throws an exception if the output path exists.

...

additional arguments passed to the method.
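The one-sided formula described under formula can be inspected with base R alone. The sketch below uses a hypothetical local data frame with columns V1, V2, and V3 (not a spark_tbl) and only shows how R itself parses such formulas; it makes no claim about how ml_gaussian_mixture translates them for Spark.

```r
# Hypothetical local data frame, used only to expand '.' in a formula.
local_df <- data.frame(V1 = c(0.1, 2.9), V2 = c(-0.3, 4.2), V3 = c(1, 0))

# One-sided formula: nothing on the left of '~', so there is no response.
f <- ~ V1 + V2
has_response <- attr(terms(f), "response") == 1
features <- all.vars(f)

# '.' expands to every column of the data; '-' then drops V3.
dot_features <- attr(terms(~ . - V3, data = local_df), "term.labels")
```

Here has_response is FALSE, and both features and dot_features name the columns V1 and V2, the same two columns used in the Examples section.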

Value

ml_gaussian_mixture returns a fitted multivariate gaussian mixture model.

summary returns a list summarizing the fitted model. The list has the elements lambda, mu, sigma, loglik, and posterior.
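To make the relationship between these elements concrete, here is a minimal base R sketch. It assumes they carry their usual mixture-model meaning, as in mixtools: lambda holds the mixing proportions, mu the component means, sigma the component covariance matrices, posterior the per-observation component probabilities, and loglik the log-likelihood. All parameter values and the helper dmvn are hypothetical and are not produced by ml_gaussian_mixture.

```r
# Density of a multivariate normal at each row of x (base R only).
# dmvn is a hypothetical helper written for this sketch.
dmvn <- function(x, mu, sigma) {
  d <- length(mu)
  centered <- sweep(x, 2, mu)  # subtract mu from every row
  # Squared Mahalanobis distance of each row from mu
  maha <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(sigma))
}

# Hypothetical parameters for k = 2 components in 2 dimensions.
lambda <- c(0.4, 0.6)             # mixing proportions, sum to 1
mu <- list(c(0, 0), c(3, 4))      # one mean vector per component
sigma <- list(diag(2), diag(2))   # one covariance matrix per component

# Three observations: one at each mean and one halfway between them.
x <- rbind(c(0, 0), c(3, 4), c(1.5, 2))

# Weighted component densities: one row per observation, one column per component.
weighted <- sapply(seq_along(lambda), function(j) {
  lambda[j] * dmvn(x, mu[[j]], sigma[[j]])
})

# Posterior membership probabilities (each row sums to 1) and log-likelihood.
posterior <- weighted / rowSums(weighted)
loglik <- sum(log(rowSums(weighted)))
```

The first two observations are assigned almost entirely to the component whose mean they sit on. The third is equidistant from both means, so its posterior equals the mixing proportions (0.4 and 0.6).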

Note

summary(GaussianMixtureModel) since 2.1.0

write_ml(GaussianMixtureModel, character) since 2.1.0

See Also

mixtools: https://cran.r-project.org/package=mixtools

Examples

## Not run: 
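# Start a Spark session
spark_session()
library(mvtnorm)
set.seed(100)
# Simulate two bivariate normal clusters: 4 points near (0, 0), 6 near (3, 4)
a <- rmvnorm(4, c(0, 0))
b <- rmvnorm(6, c(3, 4))
data <- rbind(a, b)
# as.data.frame() names the two columns V1 and V2
df <- spark_tbl(as.data.frame(data))
# Fit a two-component mixture on both columns (the formula has no response)
model <- ml_gaussian_mixture(df, ~ V1 + V2, k = 2)
summary(model)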

## End(Not run)

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.