View source: R/ml_clustering.R
Description:

Fits a k-means clustering model against a spark_tbl, similarly to R's kmeans(). Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write_ml/read_ml to save and load fitted models.

Gets the fitted result from a k-means model, similarly to R's fitted(). Note: a model that has been saved and then loaded does not support this method.
Usage:

ml_kmeans(
  data,
  formula,
  k = 2,
  maxIter = 20,
  initMode = c("k-means||", "random"),
  seed = NULL,
  initSteps = 2,
  tol = 1e-04
)

## S4 method for signature 'KMeansModel'
summary(object)

## S4 method for signature 'KMeansModel'
fitted(object, method = c("centers", "classes"))

## S4 method for signature 'KMeansModel,character'
write_ml(object, path, overwrite = FALSE)
Arguments:

data: a spark_tbl for training.

formula: a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'. Note that the response variable of the formula is empty in ml_kmeans.

k: the number of cluster centers.

maxIter: the maximum number of iterations.

initMode: the initialization algorithm used to fit the model, one of "k-means||" or "random".

seed: the random seed for cluster initialization.

initSteps: the number of steps for the "k-means||" initialization mode. This is an advanced setting; the default of 2 is almost always enough. Must be > 0.

tol: the convergence tolerance of iterations.

object: a fitted k-means model.

method: the type of fitted results: "centers" for cluster centers or "classes" for assigned classes.

path: the directory where the model is saved.

overwrite: whether to overwrite the output path if it already exists. The default, FALSE, throws an exception if the output path exists.

...: additional argument(s) passed to the method.
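To make the roles of k, maxIter, tol, and seed more concrete, here is a minimal local sketch of Lloyd's k-means iteration in plain R, with no Spark involved. It is not what ml_kmeans runs internally: the function name lloyd_kmeans is hypothetical, it uses only random initialization (comparable to initMode = "random", not "k-means||"), and its stopping rule (stop once every center moves less than tol in Euclidean distance, or after maxIter iterations) is an assumption about how the tolerance is applied.

```r
# Hypothetical helper: a local sketch of Lloyd's k-means, not the Spark implementation.
lloyd_kmeans <- function(x, k = 2, maxIter = 20, tol = 1e-4, seed = NULL) {
  stopifnot(maxIter >= 1)
  x <- as.matrix(x)
  # Note: set.seed() changes the global RNG state
  if (!is.null(seed)) set.seed(seed)
  # Random initialization: k distinct rows become the starting centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  for (iter in seq_len(maxIter)) {
    # Assignment step: squared Euclidean distance from each point to each center (n x k)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    cluster <- max.col(-d, ties.method = "first")
    # Update step: move each center to the mean of its points;
    # an empty cluster keeps its previous center
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) new_centers[j, ] <- colMeans(members)
    }
    # Assumed stopping rule: every center moved less than tol
    shift <- sqrt(rowSums((new_centers - centers)^2))
    centers <- new_centers
    if (all(shift < tol)) break
  }
  list(centers = centers, cluster = cluster, iterations = iter)
}

# Usage on hypothetical data: two well-separated groups on one feature
x <- matrix(c(0, 0.1, 0.2, 10, 10.1, 10.2), ncol = 1)
fit <- lloyd_kmeans(x, k = 2, maxIter = 20, tol = 1e-4, seed = 1)
fit$centers
fit$cluster
```

Because the starting centers are chosen at random, different seeds can lead to different local optima on less well-separated data.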
Value:

ml_kmeans returns a fitted k-means model.

summary returns summary information about the fitted model as a list. The list includes the model's k (the configured number of cluster centers), coefficients (model cluster centers), size (number of data points in each cluster), cluster (cluster centers of the transformed data), is.loaded (whether the model is loaded from a saved file), and clusterSize (the actual number of cluster centers; when using initMode = "random", clusterSize may not equal k).

fitted returns a spark_tbl containing fitted values.
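The summary fields and the two fitted() methods loosely mirror what base R's stats::kmeans() returns for a local matrix. The sketch below uses kmeans() and its fitted() method from the stats package (not the Spark API) on small hypothetical data, to show the difference between method = "classes" (the cluster index of every row) and method = "centers" (the assigned center, repeated for every row). Treating km$centers and km$size as counterparts of the coefficients and size summary fields is an analogy, not a guarantee of identical output.

```r
# Local analogue using base R's stats package; no Spark session is needed.
set.seed(1)
# Hypothetical data: two well-separated groups on one numeric feature
x <- matrix(c(0, 0.1, 0.2, 10, 10.1, 10.2), ncol = 1,
            dimnames = list(NULL, "value"))
km <- kmeans(x, centers = 2)

km$centers  # one row per cluster center (cf. the `coefficients` field)
km$size     # number of points per cluster (cf. the `size` field)

# method = "classes": the cluster index assigned to each row
classes <- fitted(km, method = "classes")
# method = "centers": the assigned cluster center, one row per input row
centers <- fitted(km, method = "centers")
```

On a spark_tbl, the corresponding calls are summary(model), which returns a list, and fitted(model, method = "classes"), which returns a spark_tbl.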
Note:

summary(KMeansModel) since 2.0.0
write_ml(KMeansModel, character) since 2.0.0
Examples:

## Not run:
spark_session()
t <- as.data.frame(Titanic)
df <- spark_tbl(t)
model <- ml_kmeans(df, Class ~ Survived, k = 4, initMode = "random")
summary(model)
## End(Not run)