ds.kmeans | R Documentation
Performs k-means clustering on a distributed table using Euclidean distance
ds.kmeans( x, k = NULL, convergence = 0.001, max.iter = 100, centroids = NULL, assign = TRUE, name = NULL, datasources = NULL )
x |
k |
convergence |
max.iter |
centroids |
assign |
name |
datasources | a list of |
This is a parallel implementation of k-means in which each server acts as a thread. This approach is possible because the results passed to the master (client) are not disclosive: they are aggregated values that cannot be traced back to individual records. The assignment vector is not disclosive either, since the only information that can be extracted from it is the same as that returned by the ds.summary function. For more information on the implementation, see 'Parallel K-Means Clustering Algorithm on DNA Dataset' by Fazilah Othman, Rosni Abdullah, Nur’Aini Abdul Rashid and Rosalina Abdul Salam.
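The parallel scheme described above can be illustrated with a local simulation in base R. This is a minimal sketch, not the ds.kmeans source: the servers are simulated as a list of numeric matrices with identical columns, and the helpers assign_clusters, server_partial and distributed_kmeans are hypothetical names. Each simulated server returns only per-cluster column sums and row counts, which the client combines into new centroids. The loop stops when no centroid moves more than convergence or after max.iter rounds. Letting an empty cluster keep its previous centroid is an assumption of this sketch.

```r
# Sketch of parallel k-means. Each "server" is simulated as a local numeric
# matrix; all helper names below are hypothetical, not part of ds.kmeans.

# Server side: index of the nearest centroid (Euclidean distance) for every
# row. `centroids` has one row per variable and one column per cluster.
assign_clusters <- function(data, centroids) {
  d2 <- sapply(seq_len(ncol(centroids)),
               function(j) colSums((t(data) - centroids[, j])^2))
  d2 <- matrix(d2, nrow = nrow(data))  # keep n x k shape when n == 1
  max.col(-d2, ties.method = "first")  # column with the smallest distance
}

# Server side: only these aggregates are sent to the client.
server_partial <- function(data, centroids) {
  k <- ncol(centroids)
  cluster <- assign_clusters(data, centroids)
  sums <- vapply(seq_len(k),
                 function(j) colSums(data[cluster == j, , drop = FALSE]),
                 numeric(ncol(data)))
  list(sums = matrix(sums, nrow = ncol(data)),  # variables x clusters
       counts = tabulate(cluster, nbins = k))   # rows per cluster
}

# Client side: combine the aggregates from all servers into new centroids.
distributed_kmeans <- function(servers, centroids, convergence = 0.001,
                               max.iter = 100) {
  for (iter in seq_len(max.iter)) {
    partials <- lapply(servers, server_partial, centroids = centroids)
    total_sums <- Reduce(`+`, lapply(partials, `[[`, "sums"))
    total_counts <- Reduce(`+`, lapply(partials, `[[`, "counts"))
    new_centroids <- centroids
    keep <- total_counts > 0  # assumption: an empty cluster keeps its centroid
    new_centroids[, keep] <- sweep(total_sums[, keep, drop = FALSE], 2,
                                   total_counts[keep], "/")
    shift <- max(sqrt(colSums((new_centroids - centroids)^2)))
    centroids <- new_centroids
    if (shift < convergence) break
  }
  colnames(centroids) <- seq_len(ncol(centroids))
  as.data.frame(centroids)  # one column per centroid, one row per variable
}

# Example: two simulated servers, each holding two well-separated groups.
set.seed(1)
make_server <- function(n) {
  rbind(matrix(rnorm(2 * n, mean = 0, sd = 0.1), ncol = 2),
        matrix(rnorm(2 * n, mean = 5, sd = 0.1), ncol = 2))
}
servers <- list(server1 = make_server(20), server2 = make_server(30))
init <- cbind(c(1, 1), c(4, 4))  # initial centroids, one column per cluster
result <- distributed_kmeans(servers, init)
print(result)

# The assignment vector is computed on, and would stay on, each server.
assignments <- lapply(servers, assign_clusters, centroids = as.matrix(result))
```

Keeping the assignment step on the servers matches the disclosure argument above: only the aggregated sums and counts ever reach the client, and each iteration costs one round trip per server.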
A data frame, where:
- each column corresponds to a centroid (1:k);
- each row corresponds to a variable of the server data frame.
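Because the returned data frame is oriented with centroids as columns, code that expects one row per cluster needs to transpose it. The sketch below uses a hand-built data frame with made-up values and hypothetical variable names (var1, var2, var3) in place of a real ds.kmeans result. It also shows how a new observation could be matched to its nearest centroid by Euclidean distance.

```r
# Hypothetical result for k = 2 clusters and three variables (values made up):
# one column per centroid, one row per variable.
centroids <- data.frame(`1` = c(0.1, 2.0, 3.5),
                        `2` = c(4.8, 1.2, 0.3),
                        row.names = c("var1", "var2", "var3"),
                        check.names = FALSE)

# Transpose to the common "one row per cluster" layout.
by_cluster <- t(as.matrix(centroids))

# Nearest centroid (Euclidean distance) for a new observation; the vector is
# recycled down each column, so it must follow the row order of `centroids`.
new_obs <- c(var1 = 0.5, var2 = 1.8, var3 = 3.0)
dists <- sqrt(colSums((as.matrix(centroids) - new_obs)^2))
nearest <- names(which.min(dists))
print(nearest)
```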