TGL_kmeans_tidy | R Documentation |
TGL kmeans with 'tidy' output
TGL_kmeans_tidy(
df,
k,
metric = "euclid",
max_iter = 40,
min_delta = 0.0001,
verbose = FALSE,
keep_log = FALSE,
id_column = FALSE,
reorder_func = "hclust",
add_to_data = FALSE,
hclust_intra_clusters = FALSE,
seed = NULL,
parallel = getOption("tglkmeans.parallel"),
use_cpp_random = FALSE
)
df |
a data frame or a matrix. Each row is a single observation and each column is a dimension. The first column can contain an id for each observation (if id_column is TRUE); otherwise the rownames are used. |
k |
number of clusters. Note that in some cases the algorithm might return fewer clusters than k. |
metric |
distance metric for kmeans++ seeding. Can be 'euclid', 'pearson' or 'spearman'. |
max_iter |
maximal number of iterations |
min_delta |
minimal change in assignments (fraction out of all observations) to continue iterating |
verbose |
display algorithm messages |
keep_log |
keep algorithm messages in 'log' field |
id_column |
whether the first column of df contains the observation ids (see df) |
reorder_func |
function to reorder the clusters. operates on each center and orders by the result. e.g. |
add_to_data |
return also the original data frame with an extra 'clust' column with the cluster ids ('id' is the first column) |
hclust_intra_clusters |
run hierarchical clustering within each cluster and return an ordering of the observations. |
seed |
seed for the c++ random number generator |
parallel |
run the hierarchical clustering of each cluster in parallel (only if hclust_intra_clusters is TRUE) |
use_cpp_random |
use the c++ random number generator instead of R's. This should be used only for backwards compatibility, as from version 0.4.0 onwards the default random number generator was changed to R's. |
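The reorder_func description above (a function that operates on each center, with clusters ordered by the result) can be illustrated with a small base R sketch. This is not the package's implementation: the `centers` matrix and the `reorder_centers` helper are hypothetical, and the sketch assumes the function returns one number per center.

```r
# Hypothetical cluster centers: one row per cluster, one column per dimension.
centers <- matrix(
  c(3.0, 3.2,
    1.0, 0.9,
    2.1, 1.8),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("V1", "V2"))
)

# Hypothetical helper: apply a summary function to each center (row),
# then order the centers by the resulting scores.
reorder_centers <- function(centers, f = mean) {
  scores <- apply(centers, 1, f)
  centers[order(scores), , drop = FALSE]
}

reordered <- reorder_centers(centers, mean)
rownames(reordered)
```

With `mean` as the function, the centers are sorted by their average coordinate, so a cluster whose center sits at lower values comes first.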
list with the following components:
tibble with 'id' column with the observation id ('1:n' if no id column was supplied), and 'clust' column with the observation assigned cluster.
tibble with 'clust' column and the cluster centers.
tibble with 'clust' column and 'n' column with the number of points in each cluster.
tibble with 'clust' column and the original data frame.
messages from the algorithm run (only if keep_log = TRUE).
tibble with 'id' column, 'clust' column, 'order' column with a new ordering of the observations and 'intra_clust_order' column with the order within each cluster. (only if hclust_intra_clusters = TRUE)
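To make the last component more concrete, the following base R sketch shows one way an ordering within each cluster could be derived with hierarchical clustering. It is an illustration under stated assumptions, not the package's code: the data `x`, the assignment vector `clust`, and the `intra_order` helper are hypothetical, Euclidean distance is assumed, and a plain data frame stands in for the tibble.

```r
set.seed(1)

# Hypothetical data: two well separated groups of 10 observations each.
x <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)
rownames(x) <- as.character(seq_len(nrow(x)))
clust <- rep(c(1L, 2L), each = 10)

# Hypothetical helper: run hclust within each cluster and record
# the position of every observation inside its cluster.
intra_order <- function(x, clust) {
  res <- lapply(split(seq_len(nrow(x)), clust), function(idx) {
    if (length(idx) < 2) {
      ord <- seq_along(idx) # hclust needs at least two observations
    } else {
      ord <- hclust(dist(x[idx, , drop = FALSE]))$order
    }
    data.frame(
      id = rownames(x)[idx][ord],
      clust = clust[idx][ord],
      intra_clust_order = seq_along(idx)
    )
  })
  out <- do.call(rbind, res)
  rownames(out) <- NULL
  out$order <- seq_len(nrow(out)) # global order: cluster by cluster
  out
}

ord <- intra_order(x, clust)
head(ord)
```

Here 'order' simply concatenates the per-cluster orderings, which keeps observations of the same cluster adjacent, for example when plotting a heatmap.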
TGL_kmeans
# create 5 clusters normally distributed around 1:5
d <- simulate_data(
n = 100,
sd = 0.3,
nclust = 5,
dims = 2,
add_true_clust = FALSE,
id_column = FALSE
)
head(d)
# cluster
km <- TGL_kmeans_tidy(d, k = 5, metric = "euclid", verbose = TRUE)
km