| mt_cluster_k | R Documentation |
Estimates the optimal number of clusters (k) using various methods.
mt_cluster_k(
data,
use = "ln_trajectories",
dimensions = c("xpos", "ypos"),
kseq = 2:15,
compute = c("stability", "gap", "jump", "slope"),
method = "hclust",
weights = rep(1, length(dimensions)),
pointwise = TRUE,
minkowski_p = 2,
hclust_method = "ward.D",
kmeans_nstart = 10,
n_bootstrap = 10,
model_based = FALSE,
n_gap = 10,
na_rm = FALSE,
verbose = FALSE
)
data |
a mousetrap data object created using one of the mt_import
functions (see mt_example for details). Alternatively, a trajectory
array can be provided directly (in this case |
use |
a character string specifying which trajectory data should be used. |
dimensions |
a character vector specifying which trajectory variables should be used. Can be of length 2 or 3, for two-dimensional or three-dimensional trajectories respectively. |
kseq |
a numeric vector specifying the set of candidate values for k. Defaults to
2:15, implying that all values of k within that range are compared using
the metrics specified in compute. |
compute |
character vector specifying the measures to be computed. Can
be any subset of c("stability", "gap", "jump", "slope"). |
method |
character string specifying the type of clustering procedure
for the stability-based method. Either "hclust" or "kmeans". |
weights |
numeric vector specifying the relative importance of the
variables specified in dimensions. |
pointwise |
boolean specifying the way in which dissimilarity between
the trajectories is measured. If |
minkowski_p |
an integer specifying the distance metric for the cluster
solution. |
hclust_method |
character string specifying the linkage criterion used.
Passed on to the |
kmeans_nstart |
integer specifying the number of reruns of the kmeans
procedure. Larger numbers minimize the risk of finding local minima. Passed
on to the |
n_bootstrap |
an integer specifying the number of bootstrap comparisons
used by stability. |
model_based |
boolean specifying whether the model-based (TRUE) or the
model-free (FALSE) variant should be used by stability. |
n_gap |
integer specifying the number of simulated datasets used by
gap. |
na_rm |
logical specifying whether trajectory points containing NAs should be removed. Removal is done column-wise. That is, if any trajectory has a missing value at, e.g., the 10th recorded position, the 10th position is removed for all trajectories. This is necessary to compute distance between trajectories. |
verbose |
logical indicating whether the function should report its progress. |
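The column-wise removal described for na_rm can be pictured with a small base R sketch. The matrix xpos and its values are hypothetical, and a plain trials-by-positions matrix stands in for the trajectory array, so this illustrates the idea rather than the package's internal code. The last lines show that a Minkowski distance with p = 2 (the minkowski_p default) equals the Euclidean distance.

```r
# Hypothetical x-positions: 3 trajectories (rows) x 5 recorded positions (columns)
xpos <- rbind(
  c(0, 1, 2, 3, 4),
  c(0, 2, NA, 5, 6),
  c(0, 1, 3, 4, 5)
)

# A position is kept only if no trajectory has a missing value there
keep <- colSums(is.na(xpos)) == 0
xpos_complete <- xpos[, keep, drop = FALSE]

# Distances between the complete trajectories
d_minkowski <- dist(xpos_complete, method = "minkowski", p = 2)
d_euclidean <- dist(xpos_complete, method = "euclidean")
```

Here the third position is dropped for all three trajectories because the second trajectory has an NA there.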
mt_cluster_k estimates the number of clusters (k) using four
commonly used k-selection methods (specified via compute): cluster
stability (stability), the gap statistic (gap), the jump
statistic (jump), and the slope statistic (slope).
Cluster stability methods select k as the number of clusters for which
the assignment of objects to clusters is most stable across bootstrap
samples. This function implements the model-based and model-free methods
described by Haslbeck & Wulff (2020). See references.
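The idea of selecting k by stability across bootstrap samples can be sketched in base R. This is a simplified, uncorrected instability measure on hypothetical toy data; it omits the correction described by Haslbeck & Wulff (2020) and is not the package's implementation. The helper names pair_disagreement and instability are made up for this illustration.

```r
set.seed(2)
# Hypothetical toy data: two well-separated 2-D groups (not trajectory data)
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2)
)
n <- nrow(x)

# Proportion of object pairs on which two clusterings disagree about
# whether the pair belongs to the same cluster
pair_disagreement <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  upper <- upper.tri(same_a)
  mean(same_a[upper] != same_b[upper])
}

# Average disagreement between clusterings of two bootstrap samples,
# evaluated on the objects that occur in both samples
instability <- function(k, n_bootstrap = 10) {
  mean(replicate(n_bootstrap, {
    i1 <- sample(n, replace = TRUE)
    i2 <- sample(n, replace = TRUE)
    shared <- intersect(i1, i2)
    c1 <- kmeans(x[i1, ], centers = k, nstart = 10)$cluster
    c2 <- kmeans(x[i2, ], centers = k, nstart = 10)$cluster
    pair_disagreement(c1[match(shared, i1)], c2[match(shared, i2)])
  }))
}

kseq <- 2:5
path <- sapply(kseq, instability)
kopt <- kseq[which.min(path)]  # most stable k = lowest instability
```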
The remaining three methods select k as the value that optimizes the
gap statistic (Tibshirani, Walther, & Hastie, 2001), the jump statistic
(Sugar & James, 2003), and the slope statistic (Fujita, Takahashi, &
Patriota, 2014), respectively.
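As one illustration of these criteria, the jump statistic can be sketched in base R with kmeans. The sketch assumes the common formulation in which the distortion d_k is the average within-cluster squared distance per dimension, the transformation power is p/2 (p being the number of dimensions), and k is chosen where the transformed distortion jumps the most. The toy data and the helper name distortion are hypothetical, and the package's own computation may differ.

```r
set.seed(1)
# Hypothetical toy data: three well-separated 2-D groups (not trajectory data)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
n <- nrow(x)
p <- ncol(x)
kseq <- 2:6

# Distortion d_k: average within-cluster squared distance per dimension
distortion <- function(k) {
  if (k == 1) {
    centered <- scale(x, center = TRUE, scale = FALSE)
    return(sum(centered^2) / (n * p))
  }
  kmeans(x, centers = k, nstart = 10)$tot.withinss / (n * p)
}

# d_1 is needed as the baseline for the first candidate in kseq
d <- sapply(c(1, kseq), distortion)

# Jump J_k = d_k^(-p/2) - d_(k-1)^(-p/2) for each k in kseq
jumps <- diff(d^(-p / 2))
names(jumps) <- kseq

kopt <- kseq[which.max(jumps)]  # k with the largest jump
```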
For clustering trajectories, it is often useful that the endpoints of all trajectories share the same direction, e.g., that all trajectories end in the top-left corner of the coordinate system (mt_remap_symmetric or mt_align can be used to achieve this). Furthermore, it is recommended to use length-normalized trajectories (see mt_length_normalize; Wulff et al., 2019).
A list containing two lists that store the results of the different
methods. kopt contains the estimated k for each of the
methods specified in compute. paths contains the values for
each k in kseq as computed by each of the methods specified
in compute. The values in kopt are optima for each of the
vectors in paths.
Dirk U. Wulff
Jonas M. B. Haslbeck
Haslbeck, J. M. B., & Wulff, D. U. (2020). Estimating the Number of Clusters via a Corrected Clustering Instability. Computational Statistics, 35, 1879–1894.
Wulff, D. U., Haslbeck, J. M. B., Kieslich, P. J., Henninger, F., & Schulte-Mecklenbeck, M. (2019). Mouse-tracking: Detecting types in movement trajectories. In M. Schulte-Mecklenbeck, A. Kühberger, & J. G. Johnson (Eds.), A Handbook of Process Tracing Methods (pp. 131-145). New York, NY: Routledge.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.
Sugar, C. A., & James, G. M. (2003). Finding the number of clusters in a dataset. Journal of the American Statistical Association, 98(463), 750-763.
Fujita, A., Takahashi, D. Y., & Patriota, A. G. (2014). A non-parametric method to estimate the number of clusters. Computational Statistics & Data Analysis, 73, 27-39.
mt_distmat for more information about how the distance matrix is computed when the hclust method is used.
mt_cluster for performing trajectory clustering with a specified number of clusters.
## Not run:
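# Load the package that provides mt_cluster_k and the KH2017 example data
library(mousetrap)
# Length normalize trajectories
KH2017 <- mt_length_normalize(KH2017)
# Find k
results <- mt_cluster_k(KH2017, use = "ln_trajectories")
# Retrieve the estimated k per method and the values computed for each k
results$kopt
results$paths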
## End(Not run)