kml3d is a new implementation of k-means for joint longitudinal
data (or joint trajectories). The algorithm can deal with missing values and
provides an easy way to rerun the algorithm several times, varying the starting conditions and/or the number of clusters looked for.
Here is the description of the algorithm. For an overview of the package, see kml3d-package.
[ClusterLongData3d]: contains the trajectories to cluster.
[vector(numeric)]: Vector containing the numbers of clusters to look for.
[numeric]: Sets the number of times that k-means must be re-run (with different starting conditions) for each number of clusters.
kml3d works on objects of class ClusterLongData3d.
For each number i included in the vector of cluster numbers, kml3d computes a
Partition, then stores it in the field cX of the ClusterLongData3d object
according to its number of clusters 'X'.
The algorithm starts over as many times as specified in
nbRedrawing. By default, it is executed for 2,
3, 4, 5 and 6 clusters 20 times each, that is, 100 times in total.
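The default schedule described above can be sketched in base R. This is only an illustration under stated assumptions: stats::kmeans stands in for kml3d's own clustering, the toy matrix stands in for a ClusterLongData3d object (each row is one individual's joint trajectory flattened to a vector), and the names toy, clusterNumbers and runs are hypothetical.

```r
# Toy data: 60 individuals, 2 variables x 5 time points flattened to 10 columns.
# (Illustrative stand-in; kml3d itself works on ClusterLongData3d objects.)
set.seed(1)
toy <- matrix(rnorm(60 * 10), nrow = 60)

clusterNumbers <- 2:6   # default cluster numbers described above
nbRedrawing    <- 20    # default number of re-runs per cluster number

# One sublist per cluster number; each holds nbRedrawing cluster assignments.
# stats::kmeans draws new random starting centers on every call.
runs <- lapply(clusterNumbers, function(k) {
  lapply(seq_len(nbRedrawing), function(r) kmeans(toy, centers = k)$cluster)
})
names(runs) <- paste0("c", clusterNumbers)   # c2 ... c6, mirroring the cX fields

# Total number of k-means runs: 5 cluster numbers x 20 redrawings = 100.
totalRuns <- sum(lengths(runs))
```

Here totalRuns is 100, matching the count given in the text.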
When a Partition has been found, it is added to the slot
c1, c2, c3, ... or c26.
The sublist cX stores all the Partition with
X clusters. Inside a sublist, the
Partition may be sorted from the biggest quality criterion to
the smallest (the best are stored first, using
ordered,ListPartition), or left unsorted.
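The ordering inside a sublist can be illustrated with a minimal base R sketch. It does not use ordered,ListPartition. It assumes stats::kmeans as a stand-in clustering and uses the Calinski-Harabasz index purely as an example quality criterion. The helper calinski and the list c3 are hypothetical names.

```r
set.seed(2)
toy <- matrix(rnorm(60 * 10), nrow = 60)   # toy stand-in for joint trajectories
k <- 3
n <- nrow(toy)

# Hypothetical helper: Calinski-Harabasz index of a stats::kmeans fit
# (between-cluster variance over within-cluster variance, each scaled by
# its degrees of freedom).
calinski <- function(fit, n, k) {
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Ten redrawings for k = 3, each stored with its criterion value.
c3 <- lapply(1:10, function(r) {
  fit <- kmeans(toy, centers = k)
  list(clusters = fit$cluster, criterion = calinski(fit, n, k))
})

# Sort from the biggest criterion to the smallest, so the best comes first.
crit <- vapply(c3, function(p) p$criterion, numeric(1))
c3 <- c3[order(crit, decreasing = TRUE)]
sortedCrit <- vapply(c3, function(p) p$criterion, numeric(1))
```

After sorting, c3[[1]] holds the partition with the highest criterion value among the ten redrawings.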
Partition are saved throughout the algorithm. If the user
interrupts the execution of
kml3d, the result is not lost. If the user runs
kml3d on an object, then running
kml3d again on
the same object will add some new
Partition to the ones already found.
The possible starting conditions are defined in
A ClusterLongData3d object, after having added some
Partition to it.
Behind kml3d, there are two different procedures:
Fast: when the parameter distance is set to "euclidean3d" and toPlot is set to 'none' or 'criterion', kml3d calls a compiled (optimized) C procedure.
Slow: when the user defines their own distance, or wants to see the construction of the clusters by setting toPlot to 'traj' or 'both', kml3d uses a non-compiled R procedure.
The C procedure is 25 times faster than the R one.
We therefore advise using the R procedure (1) for trying some new method
(like using a new distance) or (2) to "see" the very first cluster
constructions, in order to check that everything goes right. Then it
is better to switch to the C procedure (as in the example below).
If for a specific use, you need a different distance, feel free to contact the author.
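The choice between the two procedures can be summarized as a small decision rule. The function below is a hypothetical helper written for illustration and is not part of kml3d. It only encodes the conditions stated above: a user-defined distance, or toPlot set to 'traj' or 'both', leads to the R procedure.

```r
# Hypothetical helper, not part of kml3d: reports which procedure the rules
# above would select for a given distance and toPlot setting.
whichProcedure <- function(distance = "euclidean3d", toPlot = "none") {
  customDistance <- !identical(distance, "euclidean3d")
  wantsTrajPlot  <- toPlot %in% c("traj", "both")
  if (customDistance || wantsTrajPlot) "R" else "C"
}

whichProcedure()                  # "C"
whichProcedure(toPlot = "both")   # "R"
whichProcedure(distance = function(x, y) sum(abs(x - y)))   # "R"
```

A rule like this makes it easy to check, before a long run, that a given combination of settings still follows the fast path.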
### Generation of some data
cld1 <- generateArtificialLongData3d(15)

### We suspect 2, 3, 4 or 5 clusters, we want 3 redrawing.
###   We want to "see" what happen (so toPlot="both")
kml3d(cld1,2:5,3,toPlot="both")

### 3 seems to be the best.
###   We don't want to see again, we want to get the result as fast as possible.
###   Just, to check the overall process, we plot the criterion evolution
kml3d(cld1,3,10,toPlot="criterion")