kml3d: Algorithm kml3d, K-means for Joint Longitudinal Data

kml3d

R Documentation


Description

kml3d is a new implementation of k-means for joint longitudinal data (joint trajectories). The algorithm can deal with missing values and provides an easy way to rerun the algorithm several times, varying the starting conditions and/or the number of clusters sought.

This page describes the algorithm. For an overview of the package, see kml3d-package.
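The idea of clustering joint trajectories can be illustrated with base R alone. The sketch below is not how kml3d is implemented: it assumes complete data (no missing values), stores the joint trajectories in a hypothetical 3D array indexed by individual, time and variable, flattens each individual into one row, and calls the standard `stats::kmeans`. On flattened rows, the squared Euclidean distance between two individuals is the sum of squared differences over all time points and all variables.

```r
## Toy joint trajectories (hypothetical data): 6 individuals, 4 time points,
## 2 variables, stored as a 3D array [individual, time, variable]
set.seed(1)
traj <- array(rnorm(6 * 4 * 2), dim = c(6, 4, 2))
traj[4:6, , ] <- traj[4:6, , ] + 3   # shift individuals 4-6 to form a second group

## Flatten: row i holds all values of individual i
## (time varies fastest across columns, then variable)
flat <- matrix(traj, nrow = dim(traj)[1])
dim(flat)   # 6 8

## Ordinary k-means on the flattened rows, keeping the best of 5 random starts
km <- kmeans(flat, centers = 2, nstart = 5)
km$cluster
```

Flattening keeps the whole joint trajectory of an individual together, so all variables move between clusters at once; handling missing values, as kml3d does, requires more than this sketch.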

Usage

kml3d(object, nbClusters = 2:6, nbRedrawing = 20, toPlot = "none",
    parAlgo = parKml3d())

Arguments

`object`	[ClusterLongData3d]: contains the trajectories to cluster and some `Partition` objects (see package `longitudinalData`).
`nbClusters`	[vector(numeric)]: Vector of the numbers of clusters that `kml3d` must try. By default, `nbClusters` is `2:6`, which means that `kml3d` searches for partitions with 2, then 3, ... up to 6 clusters. The maximum number of clusters is 26.
`nbRedrawing`	[numeric]: Number of times k-means must be rerun (with different starting conditions) for each number of clusters.
`toPlot`	`[character]`: during computation, `kml3d` can display some graphs. If `toPlot="traj"`, the trajectories are plotted (as with the function `plot, ClusterLongData`). If `toPlot="criterion"`, the quality criteria are plotted (as with the function `plotCriterion`). If `toPlot="both"`, the graphic window is split in two and both graphs are displayed. If `toPlot="none"`, there is no graphical display.
`parAlgo`	`[ParKml]` (in package `kml`): sets the options used by `kml3d` (such as the starting condition, the imputation method, the save frequency, the maximum number of iterations, the distance used...). See `ParKml` in package `kml` for details. The default values are described in `parKml3d`.

Details

kml3d works on objects of class ClusterLongData3d. For each number X included in nbClusters, kml3d computes a Partition with X clusters, then stores it in the field cX of the object. For each number of clusters, the algorithm is restarted as many times as specified by nbRedrawing. By default, it is executed 20 times for each of 2, 3, 4, 5 and 6 clusters, that is, 100 times in total.
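The double loop over the numbers of clusters and the redrawings can be sketched in base R. This is an illustration only: the data are hypothetical and complete, each run is a plain `stats::kmeans` call from random starting centers, and the run is stored in an ordinary list rather than in a ClusterLongData3d object.

```r
## Hypothetical complete data: 30 individuals, each a flattened joint
## trajectory of 10 values (e.g. 5 time points x 2 variables), in 3 groups
set.seed(2)
flat <- rbind(matrix(rnorm(100, mean = 0), nrow = 10),
              matrix(rnorm(100, mean = 4), nrow = 10),
              matrix(rnorm(100, mean = 8), nrow = 10))

nbClusters  <- 2:6   # same default values as kml3d
nbRedrawing <- 20

runs <- list()
for (k in nbClusters) {
  for (r in seq_len(nbRedrawing)) {
    ## one run of k-means from random starting centers
    km <- kmeans(flat, centers = k, iter.max = 50)
    runs[[length(runs) + 1]] <- list(k = k, cluster = km$cluster)
  }
}
length(runs)   # 100, i.e. length(nbClusters) * nbRedrawing
```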

When a Partition has been found, it is added to the slot c1, c2, c3, ... or c26; cX stores all the Partitions with X clusters. Inside a sublist, the Partitions can be sorted from the highest quality criterion to the lowest (the best are stored first, using ordered,ListPartition), or left unsorted.
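The storage scheme can be mimicked with a named list whose elements `c2`, `c3`, ... each hold the partitions found for that number of clusters, sorted by decreasing quality. In this sketch the quality criterion is assumed to be the Calinski-Harabasz index, computed from the between and within sums of squares returned by `stats::kmeans`; the helper `calinski` and the data are hypothetical, and plain lists stand in for the package's ListPartition.

```r
## Hypothetical complete data: 30 individuals, 3 groups
set.seed(3)
flat <- rbind(matrix(rnorm(100, mean = 0), nrow = 10),
              matrix(rnorm(100, mean = 4), nrow = 10),
              matrix(rnorm(100, mean = 8), nrow = 10))

## Calinski-Harabasz criterion (assumed here as the quality criterion):
## (B / (k - 1)) / (W / (n - k)), larger is better
calinski <- function(km, n) {
  k <- length(km$size)
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}

## One list element per number of clusters: "c2", "c3", "c4"
partitions <- list()
for (k in 2:4) {
  slot <- paste0("c", k)
  for (r in 1:5) {
    km <- kmeans(flat, centers = k, iter.max = 50)
    partitions[[slot]] <- c(partitions[[slot]],
                            list(list(cluster   = km$cluster,
                                      criterion = calinski(km, nrow(flat)))))
  }
  ## Sort the sublist from the highest criterion to the lowest (best first)
  crit <- sapply(partitions[[slot]], `[[`, "criterion")
  partitions[[slot]] <- partitions[[slot]][order(crit, decreasing = TRUE)]
}
names(partitions)   # "c2" "c3" "c4"
```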

Note that Partitions are saved throughout the algorithm: if the user interrupts the execution of kml3d, the results found so far are not lost. Running kml3d again on an object that has already been processed adds new Partitions to those already found.
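The accumulation of partitions across successive calls can be sketched with a hypothetical helper `addRuns` that appends new k-means runs to an existing store. Unlike the behaviour described above, this plain R sketch returns the updated list, which must be reassigned, and it does not preserve results if a call is interrupted.

```r
## Hypothetical helper: append nbRedrawing new k-means runs with k clusters
## to the element "c<k>" of an existing store of partitions
addRuns <- function(store, data, k, nbRedrawing) {
  slot <- paste0("c", k)
  for (r in seq_len(nbRedrawing)) {
    km <- kmeans(data, centers = k, iter.max = 50)
    store[[slot]] <- c(store[[slot]], list(km$cluster))
  }
  store
}

## Hypothetical complete data: 20 individuals, 2 groups
set.seed(4)
flat <- rbind(matrix(rnorm(60, mean = 0), nrow = 10),
              matrix(rnorm(60, mean = 5), nrow = 10))

store <- list()
store <- addRuns(store, flat, k = 2, nbRedrawing = 3)    # first call
store <- addRuns(store, flat, k = 2, nbRedrawing = 10)   # second call adds to the first
length(store$c2)   # 13
```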

The possible starting conditions are defined in initializePartition.

Value

A ClusterLongData3d object, to which the new Partitions have been added.

Optimisation

kml3d relies on two different procedures:

Fast: when the parameter distance is set to "euclidean3d" and toPlot is set to 'none' or 'criterion', kml3d calls a compiled (optimized) C procedure.
Slow: when the user defines their own distance, or wants to see the construction of the clusters by setting toPlot to 'traj' or 'both', kml3d uses non-compiled R code.

The C procedure is 25 times faster than the R one.

We therefore advise using the R procedure (1) to try a new method (such as a new distance) or (2) to "see" the construction of the very first clusters, in order to check that everything goes right. It is then better to switch to the C procedure (as done in the Examples section).

If you need a different distance for a specific use, feel free to contact the author.
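As an illustration of what a user-defined distance could compute, the sketch below measures a Euclidean distance between two joint trajectories stored as matrices with one row per time point and one column per variable. Pairs where either value is missing are skipped, and the squared sum is rescaled by the ratio of total to observed pairs; this rescaling is an assumption made for the example, not necessarily what "euclidean3d" does, and the function `dist3dNA` is hypothetical. The exact form in which a custom distance must be passed through `parAlgo` is documented in `ParKml` and is not reproduced here.

```r
## Hypothetical distance between two joint trajectories x and y
## (matrices: rows = time points, columns = variables).
## Missing pairs are skipped; the squared sum is rescaled by
## (number of pairs / number of observed pairs) before taking the root.
dist3dNA <- function(x, y) {
  stopifnot(all(dim(x) == dim(y)))
  ok <- !is.na(x) & !is.na(y)
  if (!any(ok)) return(NA_real_)
  sqrt(sum((x[ok] - y[ok])^2) * length(x) / sum(ok))
}

a <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)   # 3 time points, 2 variables
b <- matrix(c(1, 2, 5, 10, 20, 30), ncol = 2)
dist3dNA(a, b)   # 2
b[1, 1] <- NA
dist3dNA(a, b)   # sqrt(4 * 6 / 5), about 2.19
```

The rescaling keeps distances computed on incomplete trajectories on the same scale as distances on complete ones, instead of shrinking them just because fewer values are compared.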

Examples


### Load the package
library(kml3d)

### Move to a temporary directory
wd <- getwd()
setwd(tempdir()); getwd()

### Generate some artificial joint trajectories
cld1 <- generateArtificialLongData3d(15)

### We suspect 2, 3, 4 or 5 clusters; we want 3 redrawings.
###   We want to "see" what happens (hence toPlot = "both"),
###   which uses the slower R procedure.
kml3d(cld1, 2:5, 3, toPlot = "both")

### 3 seems to be the best.
###   We do not need to see the construction again and want the result
###   as fast as possible. To check the overall process, we plot only
###   the criterion evolution.
kml3d(cld1, 3, 10, toPlot = "criterion")

### Go back to the original directory
setwd(wd)

kml3d documentation built on Oct. 30, 2024, 9:12 a.m.