kml-package: R Documentation
This package is an implementation of k-means for longitudinal data (or trajectories).
Here is an overview of the package. For a description of the
algorithm, see kml.
| Package: | kml |
| Type: | Package |
| Version: | 2.4.1 |
| Date: | 2016-02-02 |
| License: | GPL (>= 2) |
| LazyData: | yes |
| Depends: | methods, clv, longitudinalData (>= 2.1.2) |
| URL: | http://www.r-project.org |
| URL: | http://christophe.genolini.free.fr/kml |
To cluster data, KmL goes through three steps, each of which
is associated with some functions:
1. Data preparation
2. Building an "optimal" partition
3. Exporting results
KmL works on objects of class ClusterLongData.
Data preparation therefore simply consists of transforming the data into a ClusterLongData object.
This can be done with the function clusterLongData (cld for short).
It converts a data.frame or a matrix into a ClusterLongData object.
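To make the expected input concrete, the base-R sketch below builds a small wide-format data frame (one row per individual, one column per measurement occasion) and extracts the trajectory matrix in the same spirit as the timeInData and time arguments passed to cld in the package example. The data frame, its column names and the variable traj are hypothetical, and this does not create a ClusterLongData object; only cld does that.

```r
# Hypothetical wide-format data: one row per individual, one column per
# measurement occasion (column names are illustrative only).
df <- data.frame(
  id  = c("s1", "s2", "s3", "s4"),
  sex = c("F", "M", "F", "M"),
  y3  = c(1, 2, 8, 9),
  y4  = c(2, 3, 9, 10),
  y5  = c(3, 3, 10, 11),
  y8  = c(4, 5, 12, 12)
)

# Columns holding the measurements (same idea as cld's timeInData) and the
# actual times at which they were taken (same idea as cld's time).
timeInData <- 3:6
times <- c(3, 4, 5, 8)

# Trajectory matrix: rows = individuals, columns = time points
traj <- as.matrix(df[, timeInData])
rownames(traj) <- df$id
colnames(traj) <- paste0("t", times)
print(traj)
```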
Instead of working on real data, one can also work on artificial
data. Such data can be created with generateArtificialLongData
(gald for short).
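The idea behind artificial longitudinal data can be sketched in base R: draw individuals around a few group mean trajectories and add Gaussian noise. This is not gald's generative model, whose options are not described here; the three mean curves, the group size and the noise level below are arbitrary assumptions.

```r
set.seed(1)
times <- 1:6
nPerGroup <- 10

# Hypothetical group mean trajectories: increasing, flat, decreasing
meanCurves <- list(
  A = function(t) 2 * t,
  B = function(t) rep(6, length(t)),
  C = function(t) 12 - 2 * t
)

# For each group: repeat its mean curve nPerGroup times, then add noise
traj <- do.call(rbind, lapply(names(meanCurves), function(g) {
  mu <- meanCurves[[g]](times)
  noise <- matrix(rnorm(nPerGroup * length(times), sd = 1), nrow = nPerGroup)
  m <- matrix(rep(mu, each = nPerGroup), nrow = nPerGroup) + noise
  rownames(m) <- paste0(g, seq_len(nPerGroup))
  m
}))

# True group membership, useful to check a clustering afterwards
trueGroup <- factor(rep(names(meanCurves), each = nPerGroup))
```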
Once an object of class ClusterLongData has been created, the algorithm
kml can be run.
Starting from a ClusterLongData object, kml builds a Partition,
a class defined in package longitudinalData.
An object of class Partition is a partition of the trajectories
into subgroups. It also contains additional information, such as the
percentage of trajectories in each group and some quality criteria.
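For intuition, here is one run of a plain k-means (Lloyd's algorithm) on a trajectory matrix: each trajectory is treated as a vector, compared to the group centers by Euclidean distance, and the result includes the percentage of trajectories in each group. kmeansTraj is a hypothetical helper written for illustration; kml's own initialisation, distance options and handling of missing values may differ.

```r
# One run of plain k-means (Lloyd's algorithm) on a trajectory matrix
# (rows = individuals, columns = time points). Hypothetical helper.
kmeansTraj <- function(traj, k, maxIter = 100) {
  n <- nrow(traj)
  centers <- traj[sample(n, k), , drop = FALSE]  # k random trajectories as starting centers
  cluster <- integer(n)
  for (iter in seq_len(maxIter)) {
    # n x k matrix of squared Euclidean distances to each center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(traj, 2, centers[j, ])^2))
    newCluster <- max.col(-d, ties.method = "first")  # nearest center
    if (all(newCluster == cluster)) break             # assignments stable: converged
    cluster <- newCluster
    # Move each non-empty center to the mean trajectory of its members
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(traj[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers,
       percent = 100 * tabulate(cluster, nbins = k) / n)
}

set.seed(10)
traj <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),
              matrix(rnorm(40, mean = 6), ncol = 4))
fit <- kmeansTraj(traj, k = 2)
fit$percent
```

Each iteration can only decrease the within-group sum of squares, which is why a single run stops at a local optimum rather than necessarily the best partition.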
kml is a "hill-climbing" algorithm. Algorithms of this
kind always converge towards a maximum, but one cannot know
whether it is a local or a global maximum, so they offer
no guarantee of optimality.
To maximize the chances of obtaining a good Partition, it is better to run
the hill-climbing algorithm several times and then choose the best solution.
By default, kml executes the hill-climbing algorithm 20 times
and chooses the Partition that maximizes the determinant of the between-groups matrix.
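The "run several times, keep the best" strategy can be sketched with base R's kmeans, each call starting from a different random configuration. This sketch substitutes the Calinski-Harabasz criterion (between-group dispersion divided by within-group dispersion, each scaled by its degrees of freedom) as a simple, well-defined stand-in for the determinant criterion mentioned above; it is not necessarily what kml computes. bestRun and calinskiHarabasz are hypothetical names; nbRedrawing mirrors the kml argument of the same name used in the package example.

```r
set.seed(42)
# Hypothetical data: two groups of 15 trajectories measured at 4 times
traj <- rbind(matrix(rnorm(60, mean = 0), ncol = 4),
              matrix(rnorm(60, mean = 5), ncol = 4))

# Calinski-Harabasz criterion from a kmeans fit (higher is better)
calinskiHarabasz <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Run the local search nbRedrawing times from random starts, keep the best
bestRun <- function(traj, k, nbRedrawing = 20) {
  best <- NULL
  bestCrit <- -Inf
  for (r in seq_len(nbRedrawing)) {
    fit <- kmeans(traj, centers = k, iter.max = 50)  # one random start
    crit <- calinskiHarabasz(fit, nrow(traj))
    if (crit > bestCrit) {
      best <- fit
      bestCrit <- crit
    }
  }
  list(fit = best, criterion = bestCrit)
}

res <- bestRun(traj, k = 2)
res$criterion
```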
Likewise, the optimal number of clusters cannot be known beforehand.
However, once the partitions have been computed, it is possible to calculate
criteria that help choose it.
All in all, by default kml tests 2, 3, 4, 5 and 6 clusters, 20 times each.
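Choosing the number of clusters can be sketched the same way: fit k = 2, ..., 6 clusters, compute a quality criterion for each, and keep the k with the highest value (the package example uses plotAllCriterion for this step). As in the previous sketch, the Calinski-Harabasz criterion is an assumed stand-in, and kmeans(nstart = 20) keeps the best of 20 starts by within-group sum of squares rather than by that criterion. The three simulated groups are hypothetical.

```r
set.seed(7)
# Hypothetical data: three groups of 20 trajectories at 5 time points
centersTrue <- rbind(c(0, 1, 2, 3, 4),
                     c(4, 4, 4, 4, 4),
                     c(8, 7, 6, 5, 4))
traj <- centersTrue[rep(1:3, each = 20), ] +
  matrix(rnorm(300, sd = 0.5), ncol = 5)

# Calinski-Harabasz criterion from a kmeans fit (higher is better)
calinskiHarabasz <- function(fit) {
  n <- length(fit$cluster)
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

nbClusters <- 2:6
crit <- sapply(nbClusters, function(k) {
  # nstart = 20: best of 20 random starts, judged by within-group SS
  fit <- kmeans(traj, centers = k, nstart = 20)
  calinskiHarabasz(fit)
})
names(crit) <- nbClusters
print(round(crit, 1))

bestK <- nbClusters[which.max(crit)]
```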
Once kml has constructed some Partitions, the user can examine them one by one
and choose which ones to export. This is done with the function choice,
which opens a graphical window showing various information, including
the trajectories clustered according to a specific Partition.
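choice is interactive. As a rough, non-interactive analogue of its trajectory display, the base-R sketch below plots individual trajectories coloured by cluster together with each cluster's mean trajectory. The simulated data, the colours and the layout are assumptions and do not reproduce what choice actually shows.

```r
set.seed(3)
times <- c(3, 4, 5, 8)
# Hypothetical data: two groups of 10 trajectories
traj <- rbind(matrix(rnorm(40, mean = 2), ncol = 4),
              matrix(rnorm(40, mean = 6), ncol = 4))
cluster <- kmeans(traj, centers = 2, nstart = 10)$cluster

# Mean trajectory of each cluster: one row per cluster, one column per time
clusterMeans <- apply(traj, 2, function(col) tapply(col, cluster, mean))

# Individual trajectories coloured by cluster (colours 2 and 3) ...
matplot(times, t(traj), type = "l", lty = 1, col = cluster + 1,
        xlab = "Time", ylab = "Value")
# ... with the cluster mean trajectories drawn in bold, same colours
matlines(times, t(clusterMeans), lty = 1, lwd = 4, col = 2:3)
```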
Once one or more Partitions have been selected (the user can select
more than one), they can be saved. The clusters are then exported
to the file name-cluster.csv and the criteria to
name-criteres.csv. The graphs are exported in the format given by
their file extension.
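The export step can be imitated with write.csv. The sketch below follows the name-cluster.csv / name-criteres.csv naming described above, but the base name, the column layout, the letter labels and the two criteria written (between- and within-group sums of squares) are hypothetical; the files produced by choice may be organised differently.

```r
set.seed(5)
# Hypothetical data: 12 trajectories at 4 time points
traj <- rbind(matrix(rnorm(24, mean = 0), ncol = 4),
              matrix(rnorm(24, mean = 5), ncol = 4))
rownames(traj) <- paste0("id", seq_len(nrow(traj)))

# Two "selected" partitions: 2 and 3 clusters
fits <- lapply(2:3, function(k) kmeans(traj, centers = k, nstart = 10))

name <- file.path(tempdir(), "mySDQ")  # hypothetical base name

# One column per selected partition, clusters labelled A, B, C, ...
clusters <- data.frame(id = rownames(traj),
                       sapply(fits, function(f) LETTERS[f$cluster]))
names(clusters)[-1] <- paste0("c", 2:3)
write.csv(clusters, paste0(name, "-cluster.csv"), row.names = FALSE)

# One row of criteria per selected partition
criteria <- data.frame(nbClusters = 2:3,
                       betweenSS = sapply(fits, `[[`, "betweenss"),
                       withinSS = sapply(fits, `[[`, "tot.withinss"))
write.csv(criteria, paste0(name, "-criteres.csv"), row.names = FALSE)
```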
It is also possible to extract a partition from a ClusterLongData object
using the function getClusters.
Classes: ClusterLongData, Partition (in package longitudinalData)
Methods: clusterLongData, kml, choice
Plot: plot(ClusterLongData)
library(kml)
### Move to tempdir
wd <- getwd()
setwd(tempdir()); getwd()
### 1. Data preparation
data(epipageShort)
names(epipageShort)
cldSDQ <- cld(epipageShort, timeInData = 3:6, time = c(3, 4, 5, 8))
### 2. Building "optimal" clustering (with only 3 redrawings)
kml(cldSDQ, nbRedrawing = 3, toPlot = "both")
### 3. Exporting results
### To check the best number of clusters
plotAllCriterion(cldSDQ)
### To see the best partition (choice() is interactive; try() keeps the script running if it fails)
try(choice(cldSDQ))
### 4. Further analysis
epipageShort$clust <- getClusters(cldSDQ, 4)
summary(glm(gender ~ clust, data = epipageShort, family = "binomial"))
### Go back to current dir
setwd(wd)