clusterLongData: ~ Function: clusterLongData (or cld) ~
In kml: K-Means for Longitudinal Data

clusterLongData

R Documentation

Description

clusterLongData (or cld for short) is the constructor for ClusterLongData objects.

Usage

clusterLongData(traj, idAll, time, timeInData, varNames, maxNA)
cld(traj, idAll, time, timeInData, varNames, maxNA)

Arguments

`traj`	`[matrix(numeric)]` or `[data.frame]`: structure containing the trajectories. Each line is the trajectory of an individual. The columns correspond to the times at which measurements were made.
`idAll`	`[vector(character)]`: unique identifier for each trajectory (i.e. each individual). Note that the identifiers are of type `character` (which makes it possible to handle identifiers like `XUK32-612`, identifiers that our favorite epidemiologists are so good at providing). If `idAll` is `numeric`, it is converted to `character`.
`time`	`[vector(numeric)]`: times at which measurements were made.
`timeInData`	`[vector(numeric)]`: specifies the columns of `traj` that contain the trajectories.
`varNames`	`[character]`: name of the variable being measured.
`maxNA`	`[numeric]`: maximum number of NA values tolerated on a trajectory. If a trajectory has more than `maxNA` missing values, it is removed from the analysis.
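
The `maxNA` rule can be illustrated in base R. This is only a minimal sketch of the rule as documented here, not the code used inside kml; the helper name `dropTooManyNA` is hypothetical.

```r
# Hypothetical helper (not part of kml): keep only the trajectories
# (rows) that have at most `maxNA` missing values.
dropTooManyNA <- function(traj, maxNA) {
  nbNA <- rowSums(is.na(traj))          # number of NA per trajectory
  traj[nbNA <= maxNA, , drop = FALSE]   # more than maxNA missing: removed
}

# Same values as the data.frame `dn` used in the Examples section
traj <- matrix(c(NA, 2, 1,
                 NA, 1, 0,
                 3, 2, 2,
                 4, 2, NA),
               nrow = 3,
               dimnames = list(c("1", "2", "3"), c("v1", "v2", "v3", "v4")))

# Trajectory "1" has two NA and is dropped; "2" and "3" are kept
kept <- dropTooManyNA(traj, maxNA = 1)
rownames(kept)
```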

Details

clusterLongData constructs an object of class ClusterLongData. Two cases can be distinguished:

If traj is an array:

Each line is an individual. Each column is a time of measurement.

If idAll is missing, the individuals are labelled i1, i2, i3, ...

If timeInData is missing, all the columns are used (timeInData=1:ncol(traj)).
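
The defaults for the array case can be sketched in base R as follows. This only mirrors the rules stated above; `matrixDefaults` is a hypothetical helper, not a kml function, and the real constructor may do more.

```r
# Hypothetical helper (not part of kml): resolve the documented
# defaults for idAll and timeInData when traj is a matrix.
matrixDefaults <- function(traj, idAll, timeInData) {
  if (missing(idAll)) {
    idAll <- paste0("i", seq_len(nrow(traj)))   # i1, i2, i3, ...
  }
  if (missing(timeInData)) {
    timeInData <- seq_len(ncol(traj))           # all the columns
  }
  list(idAll = as.character(idAll), timeInData = timeInData)
}

mat <- matrix(c(1, NA, 3, 2, 3, 6, 1, 8, 10), 3, 3)
res <- matrixDefaults(mat)
res$idAll
res$timeInData
```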

If traj is a data.frame:

Each line is an individual. Each column is a time of measurement.

If idAll is missing, the first column of the data.frame is used for idAll.

If timeInData is missing and idAll is missing, all the columns but the first are used for timeInData (the first is omitted since it is already used for idAll): idAll=traj[,1], timeInData=2:ncol(traj).

If timeInData is missing but idAll is not missing, all the columns, including the first, are used for timeInData: timeInData=1:ncol(traj).
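
The data.frame rules above can be summarised by a short base R sketch. The helper `dataFrameDefaults` is hypothetical and only restates the documented behaviour; it is not the implementation used by kml.

```r
# Hypothetical helper (not part of kml): resolve the documented
# defaults for idAll and timeInData when traj is a data.frame.
dataFrameDefaults <- function(traj, idAll, timeInData) {
  idMissing <- missing(idAll)
  if (idMissing) {
    idAll <- traj[, 1]                  # first column holds the identifiers
  }
  if (missing(timeInData)) {
    # Skip the first column only when it was used for idAll
    timeInData <- if (idMissing) 2:ncol(traj) else 1:ncol(traj)
  }
  list(idAll = as.character(idAll), timeInData = timeInData)
}

dn <- data.frame(id = 1:3, v1 = c(NA, 2, 1), v2 = c(NA, 1, 0),
                 v3 = c(3, 2, 2), v4 = c(4, 2, NA))

# idAll and timeInData both missing: ids from column 1, times from columns 2 to 5
a <- dataFrameDefaults(dn)

# idAll supplied: every column of the data.frame is treated as a measurement
b <- dataFrameDefaults(dn[, -1], idAll = c("A", "B", "C"))
```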

Value

An object of class ClusterLongData.

Author

Christophe Genolini
1. UMR U1027, INSERM, Université Paul Sabatier / Toulouse III / France
2. CeRSME, EA 2931, UFR STAPS, Université de Paris Ouest-Nanterre-La Défense / Nanterre / France

References

[1] C. Genolini and B. Falissard
"KmL: k-means for longitudinal data"
Computational Statistics, vol 25(2), pp 317-328, 2010

[2] C. Genolini and B. Falissard
"KmL: A package to cluster longitudinal data"
Computer Methods and Programs in Biomedicine, 104, pp e112-121, 2011

Examples

#####################
### From matrix

### Small data
mat <- matrix(c(1,NA,3,2,3,6,1,8,10),3,3,dimnames=list(c(101,102,104),c("T2","T4","T8")))
clusterLongData(mat)
(ld1 <- clusterLongData(traj=mat,idAll=as.character(c(101,102,104)),time=c(2,4,8),varNames="V"))
plot(ld1)

### Big data
mat <- matrix(runif(1051*325),1051,325)
(ld2 <- clusterLongData(traj=mat,idAll=paste("I-",1:1051,sep=""),time=(1:325)+0.5,varNames="R"))

####################
### From data.frame

dn <- data.frame(id=1:3,v1=c(NA,2,1),v2=c(NA,1,0),v3=c(3,2,2),v4=c(4,2,NA))

### Basic
clusterLongData(dn)

### Selecting some times
(ld3 <- clusterLongData(dn,timeInData=c(1,2,4),varNames=c("Hyp")))

### Excluding trajectories with more than 1 NA
(ld3 <- clusterLongData(dn,maxNA=1))

kml documentation built on Oct. 30, 2024, 9:09 a.m.