# qualityCriterion (longitudinalData: Longitudinal Data)

## Description

Given a `LongData` and a `Partition`, the function `qualityCriterion` computes several quality criteria.

## Usage

```
qualityCriterion(traj, clusters, imputationMethod = "copyMean")
```

## Arguments

• `traj` `[LongData]` or `[matrix]`: object containing the trajectories on which the criteria are calculated.

• `clusters` `[Partition]` or `[vector(integer)]`: cluster to which each individual belongs.

• `imputationMethod` `[character]`: if some values are missing in the `LongData`, they have to be imputed first. In that case `qualityCriterion` calls the function `imputation`, using `imputationMethod` as the imputation method.

## Details

Given a `LongData` and a `Partition` (or a `matrix` and a vector of `integer`), the function `qualityCriterion` computes several quality criteria and returns them as a list (see 'Value' below).

If some individuals have no cluster (i.e. if the `Partition` has missing values), the corresponding trajectories are excluded from the calculation.

Note that if there is an empty cluster or an empty trajectory, most of the criteria are unavailable.

Basically, 6 non-parametric criteria are computed. In addition, ASSUMING THAT in each cluster C and at each time T the variable follows a NORMAL DISTRIBUTION (with the mean and standard deviation of the variable at time T restricted to cluster C), it is possible to compute the posterior probabilities of the individual trajectories and the likelihood. From these, we can also compute the BIC, the AIC and the global posterior probability, which `qualityCriterion` also computes. However, the user should always keep in mind that these criteria are valid ONLY under the hypothesis of normality. If this hypothesis does not hold, algorithms like k-means will still converge, but the BIC and AIC will have no meaning.
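The normal model described above can be sketched in base R as follows. This is a minimal sketch with a hypothetical function name (`gaussianPosterior`), not the package's implementation, and it rests on assumptions the text does not state: measurements at different times are treated as independent given the cluster, cluster proportions are the observed cluster frequencies, standard deviations are the usual sample standard deviations (`sd()`), `traj` has no missing values (the package imputes them first), and every cluster has at least two members with non-constant values at each time.

```r
# Sketch only (not the package's implementation). ASSUMPTIONS: within each
# cluster, each time follows its own normal law and times are independent;
# cluster proportions are the observed frequencies; sd() gives the standard
# deviations; traj has no missing values; every cluster has >= 2 members.
gaussianPosterior <- function(traj, clusters) {
  keep <- !is.na(clusters)                  # unclustered individuals are excluded
  traj <- traj[keep, , drop = FALSE]
  clusters <- clusters[keep]
  groups <- sort(unique(clusters))
  n <- nrow(traj)

  # logJoint[i, g] = log(proportion of g) + log density of trajectory i in g
  logJoint <- sapply(groups, function(g) {
    members <- traj[clusters == g, , drop = FALSE]
    mu <- colMeans(members)                 # mean at each time, cluster g
    sigma <- apply(members, 2, sd)          # sd at each time, cluster g
    z <- sweep(sweep(traj, 2, mu), 2, sigma, "/")
    logDens <- rowSums(matrix(dnorm(z, log = TRUE), nrow = n)) - sum(log(sigma))
    log(nrow(members) / n) + logDens
  })

  # log-sum-exp over clusters, for numerical stability
  rowMax <- apply(logJoint, 1, max)
  logMix <- rowMax + log(rowSums(exp(logJoint - rowMax)))

  postProba <- exp(logJoint - logMix)       # each row sums to 1
  colnames(postProba) <- groups
  list(logLikelihood = sum(logMix), postProba = postProba)
}
```

Working on the log scale with a log-sum-exp keeps the posterior probabilities finite even when the individual densities underflow to zero.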

IMPORTANT NOTE: Some criteria should be maximized, others should be minimized. This might be confusing for the non-expert. To simplify the comparison of the criteria, `qualityCriterion` computes the OPPOSITE of the criteria that should be minimized (Ray & Turi, Davies & Bouldin, BIC and AIC, including their variants). Thus, all the criteria computed by this function should be maximized.

## Value

A list with three fields: the first is the list of criteria, the second is the posterior probabilities of the clusters, and the third is the matrix of the individual posterior probabilities.

## Non-parametric criteria

Notations: k = number of clusters; n = number of individuals; B = between-cluster variance matrix; W = within-cluster variance matrix. The criteria are:

• Calinski.Harabatz`[numeric]`: Calinski and Harabasz criterion: `c(k)=Trace(B)/Trace(W)*(n-k)/(k-1)`.

• Calinski.Harabatz2`[numeric]`: Calinski and Harabasz criterion modified by Krysczuk: `c(k)=Trace(B)/Trace(W)*(n-1)/(n-k)`.

• Calinski.Harabatz3`[numeric]`: Calinski and Harabasz criterion modified by Genolini: `g(k)=Trace(B)/Trace(W)*(n-k)/sqrt(k-1)`.

• Ray.Turi`[numeric]`: Ray and Turi criterion: `r(k)=-Vintra/Vinter` with `Vintra=Sum(dist(x,center(x)))` and `Vinter=min(dist(center_i,center_j)^2)`. (The "true" Ray and Turi index is `Vintra/Vinter` and should be minimized; see IMPORTANT NOTE above.)

• Davies.Bouldin`[numeric]`: Davies and Bouldin criterion: `d(k)=-mean(Proximite(cluster_i,cluster_j))` with `Proximite(i,j)=(DistInterne(i)+DistInterne(j))/(DistExterne(i,j))`, where `DistInterne(i)` is the internal spread of cluster i and `DistExterne(i,j)` is the distance separating clusters i and j. (The "true" Davies and Bouldin index is `mean(Proximite())` and should be minimized; see IMPORTANT NOTE above.)

• random`[numeric]`: a random value drawn from the standard normal distribution N(0,1).
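The six criteria listed above can be sketched in base R for a numeric trajectory matrix (one row per individual, one column per time) and an integer cluster vector. This is an illustrative sketch with a hypothetical function name (`nonParametricCriteria`), not the package's code. Ray and Turi follows the formula above literally, with Euclidean distances. For Davies and Bouldin it assumes the standard definition: `DistInterne(i)` is the mean distance of the members of cluster i to its center, `DistExterne(i,j)` is the distance between the two centers, and for each cluster the largest proximity to any other cluster is kept before averaging. The package may normalize or square these quantities differently. The sketch also assumes no missing values, at least two clusters and no empty cluster.

```r
# Sketch only (not the package's implementation). ASSUMPTIONS: traj is a
# numeric matrix without missing values (rows = individuals, columns = times),
# at least 2 clusters, no empty cluster, Euclidean distances throughout.
nonParametricCriteria <- function(traj, clusters) {
  keep <- !is.na(clusters)                  # unclustered individuals are excluded
  traj <- traj[keep, , drop = FALSE]
  clusters <- clusters[keep]
  groups <- sort(unique(clusters))
  n <- nrow(traj)
  k <- length(groups)

  # one center per cluster (k x t matrix) and cluster sizes
  centers <- do.call(rbind, lapply(groups, function(g)
    colMeans(traj[clusters == g, , drop = FALSE])))
  sizes <- sapply(groups, function(g) sum(clusters == g))
  idx <- match(clusters, groups)            # row of each individual's own center

  # Trace(W): squared distances of trajectories to their own center
  diffs <- traj - centers[idx, , drop = FALSE]
  traceW <- sum(diffs^2)
  # Trace(B): size-weighted squared distances of centers to the grand mean
  traceB <- sum(sizes * rowSums(sweep(centers, 2, colMeans(traj))^2))
  ratio <- traceB / traceW

  distToCenter <- sqrt(rowSums(diffs^2))
  centerDist <- as.matrix(dist(centers))

  # Ray and Turi, literally as written above (then negated)
  vIntra <- sum(distToCenter)
  vInter <- min(centerDist[upper.tri(centerDist)])^2

  # Davies and Bouldin (ASSUMED standard definition): spread = mean distance
  # to the center; keep each cluster's largest proximity, then average
  spread <- as.vector(tapply(distToCenter, idx, mean))
  proximity <- outer(spread, spread, "+") / centerDist
  diag(proximity) <- -Inf                   # a cluster is not compared to itself
  daviesBouldin <- mean(apply(proximity, 1, max))

  c(Calinski.Harabatz  = ratio * (n - k) / (k - 1),
    Calinski.Harabatz2 = ratio * (n - 1) / (n - k),
    Calinski.Harabatz3 = ratio * (n - k) / sqrt(k - 1),
    Ray.Turi           = -vIntra / vInter,
    Davies.Bouldin     = -daviesBouldin,
    random             = rnorm(1))
}
```

For example, with the four trajectories `rbind(c(0,0), c(0,2), c(10,0), c(10,2))` split as `c(1,1,2,2)`, Trace(B) = 100 and Trace(W) = 4, so `Calinski.Harabatz` is 25 × 2 / 1 = 50.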

## Parametric criteria

All the parametric indices should normally be minimized, so `qualityCriterion` computes their opposites (see IMPORTANT NOTE above).

Notation: L = likelihood; h = number of parameters; n = number of trajectories; t = number of measurement times; N = total number of measurements (N=n*t).

SECOND IMPORTANT NOTE: the formulas of parametric criteria often include the size of the population. In the specific case of longitudinal data, the definition of the "size of the population" is not obvious: it can be either the number of individuals `n` or the number of measurements `N=n*t`. So `qualityCriterion` gives two versions of the parametric criteria that depend on the population size (BIC and AICc), the first using `n`, the second using `N`.

• BIC`[numeric]`: Bayesian Information Criterion: `BIC=2*log(L)-h*log(n)`. See IMPORTANT NOTE above.

• BIC2`[numeric]`: Bayesian Information Criterion, using `N`: `BIC2=2*log(L)-h*log(N)`. See IMPORTANT NOTE above.

• AIC`[numeric]`: Akaike Information Criterion: `AIC=2*log(L)-2*h`. See IMPORTANT NOTE above.

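• AICc`[numeric]`: Akaike Information Criterion with small-sample correction: `AICc=AIC-(2h(h+1))/(n-h-1)`. The correction is subtracted because `AIC` is here the opposite of the usual index. See IMPORTANT NOTE above.

• AICc2`[numeric]`: Akaike Information Criterion with small-sample correction, using `N`: `AICc2=AIC-(2h(h+1))/(N-h-1)`. See IMPORTANT NOTE above.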
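Given a log-likelihood, the parametric formulas reduce to a few lines. The sketch below (hypothetical function name `parametricCriteria`) takes `logL`, the number of parameters `h`, the number of trajectories `n` and the number of measurement times `t` as inputs; how `h` is counted for the normal model is not specified in this page and is left to the caller. All values follow the "to be maximized" convention, so the small-sample correction of AICc is subtracted from the (already negated) AIC; this sign choice is an assumption of the sketch, not a statement about the package's code.

```r
# Sketch only: parametric criteria in the "to be maximized" convention.
# logL = log-likelihood, h = number of parameters (supplied by the caller),
# n = number of trajectories, t = number of measurement times.
parametricCriteria <- function(logL, h, n, t) {
  N <- n * t                                # total number of measurements
  aic <- 2 * logL - 2 * h                   # opposite of the usual AIC
  c(BIC   = 2 * logL - h * log(n),
    BIC2  = 2 * logL - h * log(N),
    AIC   = aic,
    # ASSUMED sign: correction subtracted, since larger is better here
    AICc  = aic - 2 * h * (h + 1) / (n - h - 1),
    AICc2 = aic - 2 * h * (h + 1) / (N - h - 1))
}
```

Because `log(N) > log(n)` whenever `t > 1`, BIC2 penalizes each parameter more heavily than BIC, while AICc2 applies a smaller correction than AICc.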

## Author

Christophe Genolini
1. UMR U1027, INSERM, Université Paul Sabatier / Toulouse III / France
2. CeRSM, EA 2931, UFR STAPS, Université de Paris Ouest-Nanterre-La Défense / Nanterre / France

## References

C. Genolini and B. Falissard
"KmL: k-means for longitudinal data"
Computational Statistics, vol 25(2), pp 317-328, 2010

C. Genolini and B. Falissard
"KmL: A package to cluster longitudinal data"
Computer Methods and Programs in Biomedicine, vol 104, pp e112-e121, 2011

## See Also

`LongData`, `Partition`, `imputation`.
## Examples

```
##################
### Preparation of some artificial data
par(ask=TRUE)
data(artificialLongData)
ld <- longData(artificialLongData)

### Correct partition
part1 <- partition(rep(1:4,each=50))
plotTrajMeans(ld,part1)
(cr1 <- qualityCriterion(ld,part1))

### Random partition
part2 <- partition(floor(runif(200,1,5)))
plotTrajMeans(ld,part2)
(cr2 <- qualityCriterion(ld,part2))

### Partition with 3 clusters instead of 4
part3 <- partition(rep(c(1,2,3,3),each=50))
plotTrajMeans(ld,part3)
(cr3 <- qualityCriterion(ld,part3))

### Comparison of the partitions
### (the first field of each result holds the criteria; see 'Value')
plot(c(cr1[[1]][["Calinski.Harabatz"]],
       cr2[[1]][["Calinski.Harabatz"]],
       cr3[[1]][["Calinski.Harabatz"]]),
     main="The highest gives the best partition (according to the Calinski & Harabatz criterion)")
par(ask=FALSE)
```