# Clustering longitudinal data

### Description

'glmClust' clusters longitudinal data (trajectories) using the likelihood as a measure of distance. It also handles multiple covariates with different effects by means of the generalised linear model 'glm'.

### Usage


### Arguments

`formula` |
A symbolic description of the model. In the parametric case one writes, for example, 'y ~ clust(time+time2) + pop(sex)': here 'time' and 'time2' have a different effect in each cluster, while the 'sex' effect is the same for all clusters. In the non-parametric case only one covariate is allowed. |

`data` |
A [data.frame] in long format with no missing values: each line corresponds to one measurement of the observed phenomenon, and one individual may have several measurements (lines), identified by an identity column. In the non-parametric case every individual must have measurements at the same fixed times. |

`nClust` |
The number of clusters, between 2 and 26. |

`ident` |
Name of the identity column in the data. |

`timeVar` |
Name of the 'time' column in the data. |

`family` |
A description of the error distribution and link function to be used in the model, by default 'gaussian'. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for more details of family functions). |

`effectVar` |
Name of an effect variable (optional). It may or may not be specified in the formula, and may or may not have a cluster-level effect. Note that this parameter is useful for the function plot. |

`weights` |
Vector of 'prior weights' to be used in the fitting process, by default the weights are equal to one. |

`affUser` |
Initial assignment of the individuals to the clusters, in a [data.frame] format (optional). If missing, the individuals are randomly assigned to the clusters. |

`timeParametric` |
By default [TRUE], i.e. the model is parametric in time. If [FALSE], only one covariate is allowed in the formula and the algorithm used is k-means. |

`separateSampling` |
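By default [TRUE], which means that the proportions of the clusters are assumed equal in the classification step; the log-likelihood maximised at each step of the algorithm is then $\sum_{k=1}^{K}\sum_{i \in P_k} \log f(y_i, \theta_k)$, where $P_k$ is the set of trajectories assigned to cluster $k$ and $\theta_k$ the parameters of that cluster. Otherwise the proportions $\lambda_k$ of the clusters are taken into account and the log-likelihood is $\sum_{k=1}^{K}\sum_{i \in P_k} \log(\lambda_k f(y_i, \theta_k))$. |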

`max_itr` |
The maximum number of iterations, set to 100. |

`verbose` |
Print the output in the console. |

### Details

'glmClust' implements an ECM (Expectation Classification
Maximisation) type algorithm which assigns each
trajectory to the cluster that maximises its likelihood.
The procedure is repeated until the partition no longer
changes or the likelihood no longer increases
sufficiently.
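The loop described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it fits one Gaussian `glm` per cluster, treats the cluster proportions as equal, and uses hypothetical names (`cem_sketch`, `start`, the toy data `d`) that do not come from 'glmClust'.

```r
# Toy long-format data (hypothetical): 20 individuals, 5 time points each,
# half with an increasing trend and half with a decreasing one.
set.seed(42)
n_id   <- 20
d      <- data.frame(id = rep(seq_len(n_id), each = 5),
                     time = rep(0:4, times = n_id))
true_k <- rep(1:2, length.out = n_id)
d$y    <- c(1, -1)[true_k[d$id]] * d$time + rnorm(nrow(d), sd = 0.3)

cem_sketch <- function(d, n_clust = 2, start = NULL, max_itr = 100) {
  ids <- sort(unique(d$id))
  # Random balanced start unless the caller supplies a partition.
  part <- if (is.null(start)) {
    sample(rep_len(seq_len(n_clust), length(ids)))
  } else {
    start
  }
  for (itr in seq_len(max_itr)) {
    # Maximisation: fit one Gaussian glm per cluster.
    fits <- lapply(seq_len(n_clust), function(k) {
      glm(y ~ time, family = gaussian, data = d[d$id %in% ids[part == k], ])
    })
    # Log-likelihood of each trajectory under each cluster model.
    ll <- sapply(fits, function(f) {
      mu    <- predict(f, newdata = d, type = "response")
      sigma <- sqrt(mean(residuals(f, type = "response")^2))
      as.vector(tapply(dnorm(d$y, mu, sigma, log = TRUE), d$id, sum))
    })
    # Classification: move each trajectory to its most likely cluster.
    new_part <- max.col(ll, ties.method = "first")
    # Stop when the partition is stable (or a cluster would become empty).
    if (all(new_part == part) || length(unique(new_part)) < n_clust) break
    part <- new_part
  }
  list(partition = setNames(part, ids), iterations = itr)
}

res <- cem_sketch(d)
table(res$partition, true_k)  # cross-tabulate found vs. simulated clusters
```

The optional `start` argument plays a role similar to a user-supplied initial partition; the stopping rule here only checks for a stable partition, whereas the text also mentions an insufficient increase in the likelihood.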

'glmClust' also handles multiple covariates with effects
at different levels: effects that differ from one cluster
to another and/or effects that are identical in all clusters.

The introduction of covariates is possible thanks to
'glm', which fits a generalised linear model and takes into
account the type of the response (normal, binomial,
Poisson, etc.) and the link function.
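As a reminder of how 'glm' accepts the response type and link function, the sketch below fits the same Poisson model with the family given in the three forms listed for the `family` argument. It uses simulated counts (the data frame `d` is hypothetical) and assumes that 'glmClust' passes `family` on to 'glm' unchanged, which the argument description suggests but this example does not verify.

```r
# Simulated count data (hypothetical), long format with a single covariate.
set.seed(3)
d <- data.frame(time = rep(0:4, times = 10))
d$count <- rpois(nrow(d), lambda = exp(0.2 + 0.3 * d$time))

# The family can be given as a character string, ...
fit_chr  <- glm(count ~ time, data = d, family = "poisson")
# ... as a family function, ...
fit_fun  <- glm(count ~ time, data = d, family = poisson)
# ... or as the result of a call to a family function (link made explicit).
fit_call <- glm(count ~ time, data = d, family = poisson(link = "log"))

# All three specifications describe the same model.
all.equal(coef(fit_chr), coef(fit_call))
family(fit_fun)
```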

Several parameters of 'glmClust' are shared with 'glm', such as the `formula`, which requires particular attention: covariates with a cluster effect are specified with the keyword **clust**, e.g. `clust(T1+T2+..+Tn)`, and covariates with an identical effect in each cluster are specified with the keyword **pop**, e.g. `pop(X1+X2+..+Xn)`. Note that these last covariates are optional.
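One way to read the `clust(...)`/`pop(...)` notation, for a partition that is already known, is as an ordinary 'glm' in which a cluster factor interacts with the `clust` covariates while the `pop` covariates enter once. The sketch below illustrates that reading on toy data; the equivalence is an assumption made for illustration, not a description of how 'glmClust' builds its model internally, and the names `d` and `cluster` are hypothetical.

```r
# Toy data (hypothetical): 6 individuals, 5 time points, known clusters A/B.
set.seed(1)
d <- data.frame(
  time    = rep(0:4, times = 6),
  sex     = rep(c(0, 1), each = 15),
  cluster = factor(rep(c("A", "B", "A", "B", "A", "B"), each = 5))
)
d$time2 <- d$time^2
trend   <- ifelse(d$cluster == "A", 1 + 0.5 * d$time, 3 - 0.2 * d$time2)
d$y     <- trend + 0.3 * d$sex + rnorm(nrow(d), sd = 0.1)

# y ~ clust(time + time2) + pop(sex), read as: cluster-specific intercepts
# and cluster-specific 'time'/'time2' effects, plus one common 'sex' effect.
fit <- glm(y ~ 0 + cluster + cluster:time + cluster:time2 + sex,
           data = d, family = gaussian)
coef(fit)
```

The coefficient vector then contains one intercept, one `time` effect and one `time2` effect per cluster, and a single `sex` effect shared by both clusters.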

The data are in the long format and no
missing values are allowed.
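If the measurements are stored with one row per individual, they can be brought into long format with base R before clustering. The following is a small sketch using `stats::reshape()`; the data frame `wide` and its column names are hypothetical.

```r
# Hypothetical wide data: one row per individual, one column per visit.
wide <- data.frame(
  id  = 1:3,
  sex = c("F", "M", "F"),
  Y.0 = c(1.2, 0.8, 1.5),
  Y.1 = c(1.9, 1.1, 2.4),
  Y.2 = c(2.7, 1.3, 3.6)
)

# reshape() stacks the Y.* columns into one row per (id, time) pair.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Y.0", "Y.1", "Y.2"),
  v.names   = "Y",
  timevar   = "time",
  times     = 0:2,
  idvar     = "id"
)
long <- long[order(long$id, long$time), ]
rownames(long) <- NULL

# The long format must not contain missing values.
stopifnot(!anyNA(long))

# Optional polynomial term, for use in a formula such as clust(time + time2).
long$time2 <- long$time^2
long
```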

In the parametric case (`timeParametric = TRUE`) multiple covariates are allowed; in the non-parametric case only one covariate is allowed.

The algorithm depends greatly on the starting condition, which is obtained by randomly assigning the trajectories to the clusters unless the user supplies their own partition. To obtain better results it is desirable to run the algorithm several times from different starting points; it is therefore preferable to use kmlCov, which runs the algorithm several times with different numbers of clusters.
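The idea of restarting from several starting points and keeping the best run can be sketched generically. The example below uses base R's `kmeans()` as a stand-in for a start-dependent clustering algorithm; it does not call 'glmClust' or kmlCov, and the toy trajectories and the number of restarts are arbitrary choices made for illustration.

```r
# Toy trajectories in wide form (hypothetical): 30 individuals observed at
# 5 fixed time points, half increasing and half decreasing.
set.seed(7)
time <- 0:4
traj <- rbind(
  t(replicate(15,  1 * time + rnorm(5, sd = 0.3))),
  t(replicate(15, -1 * time + rnorm(5, sd = 0.3)))
)

# Run the algorithm from several random starting points.
runs <- lapply(1:10, function(s) {
  set.seed(s)
  kmeans(traj, centers = 2, nstart = 1)
})

# Keep the run with the best criterion; for k-means this is the smallest
# total within-cluster sum of squares.
crit <- vapply(runs, function(r) r$tot.withinss, numeric(1))
best <- runs[[which.min(crit)]]
table(best$cluster)
```

For a likelihood-based algorithm the selection rule would instead keep the run with the largest log-likelihood.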

At the end of the algorithm, an object of class GlmCluster is returned. It contains information about the assignment of the trajectories, the proportions, the convergence, etc. The main trajectories can be visualised simply with `plot(my_GlmCluster_Object)`.

### Value

An object of class GlmCluster.

### See Also

kmlCov

### Examples

```
data(artifdata)
res <- glmClust(formula = Y ~ clust(time + time2 + time3) + pop(treatTime),
                data = artifdata, ident = 'id', timeVar = 'time',
                effectVar = 'treatment', nClust = 4)
# trajectories labelled 0 received the normal treatment, those labelled 1 a high dose
# the colour indicates the cluster
# the proportions are shown in the table above the diagram
plot(res)
```