Using the $\texttt{traj}$ Package to Identify Clusters of Longitudinal Trajectories

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

$\texttt{Abstract}$

The $\texttt{traj}$ package implements the 3-step procedure proposed by Leffondre et al. (2004) to identify clusters of longitudinal trajectories. The first step calculates 24 summary measures that describe features of the trajectories. The second step performs a factor analysis on these 24 measures to select the measures that best describe the main features of the trajectories. The third step classifies the trajectories into clusters based on the previously selected factors. The $\texttt{traj}$ package also offers a variety of plotting functions for visualizing the results.

This vignette illustrates the use of the $\texttt{traj}$ package using simulated data. A more detailed description of the methods can be found in Sylvestre et al. (2006) or Leffondre et al. (2004).

$\texttt{Data}$

The data consist of two dataframes, of which only the first is needed here. The first dataframe, $\texttt{example.data\$data}$, contains the values of each individual trajectory. Each row corresponds to one trajectory.

library(traj)
head(example.data$data)
head(example.data$time)

$\texttt{Analysis}$

The first step in the analysis consists of computing 24 measures for each trajectory.

The 24 measures are:

The 24 measures can be computed using the step1measures function.

s1 = step1measures(example.data$data, ID = TRUE)
head(s1$measurments)

Each row in the dataframe returned by $\texttt{step1measures}$ corresponds to the trajectory on the same row in the input data ($\texttt{example.data\$data}$). For each trajectory, the 24 measures have been calculated and correspond to columns m1 to m24.
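To give a feel for what a summary measure of a trajectory looks like, the following base R sketch computes four simple descriptors (range, mean, standard deviation and least-squares slope) for each row of a small simulated matrix. These four are illustrative choices only and are not claimed to match columns m1 to m24; the names `toy`, `times` and `summarise_trajectory` are hypothetical and do not come from the $\texttt{traj}$ package.

```r
# Illustrative only: a handful of per-trajectory summaries in base R.
# These are NOT the 24 measures computed by step1measures().
set.seed(1)
times <- 1:6                                    # hypothetical common time grid
toy <- rbind(                                   # three hypothetical trajectories
  flat   = 5 + rnorm(6, sd = 0.1),
  rising = 1 + 2 * times + rnorm(6, sd = 0.1),
  noisy  = 5 + rnorm(6, sd = 3)
)

summarise_trajectory <- function(y, tm) {
  c(range = max(y) - min(y),
    mean  = mean(y),
    sd    = sd(y),
    slope = unname(coef(lm(y ~ tm))[2]))        # least-squares slope over time
}

# One row per trajectory, one column per measure
toy_measures <- t(apply(toy, 1, summarise_trajectory, tm = times))
toy_measures
```

As in the output of $\texttt{step1measures}$, each row of `toy_measures` lines up with the trajectory on the same row of the input.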

In the second step of the analysis, a factor analysis is performed to select a subset of measures that describes the main features of the trajectories. The function step2factors is used to perform the factor analysis.

s2 = step2factors(s1)
head(s2$factors)

In this example, step2factors has identified measures 4, 5, 21 and 24 as the main factors of this set of trajectories. Measures 6, 13 and 18 were not considered because they were too highly correlated with other measures (measures with a correlation higher than $0.95$ are omitted from the factor analysis).
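The correlation screen described above can be illustrated in base R. The sketch below builds a small hypothetical table of measures in which one column nearly duplicates another, then flags any column whose absolute correlation with an earlier column exceeds $0.95$. Which member of a correlated pair step2factors actually discards is not stated in this vignette, so dropping the later column is an assumption made for illustration; the names `m`, `too_correlated` and `kept` are hypothetical.

```r
# Illustrative correlation screen; not the internals of step2factors().
set.seed(2)
n <- 50
m <- data.frame(a = rnorm(n), b = rnorm(n))
m$c <- m$a + rnorm(n, sd = 0.01)        # column c is a near-duplicate of a

r <- abs(cor(m))
r[lower.tri(r, diag = TRUE)] <- 0       # keep the upper triangle: each pair once

# Assumption: drop the later column of any pair correlated above 0.95
too_correlated <- colnames(r)[apply(r, 2, max) > 0.95]
kept <- m[, setdiff(names(m), too_correlated), drop = FALSE]

too_correlated
names(kept)
```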

Once this step is done, the third step of the procedure consists of clustering the trajectories based on the measures identified in the factor analysis. This step is implemented in the step3clusters function. Two options are available to select the number of clusters. First, the user can decide on the number of clusters a priori, as in the following example, in which the number of clusters is set to 4.

s3 = step3clusters(s2, nclusters = 4)

Alternatively, the number of clusters can be left blank, in which case the step3clusters function will rely on the $\texttt{NbClust}$ function from the $\texttt{NbClust}$ package to determine the optimal number of clusters based on one of the criteria available in $\texttt{NbClust}$. Please see the $\texttt{NbClust}$ documentation for more details.
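To show what such a criterion does, here is a base R sketch that computes the Calinski-Harabasz index by hand for several candidate numbers of clusters and keeps the value that maximises it. This is a stand-in for illustration: it does not call $\texttt{NbClust}$, and which criterion step3clusters requests is not specified here. The simulated matrix `x` and the names `ks`, `ch` and `best_k` are hypothetical.

```r
# Illustrative only: hand-computed Calinski-Harabasz index with k-means.
set.seed(4)
# Hypothetical data: three well-separated groups in two dimensions
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.4), ncol = 2)
)
n  <- nrow(x)
ks <- 2:6                                # candidate numbers of clusters

ch <- sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 25)
  # between-cluster variance over within-cluster variance
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
})
names(ch) <- ks

best_k <- ks[which.max(ch)]              # candidate with the largest index
best_k
```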

The function step3clusters assigns each trajectory to one and only one cluster and returns a dataframe that identifies cluster membership.

head(s3$clusters)
s3$clust.distr
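For readers who want to see the shape of such a result without the package, the sketch below clusters a small simulated table of measures into a preset number of groups and builds a membership dataframe. It uses base R's `kmeans` as a stand-in; the algorithm used inside step3clusters is not described in this vignette, and the names `sel`, `km` and `membership` are hypothetical.

```r
# Illustrative only: clustering with a fixed number of clusters in base R.
set.seed(3)
# Hypothetical selected measures for 40 trajectories: two well-separated groups
sel <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

km <- kmeans(scale(sel), centers = 2, nstart = 20)

# Each trajectory is assigned to exactly one cluster
membership <- data.frame(ID = seq_len(nrow(sel)), cluster = km$cluster)
head(membership)
table(membership$cluster)                # number of trajectories per cluster
```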

The $\texttt{traj}$ object returned by the function step3clusters can be plotted by an array of plotting functions, as described in the next section.

$\texttt{Plotting the traj object}$

The generic $\texttt{plot}$ function can be called directly on the $\texttt{traj}$ object created by $\texttt{step3clusters}$.

plot(s3)

This function selects 10 random trajectories from each cluster and plots them using randomly selected colours. The user can specify the number of trajectories to plot, the colours or any other generic plotting parameter. The user can request that trajectories from only one cluster be plotted.
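The sampling step can be sketched in base R as follows. The example draws up to 10 row indices at random from each cluster of a hypothetical membership vector. Taking every member when a cluster has fewer than 10 trajectories is an assumption made here, not documented behaviour of the package, and the names `cluster` and `idx` are hypothetical.

```r
# Illustrative only: sample up to 10 trajectories per cluster.
set.seed(7)
cluster <- rep(1:3, times = c(25, 8, 40))   # hypothetical cluster membership

idx <- unlist(lapply(
  split(seq_along(cluster), cluster),       # row indices grouped by cluster
  function(i) i[sample.int(length(i), min(10, length(i)))]
))

table(cluster[idx])                         # sampled trajectories per cluster
```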

\newpage

The $\texttt{plotMeanTraj}$ function plots the mean trajectory of every cluster. The user can request that trajectories from only one cluster be plotted.

plotMeanTraj(s3)
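The quantity being drawn, a mean value per time point within each cluster, can be reproduced by hand. The base R sketch below does this for a small simulated matrix with two known groups; it is not the code behind $\texttt{plotMeanTraj}$, and the names `traj_mat`, `cluster` and `mean_traj` are hypothetical.

```r
# Illustrative only: cluster-wise mean trajectories in base R.
set.seed(5)
times <- 1:5
traj_mat <- rbind(
  matrix(rep(2 * times, each = 4), nrow = 4) + rnorm(20, sd = 0.1),  # rising group
  matrix(10, nrow = 4, ncol = 5) + rnorm(20, sd = 0.1)               # flat group
)
cluster <- rep(1:2, each = 4)               # hypothetical cluster membership

# Mean value at each time point within each cluster (rows = clusters)
mean_traj <- t(sapply(split(as.data.frame(traj_mat), cluster), colMeans))

matplot(times, t(mean_traj), type = "l", lty = 1,
        xlab = "time", ylab = "mean value")
```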

\newpage

The $\texttt{plotMedTraj}$ function plots the median trajectory of every cluster with 10th and 90th percentiles. The user can request that trajectories from only one cluster be plotted.

plotMedTraj(s3)
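As a rough illustration of what a median trajectory with percentile bands contains, the following base R sketch computes the 10th, 50th and 90th percentiles at each time point for one simulated cluster. It uses the default quantile type in `quantile`; whether $\texttt{plotMedTraj}$ uses the same definition is an assumption. The names `one_cluster` and `bands` are hypothetical.

```r
# Illustrative only: median and 10th/90th percentile bands per time point.
set.seed(6)
times <- 1:5
# Hypothetical cluster of 200 trajectories whose level rises with time
one_cluster <- matrix(rnorm(200 * 5, mean = rep(times, each = 200)), ncol = 5)

# Rows: 10%, 50%, 90%; columns: time points
bands <- apply(one_cluster, 2, quantile, probs = c(0.10, 0.50, 0.90))

matplot(times, t(bands), type = "l", lty = c(2, 1, 2), col = 1,
        xlab = "time", ylab = "value")
```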

\newpage

The $\texttt{plotBoxplotTraj}$ function plots the box-plot distribution at every time point in each cluster. The user can request that trajectories from only one cluster be plotted.

plotBoxplotTraj(s3)

\newpage

The $\texttt{plotCombTraj}$ function plots the mean or median trajectories of all the clusters on a single graph. Different colours and line styles can be selected.

plotCombTraj(s3)

References



traj documentation built on April 14, 2023, 12:39 a.m.