trajectories: Cluster longitudinal trajectories over time.

View source: R/trajectories.R

trajectories    R Documentation

Cluster longitudinal trajectories over time.

Description

Performs k-means clustering on a continuous response measured over time, where each cluster mean is a thin plate spline fit to all points in that cluster. Typically, this function is called by clustra.

Usage

trajectories(
  data,
  k,
  group,
  maxdf,
  conv = c(10, 0),
  mccores = 1,
  verbose = FALSE,
  ...
)

Arguments

data

Data table or data frame with response measurements, one row per observation. Column names are id, time, response, group. The ids must be sequential integers starting from 1, because they are used to expand the per-id group numbers to individual observations.
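A minimal sketch of this layout, using made-up values. The toy data and the indexing step below are illustrative assumptions, not code taken from the package. They show why sequential ids matter when a per-id group vector is expanded to one entry per observation.

```r
# Hypothetical toy data: 3 ids with 2, 3, and 2 observations.
data <- data.frame(
  id       = c(1, 1, 2, 2, 2, 3, 3),
  time     = c(0, 5, 0, 4, 9, 1, 6),
  response = c(70.1, 71.3, 65.0, 64.2, 63.8, 80.5, 79.9)
)

# One initial group number per id, in id order (position i belongs to id i).
group <- c(1L, 2L, 1L)

# Because ids are sequential from 1, an id can index the group vector directly.
data$group <- group[data$id]
print(data)
```

If ids had gaps or did not start at 1, `group[data$id]` would pick the wrong entries or return NA, which is the reason for the sequential-id requirement in this sketch.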

k

Number of clusters (groups).

group

Vector of initial group numbers corresponding to ids.

maxdf

Integer. Basis dimension of the smooth term. See parameter k of the s function in package mgcv.
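To show how a basis dimension enters an mgcv smooth, here is a hedged sketch that fits one thin plate spline with mgcv::bam on simulated data. The simulated data, the variable names, and the exact model formula are assumptions for illustration; this page does not state the formula the package uses internally.

```r
library(mgcv)

# Hypothetical data for a single cluster: a noisy sine curve over time.
set.seed(1)
cluster_data <- data.frame(time = seq(0, 10, length.out = 200))
cluster_data$response <- sin(cluster_data$time) + rnorm(200, sd = 0.2)

maxdf <- 10  # basis dimension, passed to s() as its k argument

# s() uses a thin plate regression spline basis (bs = "tp") by default.
fit <- bam(response ~ s(time, k = maxdf), data = cluster_data)

# Total effective degrees of freedom (intercept plus smooth); at most maxdf.
total_edf <- sum(fit$edf)
print(total_edf)
```

A larger maxdf allows a more flexible curve, but the penalized fit usually uses fewer effective degrees of freedom than the basis dimension permits.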

conv

A vector of length two, c(iter, minchange), where iter is the maximum number of EM iterations and minchange is the minimum percentage of subjects that must change group for iterations to continue. Setting minchange to zero continues iterations until no more changes occur or iter is reached.
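The stopping rule described here can be sketched as a small predicate. The helper keep_iterating and its argument names are hypothetical and not part of clustra. Whether the package compares the percentage with > or >= is not stated on this page; the sketch assumes a strict >, which reproduces the documented behavior for minchange = 0.

```r
# Hypothetical helper (not a clustra function): should iterations continue?
keep_iterating <- function(iter_done, n_changed, n_id, conv = c(10, 0)) {
  max_iter  <- conv[1]   # maximum number of iterations
  minchange <- conv[2]   # minimum percentage of ids changing group
  pct_changed <- 100 * n_changed / n_id
  # Assumed rule: stop at max_iter, or when too few ids changed group.
  iter_done < max_iter && pct_changed > minchange
}

keep_iterating(3, 5, 100)                    # 5% changed, default conv
keep_iterating(3, 0, 100)                    # no id changed group
keep_iterating(3, 1, 100, conv = c(10, 2))   # 1% changed, threshold 2%
```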

mccores

Integer number of cores used by mclapply sections. Parallelization is over k, the number of clusters.

verbose

Logical, whether to produce debug output. A value greater than 1 plots the tps (thin plate spline) fit lines in each iteration.

...

See clustra for allowed ... parameters.

Value

A list with components

  • deviance - The final deviance, summed across clusters.

  • group - Integer vector of group assignments corresponding to unique ids.

  • loss - Numeric matrix with rows corresponding to unique ids and one column for each cluster. Each entry is the mean squared loss of that id's data relative to the cluster model.

  • k - An integer giving the requested number of clusters.

  • k_cl - An integer giving the converged number of clusters. Can be smaller than k when some clusters become too small to support the model degrees of freedom during convergence.

  • data_group - An integer vector giving the group assignment expanded to every observation (all time points of each id).

  • tps - A list with k_cl elements, each an object returned by the mgcv::bam fit of a cluster thin plate spline model.

  • iterations - An integer giving the number of iterations taken.

  • counts - An integer vector giving the number of ids in each cluster.

  • counts_df - An integer vector giving the total number of observations in each cluster (sum of the number of observations for ids belonging to the cluster).

  • changes - An integer, giving the number of ids that changed clusters in the last iteration. This is zero if converged.
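The components above are related to each other, and the following base R sketch shows one way they fit together. The loss values, the observation counts, and the previous assignment are made up. Assigning each id to the cluster with the smallest loss is the usual k-means step and is assumed here rather than taken from the package source.

```r
# Hypothetical loss matrix: 4 ids (rows) by k = 2 cluster models (columns).
loss <- matrix(c(0.5, 2.0,
                 1.8, 0.3,
                 0.2, 0.9,
                 1.1, 0.4), ncol = 2, byrow = TRUE)
k <- ncol(loss)

# group: each id goes to the cluster whose model gives the smallest loss.
group <- apply(loss, 1, which.min)

# counts: number of ids in each cluster.
counts <- tabulate(group, nbins = k)

# Observation-level ids (sequential from 1) with 3, 2, 4, and 1 observations.
id <- rep(1:4, times = c(3, 2, 4, 1))

# data_group: per-id assignment expanded to every observation.
data_group <- group[id]

# counts_df: number of observations in each cluster.
counts_df <- tabulate(data_group, nbins = k)

# changes: ids whose cluster differs from a (hypothetical) previous assignment.
previous_group <- c(1L, 1L, 1L, 2L)
changes <- sum(group != previous_group)
```

In this toy case ids 1 and 3 fall in cluster 1 and ids 2 and 4 in cluster 2, so counts sums to the number of ids and counts_df sums to the number of observations.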

Author(s)

George Ostrouchov and David Gagnon


clustra documentation built on Oct. 14, 2023, 9:15 a.m.