Clustering Tracks with CelltrackR

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 4.5,
  fig.height = 3
)
# Save current par() settings
oldpar <- par( no.readonly =TRUE )

Introduction

To group tracks with similar properties, one can in principle perform any clustering method of interest on a feature matrix of quantification metrics for each track in the dataset. The package comes with three convenience functions -- getFeatureMatrix(),trackFeatureMap(), and clusterTracks() -- to easily compute several metrics on all tracks at once, visualize them in 2 dimensions, and to cluster tracks accordingly. This tutorial shows how to use these functions to explore heterogeneity in a track dataset.

Datasets

First load the package:

library( celltrackR )

The package contains a dataset of B and T cells in a mouse cervical lymph node, and neutrophils responding to an S. aureus infection in a mouse ear; all are imaged using two-photon microscopy. While the original data contained 3D coordinates, we'll use the 2D projection on the XY plane (see the vignettes on quality control methods and preprocessing of the package datasets for details).

The T-cell dataset consists of 199 tracks of individual cells in a tracks object:

str( TCells, list.len = 2 )

Each element in this list is a track from a single cell, consisting of a matrix with $(x,y)$ coordinates and the corresponding measurement timepoints:

head( TCells[[1]] )

Similarly, we will also use the BCells and Neutrophils data:

str( BCells, list.len = 2 )
str( Neutrophils, list.len = 2 )

Since there are quite many cells, we'll sample just some of the tracks for the legibility of the plots in this tutorial -- but everything we will do could also be done on the complete datasets.

# Take a sample
set.seed(1234)
TCells <- TCells[ sample( names(TCells), 30 ) ]
BCells <- BCells[ sample( names(BCells), 30 ) ]
Neutrophils <- Neutrophils[ sample( names(Neutrophils), 30 ) ]

Combine them in a single dataset, where labels also indicate celltype:

T2 <- TCells
names(T2) <- paste0( "T", names(T2) )
tlab <- rep( "T", length(T2) )

B2 <- BCells
names(B2) <- paste0( "B", names(B2) )
blab <- rep( "B", length(B2) )

N2 <- Neutrophils
names(N2) <- paste0( "N", names(Neutrophils) )
nlab <- rep( "N", length( N2) )

all.tracks <- c( T2, B2, N2 )
real.celltype <- c( tlab, blab, nlab )

1 Extracting a feature matrix

Using the function getFeatureMatrix(), we can quickly apply several quantification measures to all data at once (see ?TrackMeasures for an overview of measures we can compute):

m <- getFeatureMatrix( all.tracks, 
                       c(speed, meanTurningAngle, 
                         outreachRatio, squareDisplacement) )

# We get a matrix with a row per track and one column for each metric:
head(m)

We can use this matrix to explore relationships between different metrics. For example, we can check the relationship between speed and mean turning angle:

plot( m, xlab = "speed", ylab = "mean turning angle" )

2 Dimensionality reduction methods: PCA, MDS, and UMAP

When using more than two metrics at once to quantify track properties, it becomes hard to visualize which tracks are similar to each other. Like with single-cell data, dimensionality reduction methods can help visualize high-dimensional track feature datasets. The function trackFeatureMap() is a wrapper method that helps to quickly visualize data using three popular methods: principal component analysis (PCA), multidimensional scaling (MDS), and uniform manifold approximate and projection (UMAP). The function trackFeatureMap() can be used for a quick visualization of data, or return the coordinates in the new axis system if the argument return.mapping=TRUE.

1.1 PCA

Use trackFeatureMap() to perform a principal component analysis (PCA) based on the measures "speed", "meanTurningAngle", "squareDisplacement", "maxDisplacement", and "outreachRatio" using the optional labels argument to color points by their real celltype and return.mapping=TRUE to also return the data rather than just the plot:

pca <- trackFeatureMap( all.tracks, 
               c(speed,meanTurningAngle,squareDisplacement,
                 maxDisplacement,outreachRatio ), method = "PCA", 
               labels = real.celltype, return.mapping = TRUE )

Note that the B cells and Neutrophils are relatively well-separated from each other in this plot, but the T cells are hard to distinguish from neutrophils based on these features.

We can then inspect the data stored in pca. This reveals, for example, that the first principal component is correlated with speed:

pc1 <- pca[,1]
pc2 <- pca[,2]
track.speed <- sapply( all.tracks, speed )
cor.test( pc1, track.speed )

See ?prcomp for details on how principal components are computed, and for further arguments that can be passed on to trackFeatureMap().

1.2 MDS

Another popular method for visualization is multidimensional scaling (MDS), which is also supported by trackFeatureMap():

trackFeatureMap( all.tracks,
               c(speed,meanTurningAngle,squareDisplacement,maxDisplacement,
                 outreachRatio ), method = "MDS",
               labels = real.celltype )

Internally, trackFeatureMap() computes a distance matrix using dist() and then applies MDS using cmdscale(). See the documentation of cmdscale for details and further arguments that can be passed on via trackFeatureMap().

Again, we find that the B cells and Neutrophils are separated in this plot, while T cells mix with neutrophils.

1.3 UMAP

Uniform manifold approximate and projection (UMAP) is another powerful method to explore structure in high-dimensional datasets. trackFeatureMap() supports visualization of tracks in a UMAP. Note that this option requires the uwot package. Please install this package first using install.packages("uwot").

trackFeatureMap( all.tracks,
        c(speed,meanTurningAngle,squareDisplacement,
          maxDisplacement,outreachRatio ), method = "UMAP",
          labels = real.celltype )

Also this plot confirms that B cells and neutrophils can be separated easily, but that T cells are somewhere in between.

3 Clustering: hierarchical clustering and k-means

To go beyond visualizing similar and dissimilar tracks using multiple track features, clusterTracks() supports the explicit grouping of tracks into clusters using two common methods: hierarchical and k-means clustering.

3.1 Hierarchical clustering

Hierarchical clustering is performed by calling hclust() on a distance matrix computed via dist() on the feature matrix:

clusterTracks( all.tracks,
               c(speed,meanTurningAngle,squareDisplacement,maxDisplacement,
                 outreachRatio ), method = "hclust", labels = real.celltype )

See methods dist() and hclust() for details.

3.2 K-means clustering

Secondly, clusterTracks() also supports k-means clustering of tracks. Note that this requires an extra argument centers that is passed on to the kmeans() function and specifies the number of clusters to make. In this case, let's use three clusters because we have three celltypes:

clusterTracks( all.tracks,
               c(speed,meanTurningAngle,squareDisplacement,maxDisplacement,
                 outreachRatio ), method = "kmeans", 
               labels = real.celltype, centers = 3 )

In these plots, we see the value of each feature in the feature matrix plotted for the different clusters, whereas the color indicates the "real" celltype the track came from.

# Reset par() settings
par(oldpar)


Try the celltrackR package in your browser

Any scripts or data that you put into this service are public.

celltrackR documentation built on March 21, 2022, 5:06 p.m.