Tracking data can be difficult to analyze because of different errors and artifacts that can occur in the data and cause bias in the analysis. This vignette will explain how the package can be used to detect and correct for some of these errors. The topics will be:

- Checking track lengths and dealing with short tracks
- Detecting and correcting tissue drift
- Detecting and correcting tracking errors and imaging artifacts using angle analyses
- Detecting and correcting variation in timesteps

First load the package:

library( celltrackR ) library( ggplot2 )

# Save current par() settings oldpar <- par( no.readonly =TRUE )

The package contains a dataset of T cells imaged in a mouse peripheral lymph node using two photon microscopy. We will here use this dataset as an example of how to perform quality control.

The dataset consists of 21 tracks of individual cells in a tracks object:

str( TCells, list.len = 3 )

Each element in this list is a track from a single cell, consisting of a matrix with $(x,y,z)$ coordinates and the corresponding measurement timepoints:

```
head( TCells[[1]] )
```

Especially for in vivo imaging, cells are often imaged in a limited imaging window and for a limited time. This can result in very short tracks that make the data difficult to analyze. We can check the distribution of track lengths in a dataset as follows:

# Each track has a coordinate matrix with one row per coordinate; # The number of steps is the number of rows minus one. track.lengths <- sapply( TCells, nrow ) - 1 hist( track.lengths, xlab = "Track length (#steps)" ) summary( track.lengths )

Alternatively, we can directly check the maximum track length (in number of coordinates, so the number of steps will be this value minus one):

# This is the number of coordinates, so the number of steps is one less. maxTrackLength( TCells )

To prevent problems from short tracks, we can filter them out using the `filterTracks()`

function. The first argument of this function is a function we that returns `TRUE`

or `FALSE`

for each track:

# nrow() of a track is always the number of steps plus one. # For steps >= min.steps, we can substitute nrow > min.steps: filter.criterion <- function( x, min.steps ){ nrow(x) > min.steps } TCells.filtered <- filterTracks( filter.criterion, TCells, min.steps = 11 ) # Or shorthand: filterTracks( function(x) nrow(x) > min.steps, TCells )

Let's check the effect of the filtering step in the histogram:

# Find lengths of the filtered dataset track.lengths.filtered <- sapply( TCells.filtered, nrow ) - 1 # Histograms of track lengths before and after par( mfrow=c(1,2) ) hist( track.lengths, xlab = "Track length (#steps)", main = "Before filtering", breaks = seq(0, 40, by = 10 ) ) hist( track.lengths.filtered, xlab = "Track length (#steps)", main = "After filtering", breaks = seq(0, 40, by = 10 )) # Check how many tracks are left: length( TCells.filtered )

The filtering has removed all tracks of 1-10 steps, leaving 17 out of 21 tracks in the dataset.

!

Note: Filtering out short tracks can introduce bias in a dataset. See Beltman et al (2009) for details and for step-based analyses as alternative methods to deal with short tracks. See the vignette on analysis methods for an explanation of how to use these methods.

!

Note: The methods suggested below all detect a form of global directionality in a dataset. However, any global directionality may either be an artifact from drift,ora true phenomenon in the data (for example because cells are following a chemotactic gradient). To distinguish between the two, it is recommended to detect and correct for drift based on tracks from structures that do not move at all, instead of the tracks from the cells of interest. If such data is absent, the original tracks can be used -- but this may also remove any true directional bias present in the data.

To detect global drift of a tissue of interest, let's add some drift to the `TCell`

data:

# Drift is 0.05 micron/sec in each dimension drift.speed <- c( 0.05, 0.05, 0.05 ) add.drift <- function( x, drift.vector ) { # separate timepoints and coordinates tvec <- x[,1] coords <- x[,-1] # compute movement due to drift. drift.matrix <- matrix( rep( drift.vector, nrow(coords) ), ncol = ncol(coords), byrow= TRUE ) drift.matrix <- drift.matrix * tvec # Add drift to coordinates x[,-1] <- coords + drift.matrix return(x) } # Create data with drift TCells.drift <- as.tracks( lapply( TCells, add.drift, drift.speed ) ) # Plot both for comparison par(mfrow=c(1,2) ) plot(TCells, main = "original data" ) plot(TCells.drift, main = "with tissue drift" ) # Overlay starting points to view directionality plot( normalizeTracks(TCells), main = "original data" ) plot( normalizeTracks(TCells.drift), main = "with tissue drift" )

Textor et al (2011) proposed using Hotelling's T-square test to detect directionality in a dataset. This test computes displacement vectors of all steps in the data, and tests if the mean displacement vector is significantly different from the null vector.

In using this test, it is important to realise that steps from within a track are not independent: cells usually exhibit some kind of persistent motion at least on a short time scale (that is: the same cell does not tend to go in completely opposite directions at to subsequent timepoints). This means that blindly extracting all steps violates the "independent observations" assumption of Hotelling's test.

For example, applying this to the `TCell`

dataset:

hotellingsTest( TCells, plot = TRUE, col = "gray" )

there seems to be a significant directionality in the $y$ direction (as shown by the blue ellipse).

However, when we only consider steps that are some distance apart using the `step.spacing`

argument:

hotellingsTest( TCells, plot = TRUE, col = "gray", step.spacing = 5 )

there is no longer evidence for directionality. (Note, however, that the power of the test is also reduced because the larger step spacing reduces the total number of steps).

By contrast, this directionality remains obvious in the dataset with drift:

hotellingsTest( TCells.drift, plot = TRUE, col = "gray", step.spacing = 5 )

!

Note: The appropriate value of`step.spacing`

depends on the level of persistence expected in the cells of interest, as well as on the time resolution of the experiment. An autocorrelation or autocovariance plot may provide insights in the persistence time for the cells of interest (see the vignette on analysis methods for details). If tracks of static structures are used to detect and correct for drift -- as is recommended to distinguish between drift and truly directed movement -- a`step.spacing`

of zero is safe to use because there should be no inherent persistence for something that does not move.

Of note, by default `hotellingsTest()`

is performed on a *projection* of the tracks on only the $x$ and $y$ dimensions. For 3D tracking data like the `TCells`

, we can specify that all three dimensions should be taken into account (although plotting is not supported in 3D):

hotellingsTest( TCells, dim = c("x","y","z"), step.spacing = 5 ) hotellingsTest( TCells.drift, dim = c("x","y","z"), step.spacing = 5 )

# sp <- seq(0,10) # # htest.nodrift <- lapply( sp, function(x) hotellingsTest( TCells, # dim = c("x","y","z"), # step.spacing = x )) # htest.drift <- lapply( sp, function(x) hotellingsTest( TCells.drift, # dim = c("x","y","z"), # step.spacing = x )) # # pval.nodrift <- sapply( htest.nodrift, function(x) x$p.value ) # pval.drift <- sapply( htest.drift, function(x) x$p.value ) # stat.nodrift <- sapply( htest.nodrift, function(x) unname(x$statistic) ) # stat.drift <- sapply( htest.drift, function(x) unname(x$statistic) ) # # d.drift <- data.frame( step.spacing = sp, # exp = "drift", # pval = pval.drift, # Tstatistic = stat.drift ) # d.nodrift <- data.frame( step.spacing = sp, # exp = "no drift", # pval = pval.nodrift, # Tstatistic = stat.nodrift ) # d <- rbind( d.drift, d.nodrift ) # # p1 <- ggplot( d, aes( x = step.spacing, y = pval, color = exp ) ) + # geom_line() + # geom_hline( yintercept = 0.05, color = "red" ) + # scale_color_manual( values = c( "drift" = "black", "no drift" = "gray") ) + # scale_y_log10() + # theme_classic() # # p2 <- ggplot( d, aes( x = step.spacing, y = Tstatistic, color = exp ) ) + # geom_line() + # scale_y_continuous( limits = c(0,NA) ) + # scale_color_manual( values = c( "drift" = "black", "no drift" = "gray") ) + # theme_classic() # # gridExtra::grid.arrange(p1,p2,ncol=2)

Beltman et al (2009) proposed an analysis of angles versus distance between cell pairs to detect global directionality in a dataset. Use the function `analyzeCellPairs`

to get a dataframe with for each pair of tracks in the dataset the angle (between their displacement vectors) and the distance (min distance between the tracks at any timepoint). Note that the distance is not defined for pairs of tracks that have no overlap in timepoints -- these will get an NA value.

# compute for both original as drift data df.drift <- analyzeCellPairs( TCells.drift ) df.norm <- analyzeCellPairs( TCells ) # Plot p.norm <- ggplot( df.norm, aes( x = dist, y = angle ) ) + geom_point( color = "gray40" ) + stat_smooth( span = 1, color = "black" )+ labs( x = "distance between cell pairs", y = "angle between cell pairs", title = "original data") + geom_hline( yintercept = 90, color = "red" ) + theme_classic() p.drift <- ggplot( df.drift, aes( x = dist, y = angle ) ) + geom_point( color = "gray40" ) + stat_smooth( span = 1, color = "black" )+ labs( x = "distance between cell pairs", y = "angle between cell pairs", title = "data with drift") + geom_hline( yintercept = 90, color = "red" ) + theme_classic() gridExtra::grid.arrange( p.norm, p.drift, ncol = 2 )

In the original data, the mean angle is roughly 90 degrees on average for cell pairs at any distance from each other, which is what we would expect if there is no global directionality in the data. In the data with drift, the average angle is lower than we would expect even at large distances between cell pairs -- which indicates that there is global directionality (even cells far apart move roughly in the same direction).

We can perform a similar analysis on the level of single steps to boost the power.
The function `analyzeStepPairs()`

finds all pairs of steps from *different* cells that occur at the *same* timepoint, and then computes distances (between the step starting points) and angles between them. Note that we use a function in the argument `filter.steps`

to take into account only steps with a minimum displacement, to avoid noise from steps where the cell is pausing.

# compute for both original as drift data df.drift <- analyzeStepPairs( TCells.drift, filter.steps = function(x) displacement(x)>2 ) df.norm <- analyzeStepPairs( TCells, filter.steps = function(x) displacement(x)>2 ) # Plot p.norm <- ggplot( df.norm, aes( x = dist, y = angle ) ) + geom_point( color = "gray40", size = 0.5 ) + stat_smooth( span = 0.75, color = "black" )+ labs( x = "distance between step pairs", y = "angle between step pairs", title = "original data") + geom_hline( yintercept = 90, color = "red" ) + theme_classic() p.drift <- ggplot( df.drift, aes( x = dist, y = angle ) ) + geom_point( color = "gray40", size = 0.5 ) + stat_smooth( span = 0.75, color = "black" )+ labs( x = "distance between step pairs", y = "angle between step pairs", title = "data with drift") + geom_hline( yintercept = 90, color = "red" ) + theme_classic() gridExtra::grid.arrange( p.norm, p.drift, ncol = 2 )

Again we see a mean step pair angle that is below 90 degrees for the data with drift, which is not the case for the original data.

If Hotelling's test and/or angle analyses provide evidence for drift in the dataset, this can be corrected by computing the mean step displacement:

# Get steps and find their displacement vectors steps.drift <- subtracks( TCells.drift, 1 ) step.disp <- t( sapply( steps.drift, displacementVector ) ) # Get the mean mean.displacement <- colMeans( step.disp ) # Divide this by the mean timestep to get a drift speed drift.speed <- mean.displacement/timeStep( TCells.drift ) drift.speed

This is indeed roughly the drift speed we introduced in the data. Now remove this:

correct.drift <- function( x, drift.vector ) { # separate timepoints and coordinates tvec <- x[,1] coords <- x[,-1] # compute movement due to drift. drift.matrix <- matrix( rep( drift.vector, nrow(coords) ), ncol = ncol(coords), byrow= TRUE ) drift.matrix <- drift.matrix * tvec # Add drift to coordinates x[,-1] <- coords - drift.matrix return(x) } # Create data with drift TCells.corrected <- as.tracks( lapply( TCells.drift, correct.drift, drift.speed ) ) # Compare par( mfrow = c(1,2) ) plot( TCells, col = "gray", main = "uncorrected" ) plot( TCells.drift, col = "red", add = TRUE ) plot( TCells, col = "gray", main = "corrected" ) plot( TCells.corrected, col = "red", add = TRUE ) # par( mfrow = c(1,2) ) plot( normalizeTracks(TCells), col = "gray", main = "uncorrected" ) plot( normalizeTracks(TCells.drift), col = "red", add = TRUE ) plot( normalizeTracks(TCells), col = "gray", main = "corrected" ) plot( normalizeTracks(TCells.corrected), col = "red", add = TRUE )

While the correction is not perfect, at least the tracks resemble their original shape more.

To simulate a dataset with double tracking, take the first TCell track and keep it roughly the same, but with some small noise added to the coordinates:

# Take the track with id "2" dup.track <- TCells[["2"]] # Add some noise to coordinates dup.track[,"x"] <- dup.track[,"x"] + rnorm( nrow(dup.track), sd = 0.5 ) dup.track[,"y"] <- dup.track[,"y"] + rnorm( nrow(dup.track), sd = 0.5 ) dup.track[,"z"] <- dup.track[,"z"] + rnorm( nrow(dup.track), sd = 0.5 ) # Wrap the track in a tracks object and add it to the TCell data with # a unique id number dup.track <- wrapTrack( dup.track ) names(dup.track) <- "22" TCells.dup <- c( TCells, dup.track )

This can again be detected by plotting the angle between cell pairs versus the distance between cell pairs (see also section 2.3 above and Beltman et al (2009)):

df <- analyzeCellPairs( TCells.dup ) # label cellpairs that have both angle and distance below threshold angle.thresh <- 90 # in degrees dist.thresh <- 10 # this should be the expected cell radius df$id <- paste0( df$cell1,"-",df$cell2 ) df$id[ !(df$angle < angle.thresh & df$dist < dist.thresh) ] <- "" # Plot ggplot( df, aes( x = dist, y = angle ) ) + geom_point( color = "gray40" ) + geom_text( aes( label = id ), color = "red" ) + labs( x = "distance between cell pairs", y = "angle between cell pairs" ) + geom_hline( yintercept = angle.thresh, col = "blue",lty=2 ) + geom_vline( xintercept = dist.thresh, col = "blue", lty=2) + theme_classic()

We indeed find the pair of tracks with ids "2" (the original track) and "22" (the noisy duplicate of the original track). Plot these tracks specifically to check:

plot( TCells.dup[c("2","22")])

That indeed looks like double tracking.

A plot of angles and distances to the border planes of the imaging volume can help detect artifacts (Beltman et al (2009) ). We can compute these for each step using the functions `distanceToPlane`

and `angleToPlane`

.

To specify a border plane, we first need three points on that plane. We can either define the imaging volume manually or estimate it using `boundingBox`

:

tracks <- TCells bb <- boundingBox( tracks ) bb

Let's take the two borders in the z-dimension as an example. The lower z-plane contains points (minx,miny,minz), (maxx,miny,minz), (maxx,maxy,minz). The upper z-plane is at distance (maxz-minz) from the lower plane.

# Define points: lower1 <- c( bb["min","x"], bb["min","y"], bb["min","z"] ) lower2 <- c( bb["max","x"], bb["min","y"], bb["min","z"] ) lower3 <- c( bb["max","x"], bb["max","y"], bb["min","z"] ) zsize <- bb["max","z"] - bb["min","z"] # Compute angles and distances of steps to this plane. single.steps <- subtracks( tracks, 1 ) angles <- sapply( single.steps, angleToPlane, p1 = lower1, p2 = lower2, p3 = lower3 ) distances <- sapply( single.steps, distanceToPlane, p1 = lower1, p2 = lower2, p3 = lower3 ) df <- data.frame( angles = angles, distances = distances ) # Plot ggplot( df, aes( x = distances, y = angles ) ) + geom_point( color = "gray40" ) + stat_smooth( method = "loess", span = 1, color = "black" ) + geom_hline( yintercept = 32.7, color = "red" ) + scale_x_continuous( limits=c(0,zsize ), expand = c(0,0) ) + theme_classic()

The mean angle to the border plane should be roughly 32.7 degrees, and indeed this seems to be the case. Tracking errors near the border plane would result in a lower average angle near the left and right side of the plots, whereas imprecise z-calibration would result in a systematic deviation from the 32.7 degree angle at any distance to the plane (Beltman et al (2009) ).

Track analysis methods usually assume a constant time interval $\Delta t$ between consecutive images in a time-lapse microscopy dataset. In reality, there are mostly at least small fluctuations in the $\Delta t$ between consecutive images. To find the $\Delta t$ of each step, we first extract single steps using the `subtracks()`

function (see also the vignette on analysis methods):

# Extract all subtracks of length 1 (that is, all "steps") single.steps <- subtracks( TCells, 1 ) # The output is a new tracks object with a unique track for each step # in the data (no longer grouped by the original cell they came from): str( single.steps, list.len = 3 )

After extracting the single steps, we can find the median $\Delta t$ using the `timeStep()`

function:

median.dt <- timeStep( TCells ) median.dt

We then find the actual $\Delta t$ of each step by applying the `duration()`

function to each step in the `single.steps`

object:

step.dt <- sapply( single.steps, duration ) str(step.dt)

And visualize the difference with the median $\Delta t$ in a histogram (expressed as a percentage of $\Delta t$:

dt.diff.perc <- (step.dt - median.dt) * 100 / median.dt hist( dt.diff.perc, xlab = "dt (percentage difference from median)" )

Thus, the `TCell`

dataset indeed contains fluctuations in $\Delta t$, but they are small: none are more than 1 percent of the median $\Delta t$ in the data.

As an example of a dataset containing gaps in tracks, we will introduce an artificial error by removing some timepoints in a few tracks in the `TCell`

data:

# This function randomly removes coordinates from a track dataset with probability "prob" remove.points <- function( track, prob=0.1 ){ tlength <- nrow( track ) remove.rows <- sample( c(TRUE,FALSE), tlength, replace=TRUE, prob = c(prob, (1-prob) ) ) track <- track[!remove.rows,] return(track) } # Apply function to dataset to randomly remove coordinates in the data TCells.gap <- as.tracks( lapply( TCells, remove.points ) )

Create the same histogram as before:

# median dt of the new data median.dt.gap <- timeStep( TCells.gap ) # duration of the individual steps steps.gap <- subtracks( TCells.gap, 1 ) step.dt.gap <- sapply( steps.gap, duration ) # express difference as percentage of median dt dt.diff.perc.gap <- (step.dt.gap - median.dt.gap) * 100 / median.dt.gap hist( dt.diff.perc.gap, xlab = "dt (percentage difference from median)" )

Now, the percentage difference with the median $\Delta t$ is often much larger.

This has an effect on for example the step-based displacement:

T1.step.disp <- sapply( single.steps, displacement ) T1.gap.disp <- sapply( steps.gap, displacement ) lapply( list( original = T1.step.disp, gaps = T1.gap.disp ), summary )

Note that the mean and median displacement are slightly higher due to the gaps, because steps with a gap in between are actually two steps and thus have a larger displacement.

For a simple step-based analysis of displacement, we can use `normalizeToDuration()`

to correct for differences in time resolution:

T1.norm.disp <- sapply( single.steps, normalizeToDuration( displacement ) ) T1.norm.gap.disp <- sapply( steps.gap, normalizeToDuration( displacement ) ) lapply( list( original = T1.norm.disp, gaps = T1.norm.gap.disp ), summary )

The difference in displacements is now gone. For more complicated analyses, one might want to correct the gaps in the dataset via splitting tracks or via interpolation -- see the next section.

If a dataset contains gaps, as detected in the histogram of $\Delta t$, we can correct these using the `repairGaps`

function, where we can select one of three correction methods using the `how`

argument:

- Dropping all tracks with gaps (
`how = "drop"`

) - Splitting tracks around the gaps(
`how = "split"`

) - Interpolating the track (
`how = "interpolate"`

)

For example:

# Repair gaps by splitting or interpolation, the number of tracks is # different after each fix split.gap <- repairGaps( TCells.gap, how = "split" ) interpolate.gap <- repairGaps( TCells.gap, how = "interpolate" ) c( "after splitting" = length( split.gap), "after interpolation" = length( interpolate.gap ) )

Finally, if we wish to compare tracks from experiments imaged at a different time resolution $\Delta t$, we can use the function `interpolateTrack()`

to estimate the position of each cell at any set of timepoints of interest.

For example, let's create a second T cell dataset where we keep only every second timepoint (to effectively increase $\Delta t$ twofold while keeping the same dataset):

T2 <- subsample( TCells, k = 2 )

When we now perform a step-based analysis of displacement, the two will be different:

# displacement T1.steps <- subtracks( TCells, 1 ) T1.disp <- sapply( T1.steps, displacement ) T2.steps <- subtracks( T2, 1 ) T2.disp <- sapply( T2.steps, displacement ) lapply( list( T1 = T1.disp, T2 = T2.disp ), summary )

To correct for the difference in time resolution, we can interpolate both datasets at a fixed time resolution:

# interpolate both datasets at the time resolution of the neutrophils dt <- timeStep( TCells ) interpolate.dt <- function( x, dt, how = "spline" ){ trange <- range( timePoints( wrapTrack( x ) ) ) tvec <- seq( trange[1], trange[2], by = dt ) x <- interpolateTrack( x, tvec, how = how ) return(x) } T1.corrected <- as.tracks( lapply( TCells, interpolate.dt, dt = dt ) ) T2.corrected <- as.tracks( lapply( T2, interpolate.dt, dt = dt ) ) # Check the effect on the displacement statistics: T1.corr.steps <- subtracks( T1.corrected, 1 ) T1.corr.disp <- sapply( T1.corr.steps, displacement ) T2.corr.steps <- subtracks( T2.corrected, 1 ) T2.corr.disp <- sapply( T2.corr.steps, displacement ) lapply( list( T1 = T1.disp, T2 = T2.disp, T1.corr = T1.corr.disp, T2.corr = T2.corr.disp ), summary )

The difference is now much smaller.

# Reset par() settings par(oldpar)

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.