require(knitr)
opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocStyle)

Overview

The TrajectoryUtils package contains low-level utilities to support trajectory inference packages. Support is the key word here: this package does not perform any inference itself, deferring that to higher-level packages like TSCAN and slingshot. In fact, most of the code in this package was factored out of those two packages after we realized their commonalities. By rolling this out into a separate package, we can cut down on redundancy, facilitate better propagation of features and simplify maintenance. If you're a developer and you have a general utility for trajectory inference, contributions are welcome.

MST construction

The construction of a cluster-based minimum spanning tree (MST) is the primary motivator for this package, as it was implemented separately in TSCAN and slingshot. The idea is simple: cluster the cells, then create a minimum spanning tree from the cluster centroids to serve as the backbone for the trajectory reconstruction.

# Mocking up a Y-shaped trajectory.
centers <- rbind(c(0,0), c(0, -1), c(1, 1), c(-1, 1))
rownames(centers) <- seq_len(nrow(centers))
clusters <- sample(nrow(centers), 1000, replace=TRUE)
cells <- centers[clusters,]
cells <- cells + rnorm(length(cells), sd=0.5)

# Creating the MST:
library(TrajectoryUtils)
mst <- createClusterMST(cells, clusters)
plot(mst)
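
To make the idea concrete, the sketch below rebuilds the same kind of tree in base R: it computes one centroid per cluster, takes Euclidean distances between centroids, and grows a spanning tree with Prim's algorithm. The helper `prim_mst()` is hypothetical and written only for illustration; it is not part of TrajectoryUtils, and createClusterMST() may well compute distances and build the tree differently.

```r
# Hypothetical helper (not part of TrajectoryUtils): Prim's algorithm on a
# dense distance matrix, returning a two-column matrix of MST edges.
prim_mst <- function(d) {
    n <- nrow(d)
    in_tree <- c(TRUE, rep(FALSE, n - 1L))
    edges <- matrix(NA_integer_, n - 1L, 2L)
    for (k in seq_len(n - 1L)) {
        # Distances from nodes already in the tree to nodes outside it.
        cand <- d[in_tree, !in_tree, drop=FALSE]
        best <- which(cand == min(cand), arr.ind=TRUE)[1, ]
        from <- which(in_tree)[best[1]]
        to <- which(!in_tree)[best[2]]
        edges[k, ] <- c(from, to)
        in_tree[to] <- TRUE
    }
    edges
}

# Same mock Y-shaped data as above, with a fixed seed for reproducibility.
set.seed(100)
centers <- rbind(c(0, 0), c(0, -1), c(1, 1), c(-1, 1))
clusters <- sample(nrow(centers), 1000, replace=TRUE)
cells <- centers[clusters, ]
cells <- cells + rnorm(length(cells), sd=0.5)

# Centroid of each cluster: per-cluster column sums divided by cluster size.
centroids <- rowsum(cells, clusters) / as.vector(table(clusters))

# Each row is one MST edge, given as a pair of cluster indices.
mst_edges <- prim_mst(as.matrix(dist(centroids)))
mst_edges
```

Because the tree is built on a handful of centroids rather than on individual cells, the construction is cheap and largely insensitive to per-cell noise.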

The implementation in createClusterMST() combines several useful ideas from the two aforementioned packages:

Operations on the MST

Once the MST is constructed, we can use the guessMSTRoots() function to guess the possible roots. One root is guessed for each component of the MST, though the choice of root depends on the method in use. For example, we can choose a root node that maximizes the steps/distance to all terminal (degree-1) nodes; this results in fewer branch events when defining paths through the MST.

guessMSTRoots(mst, method="maxstep")
guessMSTRoots(mst, method="minstep")

Once a root is guessed, we can define paths from the root to all terminal nodes using the defineMSTPaths() function. The interpretation of each path depends on the appropriateness of the chosen root; for example, in developmental contexts, each path can be treated as a lineage if the root corresponds to some undifferentiated state.

defineMSTPaths(mst, roots=c("1"))
defineMSTPaths(mst, roots=c("2"))
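
As a rough illustration of what "paths from the root to all terminal nodes" means, the sketch below enumerates them by depth-first search on a hard-coded edge list for the Y-shaped tree, with cluster 1 as the branch point. The edge list and the helper `paths_from_root()` are assumptions made for this example only; they do not reflect how defineMSTPaths() is implemented.

```r
# Assumed edge list for the Y-shaped tree (cluster 1 is the branch point).
edges <- rbind(c("1", "2"), c("1", "3"), c("1", "4"))

# Neighbours of each node, stored as a named list.
nodes <- sort(unique(as.vector(edges)))
adj <- lapply(setNames(nodes, nodes), function(v) {
    c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
})

# Hypothetical helper: depth-first walk that records a path every time it
# reaches a node with no unvisited neighbours, i.e., a terminal node.
paths_from_root <- function(adj, root) {
    out <- list()
    walk <- function(node, path) {
        nxt <- setdiff(adj[[node]], path)
        if (length(nxt) == 0L) {
            out[[length(out) + 1L]] <<- path
        } else {
            for (v in nxt) walk(v, c(path, v))
        }
    }
    walk(root, root)
    out
}

# Rooting at the branch point gives three short paths;
# rooting at a terminal node gives two longer paths that share a prefix.
paths_from_root(adj, "1")
paths_from_root(adj, "2")
```

The second call shows why the choice of root matters: both paths pass through clusters 2 and 1, so any downstream analysis per path will revisit those shared nodes.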

Alternatively, if timing information is available (e.g., from an experimental time-course, or from computational methods like RNA velocity), we can use that to define paths through the MST based on the assumed movement of cells from local minima to the nearest local maxima in time. In the mock example below, the branch point at cluster 1 has the highest time so all paths converge from the terminal nodes to cluster 1.

defineMSTPaths(mst, times=c(`1`=4, `2`=0, `3`=1, `4`=0))

We can also use a root-free method of defining paths through the MST, by simply defining all unbranched segments as separate paths. This decomposes the MST into simpler structures for, e.g., detection of marker genes, without explicitly defining a root or requiring external information. In fact, one might prefer this approach as it allows us to interpret each section of the MST in a more modular manner, without duplication of effort caused by the sharing of nodes across different paths from a common root.

splitByBranches(mst)

The PseudotimeOrdering class

We can create a compact representation of pseudotime orderings in the form of a matrix where rows are cells and columns are paths through the trajectory. Each cell receives a pseudotime value along a path - or NA, if the cell does not lie on that path. On occasion, we may wish to annotate this with metadata on the cells or on the paths. This is supported by the PseudotimeOrdering class:

# Make up a matrix of pseudotime orderings.
ncells <- 200
npaths <- 5
orderings <- matrix(rnorm(ncells * npaths), ncells, npaths)

# Default constructor:
(pto <- PseudotimeOrdering(orderings))

It is then straightforward to add metadata on the cells:

pto$cluster <- sample(LETTERS[1:5], ncells, replace=TRUE)
cellData(pto)

Or on the paths:

pathData(pto)$description <- sprintf("PATH-%i", seq_len(npaths))
pathData(pto)

Additional matrices can also be added if multiple statistics are available for each cell/path combination. For example, one might store the confidence with which each cell is assigned to each path.

pathStat(pto, "confidence") <- matrix(runif(ncells * npaths), ncells, npaths)
pathStatNames(pto)

For convenience, we also provide the averagePseudotime() function to compute the average pseudotime for each cell. This is sometimes necessary in applications that only expect a single set of pseudotime values, e.g., for visualization.

summary(averagePseudotime(pto))
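
The effect of averaging is easiest to see on a small matrix where some cells are absent from some paths. The sketch below assumes that the average is a row-wise mean that skips NA values; this is an assumption about the behaviour of averagePseudotime() rather than a statement of its implementation, and the toy matrix `pt` is made up for illustration.

```r
# Toy pseudotime matrix: rows are cells, columns are paths.
# NA marks a cell that does not lie on that path.
pt <- cbind(
    path1=c(0.1, 0.5, 0.9, NA),
    path2=c(0.2, 0.4, NA, 0.8)
)
rownames(pt) <- paste0("cell", 1:4)

# Row-wise mean over the paths on which each cell has a pseudotime.
avg <- rowMeans(pt, na.rm=TRUE)
avg
```

Cells on a single path simply keep that path's pseudotime, while cells shared between paths receive the mean of their values.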

Session information

sessionInfo()


LTLA/TrajectoryUtils documentation built on Feb. 5, 2024, 11:56 a.m.