connectStates: Connect trajectory states

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Connects states using maximum interface scoring. For each state an interface score is defined by the relative distribution of states in its local l-neighborhood. A filter is applied to remove outliers (ie. false positive neighbors). States are spanned by maximizing the total interface score.

Usage

1
connectStates(sce, l = 10)

Arguments

sce

A SingleCellExperiment object

l

Neighborhood size (default: 10)

Details

CellTrails assumes that the arrangement of samples in the computed lower-dimensional latent space constitutes a trajectory. Therefore, CellTrails aims to place single samples along a maximum parsimony tree, which resembles a branching developmental continuum. Distances between samples in the latent space are computed using the Euclidean distance.

To avoid overfitting and to facilitate the accurate identification of bifurcations, CellTrails simplifies the problem. Analogous to the idea of a <e2><80><98>broken-stick regression<e2><80><99>, CellTrails groups the data and perform linear fits to separate trajectory segments, which are determined by the branching chronology of states. This leaves the optimization problem of finding the minimum number of associations between states while maximizing the total parsimony, which in theory can be solved by any minimum spanning tree algorithm. CellTrails adapts this concept by assuming that adjacent states should be located nearby and therefore share a relative high number of neighboring cells.

Each state defines a submatrix of samples that is composed of a distinct set of data vectors, i.e., each state is a distinct set of samples represented in the lower-dimensional space. For each state CellTrails identifies the l-nearest neighbors to each state's data vector and takes note of their state memberships and distances. This results in two vectors of length l times the state size (i.e., a vector with memberships and a vector with distances).

CellTrails removes spurious neighbors (outliers), whose distance to a state is greater than or equal to

e^{median(log(D)) + MAD(log(D))}

where D is a matrix containing all collected l-nearest neighbor sample distances to any state in the latent space.

For each state CellTrails calculates the relative frequency on how often a state occurs in the neighborhood of a given state, which is refered to as the interface cardinality scores.

CellTrails implements a greedy algorithm to find the tree maximizing the total interface cardinality score, similar to a minimum spanning tree algorithm (Kruskal, 1956). In a nutshell, all interface cardinality scores are organized in a sorted linked list, and a graph with no edges, but k nodes (one for each state) is initialized. During each iteration the highest score is selected, removed from the list and its corresponding edge (connecting two states), if it is not introducing a cycle or is already existent, is added to the graph. The algorithm terminates if the size of the graph is k-1 (with k equals number of states) or the list is empty. A cycle is determined if nodes were revisited while traversing the graph using depth-first search. Its construction has a relaxed requirement (number of edges < number of nodes) compared to a tree (number of edges = number of nodes - 1), which may result in a graph (forest) having multiple tree components, i.e. several trajectories or isolated nodes.

Diagnostic messages

An error is thrown if the states have not been defined yet; function findStates needs to be called first.

Value

An updated SingleCellExperiment object

Author(s)

Daniel C. Ellwanger

References

Kruskal, J.B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Amer Math Soc 7, 48-50.

See Also

findStates states

Examples

1
2
3
4
5
# Example data
data(exSCE)

# Connect states
exSCE <- connectStates(exSCE, l=30)

CellTrails documentation built on Nov. 8, 2020, 5:53 p.m.