Description Usage Arguments Details Value Author(s) References See Also Examples
Connects states using maximum interface scoring. For each state an interface score is defined by the relative distribution of states in its local l-neighborhood. A filter is applied to remove outliers (ie. false positive neighbors). States are spanned by maximizing the total interface score.
1 | connectStates(sce, l = 10)
|
sce |
A |
l |
Neighborhood size (default: 10) |
CellTrails assumes that the arrangement of samples
in the computed lower-dimensional latent space constitutes a trajectory.
Therefore, CellTrails aims to place single samples along a maximum parsimony
tree, which resembles a branching developmental continuum. Distances between
samples in the latent space are computed using the Euclidean distance.
To avoid overfitting and to facilitate the accurate identification of
bifurcations, CellTrails simplifies the problem. Analogous to the idea of
a ‘broken-stick regression’, CellTrails groups the data and perform linear
fits to separate trajectory segments, which are determined by the branching
chronology of states. This leaves the optimization problem of finding the
minimum number of associations between states while maximizing the total
parsimony, which in theory can be solved by any minimum spanning tree
algorithm. CellTrails adapts this concept by assuming that adjacent states
should be located nearby and therefore share a relative high number of
neighboring cells.
Each state defines a submatrix of samples that is composed of a distinct
set of data vectors, i.e., each state is a distinct set of samples
represented in the lower-dimensional space. For each state CellTrails
identifies the l-nearest neighbors to each state's data
vector and takes note of their state memberships and distances.
This results in two vectors of length l times the state size
(i.e., a vector with memberships and a vector with distances).
CellTrails removes spurious neighbors (outliers),
whose distance to a state is greater than or equal to
e^{median(log(D)) + MAD(log(D))}
where D is a matrix containing all collected l-nearest neighbor sample
distances to any state in the latent space.
For each state CellTrails calculates the relative frequency on
how often a state occurs in the neighborhood
of a given state, which is refered to as the interface cardinality scores.
CellTrails implements a greedy algorithm to find the tree maximizing
the total interface cardinality score,
similar to a minimum spanning tree algorithm (Kruskal, 1956).
In a nutshell, all interface cardinality
scores are organized in a sorted linked list, and a graph
with no edges, but k nodes (one for each state)
is initialized. During each iteration the highest score is
selected, removed from the list and its corresponding
edge (connecting two states), if it is not introducing a cycle or is
already existent, is added to the graph.
The algorithm terminates if the size of the graph is
k-1 (with k equals number of states) or the
list is empty. A cycle is determined if nodes were revisited
while traversing the graph using depth-first search.
Its construction has a relaxed requirement (number of edges <
number of nodes) compared to a tree
(number of edges = number of nodes - 1), which may result in a
graph (forest) having multiple tree components,
i.e. several trajectories or isolated nodes.
Diagnostic messages
An error is thrown if the states have not been defined yet;
function findStates
needs to be called first.
An updated SingleCellExperiment
object
Daniel C. Ellwanger
Kruskal, J.B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Amer Math Soc 7, 48-50.
1 2 3 4 5 | # Example data
data(exSCE)
# Connect states
exSCE <- connectStates(exSCE, l=30)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.