slingshot | R Documentation |
Perform trajectory inference with Slingshot
Perform trajectory inference by (1) identifying lineage
structure with a cluster-based minimum spanning tree, and (2) constructing
smooth representations of each lineage using simultaneous principal curves.
This function wraps the getLineages and getCurves functions and is the
primary function of the slingshot package.
slingshot(data, clusterLabels, ...)
## S4 method for signature 'matrix,character'
slingshot(
data,
clusterLabels,
reducedDim = NULL,
start.clus = NULL,
end.clus = NULL,
dist.method = "slingshot",
use.median = FALSE,
omega = FALSE,
omega_scale = 1.5,
times = NULL,
shrink = TRUE,
extend = "y",
reweight = TRUE,
reassign = TRUE,
thresh = 0.001,
maxit = 15,
stretch = 2,
approx_points = NULL,
smoother = "smooth.spline",
shrink.method = "cosine",
allow.breaks = TRUE,
...
)
## S4 method for signature 'matrix,matrix'
slingshot(
data,
clusterLabels,
reducedDim = NULL,
start.clus = NULL,
end.clus = NULL,
dist.method = "slingshot",
use.median = FALSE,
omega = FALSE,
omega_scale = 1.5,
times = NULL,
shrink = TRUE,
extend = "y",
reweight = TRUE,
reassign = TRUE,
thresh = 0.001,
maxit = 15,
stretch = 2,
approx_points = NULL,
smoother = "smooth.spline",
shrink.method = "cosine",
allow.breaks = TRUE,
...
)
## S4 method for signature 'SlingshotDataSet,ANY'
slingshot(data, clusterLabels, ...)
## S4 method for signature 'data.frame,ANY'
slingshot(data, clusterLabels, ...)
## S4 method for signature 'matrix,numeric'
slingshot(data, clusterLabels, ...)
## S4 method for signature 'matrix,factor'
slingshot(data, clusterLabels, ...)
## S4 method for signature 'matrix,ANY'
slingshot(data, clusterLabels, ...)
## S4 method for signature 'ClusterExperiment,ANY'
slingshot(
data,
clusterLabels,
reducedDim = NULL,
start.clus = NULL,
end.clus = NULL,
dist.method = "slingshot",
use.median = FALSE,
omega = FALSE,
omega_scale = 1.5,
times = NULL,
shrink = TRUE,
extend = "y",
reweight = TRUE,
reassign = TRUE,
thresh = 0.001,
maxit = 15,
stretch = 2,
approx_points = NULL,
smoother = "smooth.spline",
shrink.method = "cosine",
allow.breaks = TRUE,
...
)
## S4 method for signature 'SingleCellExperiment,ANY'
slingshot(
data,
clusterLabels,
reducedDim = NULL,
start.clus = NULL,
end.clus = NULL,
dist.method = "slingshot",
use.median = FALSE,
omega = FALSE,
omega_scale = 1.5,
times = NULL,
shrink = TRUE,
extend = "y",
reweight = TRUE,
reassign = TRUE,
thresh = 0.001,
maxit = 15,
stretch = 2,
approx_points = NULL,
smoother = "smooth.spline",
shrink.method = "cosine",
allow.breaks = TRUE,
...
)
data |
a data object containing the matrix of coordinates to be used for
lineage inference. Supported types include |
clusterLabels |
each cell's cluster assignment. This can be a single
vector of labels, or a |
... |
Additional parameters to pass to scatter plot smoothing function,
|
reducedDim |
(optional) the dimensionality reduction to be used. Can be
a matrix or a character identifying which element of
|
start.clus |
(optional) character, indicates the starting cluster(s) from which lineages will be drawn. |
end.clus |
(optional) character, indicates which cluster(s) will be forced to be leaf nodes in the graph. |
dist.method |
(optional) character, specifies the method for calculating
distances between clusters. Default is |
use.median |
logical, whether to use the median (instead of mean) when calculating cluster centroid coordinates. |
omega |
(optional) numeric, this granularity parameter determines the
distance between every real cluster and the artificial cluster,
|
omega_scale |
(optional) numeric, scaling factor to use when |
times |
numeric, vector of external times associated with either
clusters or cells. See |
shrink |
logical or numeric between 0 and 1, determines whether and how
much to shrink branching lineages toward their average prior to the split
(default |
extend |
character, how to handle root and leaf clusters of lineages
when constructing the initial, piece-wise linear curve. Accepted values are
|
reweight |
logical, whether to allow cells shared between lineages to be
reweighted during curve fitting. If |
reassign |
logical, whether to reassign cells to lineages at each
iteration. If |
thresh |
numeric, determines the convergence criterion. Percent change
in the total distance from cells to their projections along curves must be
less than |
maxit |
numeric, maximum number of iterations (default |
stretch |
numeric factor by which curves can be extrapolated beyond
endpoints. Default is |
approx_points |
numeric, whether curves should be approximated by a
fixed number of points. If |
smoother |
choice of scatter plot smoother. Same as
|
shrink.method |
character denoting how to determine the appropriate
amount of shrinkage for a branching lineage. Accepted values are the same
as for |
allow.breaks |
logical, determines whether curves that branch very close to the origin should be allowed to have different starting points. |
Given a reduced-dimensional data matrix (n by p) and a vector of cluster
labels (or matrix of soft cluster assignments, potentially including a -1
label for "unclustered"), this function performs trajectory inference using
a cluster-based minimum spanning tree on the clusters and simultaneous
principal curves for smooth, branching paths.
The graph of this structure is learned by fitting a (possibly
constrained) minimum-spanning tree on the clusters, plus the artificial
cluster, .OMEGA, which is a fixed distance away from every real cluster.
This effectively limits the maximum branch length in the MST to the chosen
distance, meaning that the output may contain multiple trees.
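The effect of the artificial cluster can be illustrated with a small base-R sketch. This is not slingshot's implementation: the helper prim_mst, the toy cluster centers, and the use of plain Euclidean distances between centers are all assumptions made for illustration, and the exact distance slingshot assigns to .OMEGA internally is not confirmed here. The sketch only shows why an edge longer than the chosen distance is replaced by a route through .OMEGA, so that removing .OMEGA leaves several trees.

```r
# Prim's algorithm on a symmetric distance matrix with row/column names.
# Returns a two-column character matrix with one row per tree edge.
# (Hypothetical helper, not part of slingshot.)
prim_mst <- function(D) {
  in_tree <- c(TRUE, rep(FALSE, nrow(D) - 1))
  edges <- matrix(character(0), ncol = 2)
  while (!all(in_tree)) {
    # Cheapest edge from the current tree to any node outside it
    sub <- D[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    to <- colnames(sub)[idx[2]]
    edges <- rbind(edges, c(rownames(sub)[idx[1]], to))
    in_tree[rownames(D) == to] <- TRUE
  }
  edges
}

# Toy cluster centers: two well-separated groups along one axis.
centers <- cbind(x = c(0, 1, 2, 10, 11), y = 0)
rownames(centers) <- c("A", "B", "C", "D", "E")
D <- as.matrix(dist(centers))  # Euclidean distance, for illustration only

# Add the artificial cluster at a fixed distance (omega) from every real one.
omega <- 3
k <- nrow(D)
nms <- c(rownames(D), ".OMEGA")
D_aug <- matrix(omega, k + 1, k + 1, dimnames = list(nms, nms))
D_aug[1:k, 1:k] <- D

# Fit the MST, then drop the artificial cluster and its edges.
mst_edges <- prim_mst(D_aug)
forest <- mst_edges[!apply(mst_edges == ".OMEGA", 1, any), , drop = FALSE]
print(forest)
```

In this toy example the long C to D edge (length 8) never enters the tree because reaching D through .OMEGA is cheaper, so the remaining edges form two separate trees, {A, B, C} and {D, E}.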
Once the graph is known, lineages are identified in any tree with at least two clusters. For a given tree, if there is an annotated starting cluster, every possible path out of a starting cluster and ending in a leaf that isn't another starting cluster will be returned. If no starting cluster is annotated, one will be chosen by a heuristic method, but this is not recommended.
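As a rough sketch of the path enumeration described above, the following base-R code lists every path from a starting cluster to a leaf of a tree. The edge list and the helper paths_to_leaves are hypothetical and exist only for this example; the sketch handles a single starting cluster and does not implement the end.clus constraint or the heuristic choice of a starting cluster.

```r
# A small tree given as an edge list between cluster labels (toy data).
edges <- rbind(c("1", "2"), c("2", "3"), c("2", "4"), c("4", "5"))

# Enumerate all simple paths from `start` to a leaf of the tree.
# (Hypothetical helper, not part of slingshot.)
paths_to_leaves <- function(edges, start) {
  neighbors <- function(v) {
    c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
  }
  walk <- function(path) {
    v <- path[length(path)]
    nxt <- setdiff(neighbors(v), path)  # never walk back along the path
    if (length(nxt) == 0) {
      return(list(path))                # reached a leaf
    }
    do.call(c, lapply(nxt, function(u) walk(c(path, u))))
  }
  walk(start)
}

lineages <- paths_to_leaves(edges, start = "1")
print(lineages)
```

Each element of the result is an ordered set of clusters, which matches the shape of the lineages list described in the Value section.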
When there is only a single lineage, the curve-fitting algorithm is
nearly identical to that of principal_curve. When there are multiple
lineages and shrink > 0, an additional step is added to the iterative
procedure, forcing curves to be similar in the neighborhood of shared
points (i.e., before they branch).
The approx_points argument, which sets the number of points to be used for
each curve, can have a large effect on computation time. Due to this
consideration, we set the default value to 150 whenever the input dataset
contains more than that many cells. This setting should help with
exploratory analysis while having little to no impact on the final curves.
To disable this behavior and construct curves with the maximum number of
points, set approx_points = FALSE.
The extend argument determines how to construct the piece-wise linear
curve used to initiate the recursive algorithm. The initial curve is always
based on the lines between cluster centers and if extend = 'n', this curve
will terminate at the center of the endpoint clusters. Setting extend = 'y'
will allow the first and last segments to extend beyond the cluster center
to the orthogonal projection of the furthest point. Setting extend = 'pc1'
is similar to 'y', but uses the first principal component of the cluster to
determine the direction of the curve beyond the cluster center. These
options typically have limited impact on the final curve, but can
occasionally help with stability issues.
When shrink = TRUE, we compute a percent shrinkage curve, w_l(t), for each
lineage, a non-increasing function of pseudotime that determines how much
that lineage should be shrunk toward a shared average curve. We set
w_l(0) = 1 (complete shrinkage), so that the curves will always perfectly
overlap the average curve at pseudotime 0. The weighting curve decreases
from 1 to 0 over the non-outlying pseudotime values of shared cells (where
outliers are defined by the 1.5*IQR rule). The exact shape of the curve in
this region is controlled by shrink.method, and can follow the shape of any
standard kernel function's cumulative density curve (or more precisely,
survival curve, since we require a decreasing function). Different choices
of shrink.method appear to have no discernible impact on the final curves
in most cases.
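A minimal sketch of such a shrinkage curve, assuming a cosine kernel, is shown below. The function shrink_weight and the toy pseudotimes are hypothetical, and the way the kernel is rescaled to the pseudotime range is an assumption; slingshot's internal computation may differ in its details. The sketch uses the survival curve of the cosine kernel on [-1, 1], which is (1 - sin(pi * u / 2)) / 2.

```r
# Percent shrinkage as a function of pseudotime `t`, given the pseudotimes
# of the cells shared between lineages. (Hypothetical helper.)
shrink_weight <- function(t, shared_t) {
  # Drop outlying pseudotimes using the 1.5 * IQR rule
  q <- quantile(shared_t, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  keep <- shared_t[shared_t >= q[1] - 1.5 * iqr &
                   shared_t <= q[2] + 1.5 * iqr]
  lo <- min(keep)
  hi <- max(keep)
  # Rescale pseudotime to [-1, 1] over the non-outlying range, clamping
  # values outside it
  u <- pmin(pmax(2 * (t - lo) / (hi - lo) - 1, -1), 1)
  # Survival curve of the cosine kernel: 1 at u = -1, 0 at u = 1
  (1 - sin(pi * u / 2)) / 2
}

shared_t <- c(1, 2, 3, 4, 5, 50)  # toy pseudotimes; 50 is an outlier
w <- shrink_weight(c(0, 1, 3, 5, 10), shared_t)
print(w)
```

The weights stay at 1 up to the smallest non-outlying shared pseudotime, fall smoothly through 0.5 at the midpoint of that range, and reach 0 at its upper end, so the outlying value has no influence on where shrinkage stops.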
When reweight = TRUE, weights for shared cells are based on the quantiles
of their projection distances onto each curve. The distances are ranked and
converted into quantiles between 0 and 1, which are then transformed by
1 - q^2. Each cell's weight along a given lineage is the ratio of this
value to the maximum value for this cell across all lineages.
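The reweighting rule can be sketched in a few lines of base R. The function reweight_cells and the toy distance matrix are hypothetical, and the rank-to-quantile convention used here, (rank - 0.5) / n, is an assumption chosen so that no transformed value is exactly zero; the convention used inside slingshot is not confirmed here.

```r
# `dists`: matrix of projection distances for shared cells, with one row per
# cell and one column per lineage. (Hypothetical helper.)
reweight_cells <- function(dists) {
  # Rank distances within each lineage and convert them to quantiles
  q <- apply(dists, 2, function(d) (rank(d) - 0.5) / length(d))
  # Cells close to a curve (small q) receive values near 1
  v <- 1 - q^2
  # Divide each row by its maximum so every cell has weight 1 on its
  # best-fitting lineage
  v / apply(v, 1, max)
}

dists <- cbind(Lineage1 = c(0.1, 0.5, 0.9),
               Lineage2 = c(0.8, 0.4, 0.2))
weights <- reweight_cells(dists)
print(weights)
```

In this example the first cell keeps weight 1 on Lineage1 and is down-weighted on Lineage2, the third cell shows the reverse pattern, and the second cell, which is equally ranked on both curves, keeps weight 1 on both.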
An object of class PseudotimeOrdering containing the pseudotime estimates
and lineage assignment weights in the assays. The reducedDim and
clusterLabels matrices will be stored in the cellData. Additionally, the
metadata slot will contain an igraph object named mst, a list of parameter
values named slingParams, a list of lineages (ordered sets of clusters)
named lineages, and a list of principal_curve objects named curves.
Hastie, T., and Stuetzle, W. (1989). "Principal Curves." Journal of the American Statistical Association, 84:502-516.
Street, K., et al. (2018). "Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics." BMC Genomics, 19:477.
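library(slingshot)
data("slingshotExample")
rd <- slingshotExample$rd  # reduced-dimensional coordinates
cl <- slingshotExample$cl  # cluster labels
# infer lineages and curves, starting from cluster '1'
pto <- slingshot(rd, cl, start.clus = '1')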
# plotting
sds <- as.SlingshotDataSet(pto)
plot(rd, col = cl, asp = 1)
lines(sds, type = 'c', lwd = 3)