slingshot: Perform lineage inference with Slingshot


Description


Given a reduced-dimensional data matrix (n by p) and a vector of cluster labels (or a matrix of soft cluster assignments, potentially including a -1 label for "unclustered"), this function performs lineage inference by building a cluster-based minimum spanning tree and constructing simultaneous principal curves for the branching paths through the tree.

This wrapper function performs lineage inference in two steps: (1) identify lineage structure with a cluster-based minimum spanning tree with the getLineages function and (2) construct smooth representations of each lineage using simultaneous principal curves from the function getCurves.
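The two steps can also be run separately, which may be convenient when the lineage structure should be inspected before curves are fitted. The following is a minimal sketch that assumes the slingshot package is installed and that the bundled slingshotExample data has the rd and cl elements used in the Examples section; the has_slingshot guard is only there so the snippet does nothing when the package is missing.

```r
# Guard: only run when slingshot is available (assumption: Bioconductor install).
has_slingshot <- requireNamespace("slingshot", quietly = TRUE)

if (has_slingshot) {
  library(slingshot)
  data("slingshotExample")
  rd <- slingshotExample$rd
  cl <- slingshotExample$cl

  # Step 1: cluster-based minimum spanning tree and lineage identification.
  lin <- getLineages(rd, cl, start.clus = "1")

  # Step 2: simultaneous principal curves fitted along those lineages.
  crv <- getCurves(lin)
}
```

Calling slingshot(rd, cl, start.clus = "1") directly is expected to give an equivalent result in one call.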

Usage

slingshot(data, clusterLabels, ...)

## S4 method for signature 'matrix,character'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'matrix,matrix'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'SlingshotDataSet,ANY'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'data.frame,ANY'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'matrix,numeric'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'matrix,factor'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'matrix,ANY'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'ClusterExperiment,ANY'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

## S4 method for signature 'SingleCellExperiment,ANY'
slingshot(
  data,
  clusterLabels,
  reducedDim = NULL,
  start.clus = NULL,
  end.clus = NULL,
  dist.fun = NULL,
  omega = NULL,
  omega_scale = 3,
  lineages = list(),
  shrink = TRUE,
  extend = "y",
  reweight = TRUE,
  reassign = TRUE,
  thresh = 0.001,
  maxit = 15,
  stretch = 2,
  approx_points = FALSE,
  smoother = "smooth.spline",
  shrink.method = "cosine",
  allow.breaks = TRUE,
  ...
)

Arguments

data

a data object containing the matrix of coordinates to be used for lineage inference. Supported types include matrix, SingleCellExperiment, and SlingshotDataSet.

clusterLabels

character, a vector of length n denoting cluster labels, optionally including -1's for "unclustered." If data is a SlingshotDataSet, cluster labels will be taken from it.

...

Additional parameters to pass to scatter plot smoothing function, smoother.

reducedDim

(optional) identifier to be used if reducedDim(data) contains multiple elements. Otherwise, the first element will be used by default.

start.clus

(optional) character, indicates the cluster(s) of origin. Lineages will be represented by paths coming out of this cluster.

end.clus

(optional) character, indicates the cluster(s) which will be forced leaf nodes. This introduces a constraint on the MST algorithm.

dist.fun

(optional) function, method for calculating distances between clusters. Must take two matrices as input, corresponding to subsets of reducedDim. If the minimum cluster size is larger than the number of dimensions, the default is to use the joint covariance matrix to find the squared distance between cluster centers. If not, the default is to use the diagonal of the joint covariance matrix.

omega

(optional) numeric, this granularity parameter determines the distance between every real cluster and the artificial cluster, .OMEGA. In practice, this makes omega the maximum allowable distance between two connected clusters. By default, omega = Inf. If omega = TRUE, the maximum edge length will be set to the median edge length of the unsupervised MST times a scaling factor (omega_scale, default = 3). This value is provided as a potentially useful rule of thumb for datasets with outlying clusters or multiple, distinct trajectories, but it is not otherwise recommended.

omega_scale

(optional) numeric, scaling factor to use when omega = TRUE. The maximum edge length will be set to the median edge length of the unsupervised MST times omega_scale (default = 3).

lineages

list generated by getLineages, denotes lineages as ordered sets of clusters and contains the K x K connectivity matrix constructed on the clusters by getLineages.

shrink

logical or numeric between 0 and 1, determines whether and how much to shrink branching lineages toward their average prior to the split.

extend

character, how to handle root and leaf clusters of lineages when constructing the initial, piece-wise linear curve. Accepted values are 'y' (default), 'n', and 'pc1'. See 'Details' for more.

reweight

logical, whether to allow cells shared between lineages to be reweighted during curve-fitting. If TRUE, cells shared between lineages will be iteratively reweighted based on the quantiles of their projection distances to each curve. See 'Details' for more.

reassign

logical, whether to reassign cells to lineages at each iteration. If TRUE, cells will be added to a lineage when their projection distance to the curve is less than the median distance for all cells currently assigned to the lineage. Additionally, shared cells will be removed from a lineage if their projection distance to the curve is above the 90th percentile and their weight along the curve is less than 0.1.

thresh

numeric, determines the convergence criterion. Percent change in the total distance from cells to their projections along curves must be less than thresh. Default is 0.001, similar to principal_curve.

maxit

numeric, maximum number of iterations, see principal_curve.

stretch

numeric factor by which curves can be extrapolated beyond endpoints. Default is 2, see principal_curve.

approx_points

logical or numeric, whether curves should be approximated by a fixed number of points. If FALSE, no approximation will be performed and curves will contain as many points as the input data. If numeric, curves will be approximated by this number of points; preferably about 100 (see principal_curve).

smoother

choice of scatter plot smoother. Same as principal_curve, but "lowess" option is replaced with "loess" for additional flexibility.

shrink.method

character denoting how to determine the appropriate amount of shrinkage for a branching lineage. Accepted values are the same as for kernel in density (default is "cosine"), as well as "tricube" and "density". See 'Details' for more.

allow.breaks

logical, determines whether curves that branch very close to the origin should be allowed to have different starting points.
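
As an illustration of the dist.fun argument described above, the sketch below defines two hypothetical distance functions that each take two matrices (the cells of two clusters) and return a single number. It assumes that "joint covariance" means the sum of the two cluster covariance matrices; that reading, and the names center_dist and center_dist_diag, are assumptions made for the example and are not taken from the package source.

```r
# Squared distance between cluster centers, scaled by the full joint
# covariance (assumed here to be cov(X1) + cov(X2)).
center_dist <- function(X1, X2) {
  d <- colMeans(X1) - colMeans(X2)
  S <- cov(X1) + cov(X2)
  as.numeric(t(d) %*% solve(S) %*% d)
}

# Variant using only the diagonal of the joint covariance, which stays
# invertible when clusters have fewer cells than dimensions.
center_dist_diag <- function(X1, X2) {
  d <- colMeans(X1) - colMeans(X2)
  v <- diag(cov(X1)) + diag(cov(X2))
  sum(d^2 / v)
}

# Toy clusters in two dimensions.
set.seed(1)
A <- matrix(rnorm(40), ncol = 2)
B <- matrix(rnorm(40, mean = 3), ncol = 2)
d_full <- center_dist(A, B)
d_diag <- center_dist_diag(A, B)
```

A function of this shape could then be supplied as dist.fun = center_dist in a call to slingshot.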

Details

The connectivity matrix is learned by fitting a (possibly constrained) minimum-spanning tree on the clusters and the artificial cluster, .OMEGA, which is a fixed distance away from every real cluster. This effectively limits the maximum branch length in the MST to the chosen distance, meaning that the output may contain multiple trees.
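
The effect of .OMEGA can be illustrated with a small base-R sketch. It is not the package's implementation: mst_edges is a hypothetical helper running Prim's algorithm, and the four cluster centers are invented so that one cluster is clearly outlying. The omega value follows the rule of thumb described for omega = TRUE (median MST edge length times omega_scale).

```r
# Prim's algorithm on a symmetric distance matrix (hypothetical helper).
mst_edges <- function(D) {
  k <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, k - 1))
  edges <- matrix(NA_integer_, nrow = k - 1, ncol = 2)
  len <- numeric(k - 1)
  for (e in seq_len(k - 1)) {
    sub <- D[in_tree, !in_tree, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[idx[1]]
    to <- which(!in_tree)[idx[2]]
    edges[e, ] <- c(from, to)
    len[e] <- D[from, to]
    in_tree[to] <- TRUE
  }
  list(edges = edges, length = len)
}

# Four cluster centers on a line; the fourth is far from the others.
D <- as.matrix(dist(c(0, 1, 2, 10)))

# Rule of thumb: median edge length of the unconstrained MST times omega_scale.
omega_scale <- 3
omega <- median(mst_edges(D)$length) * omega_scale

# Add the artificial cluster at distance omega from every real cluster.
K <- nrow(D)
D_aug <- rbind(cbind(D, omega), c(rep(omega, K), 0))
aug <- mst_edges(D_aug)$edges

# Dropping edges that touch the artificial cluster leaves a forest.
kept <- aug[aug[, 1] != K + 1 & aug[, 2] != K + 1, , drop = FALSE]
```

Here the long edge to the outlying cluster is replaced by a cheaper route through the artificial cluster, so after removing that cluster the outlier forms its own tree.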

Once the connectivity is known, lineages are identified in any tree with at least two clusters. For a given tree, if there is an annotated starting cluster, every possible path out of a starting cluster and ending in a leaf that isn't another starting cluster will be returned. If no starting cluster is annotated, every leaf will be considered as a potential starting cluster and whichever configuration produces the longest average lineage length (in terms of number of clusters included) will be returned.
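
The path enumeration described above can be sketched in base R as follows. The connectivity matrix, the cluster names, and the helper paths_to_leaves are all hypothetical and only illustrate the idea; the package may handle details such as multiple starting clusters differently.

```r
# All paths from a starting cluster to the leaves of a tree, given a
# symmetric 0/1 connectivity matrix with cluster names as dimnames.
paths_to_leaves <- function(conn, start) {
  walk <- function(path) {
    node <- path[length(path)]
    nbrs <- setdiff(colnames(conn)[conn[node, ] == 1], path)
    if (length(nbrs) == 0) return(list(path))  # reached a leaf
    do.call(c, lapply(nbrs, function(nb) walk(c(path, nb))))
  }
  walk(start)
}

# Toy tree: 1 - 2 - 3, with 3 branching to 4 and 5.
cl_names <- as.character(1:5)
conn <- matrix(0, 5, 5, dimnames = list(cl_names, cl_names))
edges <- rbind(c("1", "2"), c("2", "3"), c("3", "4"), c("3", "5"))
conn[edges] <- 1
conn[edges[, 2:1]] <- 1

# Annotated starting cluster: one lineage per reachable leaf.
lineages <- paths_to_leaves(conn, "1")

# No annotation: try every leaf and keep the start with the longest
# average lineage length (in number of clusters).
leaves <- cl_names[rowSums(conn) == 1]
avg_len <- sapply(leaves, function(s) mean(lengths(paths_to_leaves(conn, s))))
best_start <- leaves[which.max(avg_len)]
```

In this toy tree, starting from cluster 1 gives two lineages of four clusters each, whereas starting from cluster 4 or 5 gives one lineage of four clusters and one of three.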

When there is only a single lineage, the curve-fitting algorithm is nearly identical to that of principal_curve. When there are multiple lineages and shrink == TRUE, an additional step is added to the iterative procedure, forcing curves to be similar in the neighborhood of shared points (i.e., before they branch).

The extend argument determines how to construct the piece-wise linear curve used to initiate the recursive algorithm. The initial curve is always based on the lines between cluster centers and if extend = 'n', this curve will terminate at the center of the endpoint clusters. Setting extend = 'y' will allow the first and last segments to extend beyond the cluster center to the orthogonal projection of the furthest point. Setting extend = 'pc1' is similar to 'y', but uses the first principal component of the cluster to determine the direction of the curve beyond the cluster center. These options typically have little to no impact on the final curve, but can occasionally help with stability issues.
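
To make the extend = 'y' behavior concrete, here is a minimal geometric sketch of extending a terminal segment to the orthogonal projection of the furthest point. The helper extend_endpoint and the toy coordinates are assumptions for illustration; the sketch covers only a single endpoint and is not taken from the package source.

```r
# Extend the last segment past the endpoint cluster's center, along the
# direction from the previous center, as far as the furthest projected cell.
extend_endpoint <- function(X_end, center_end, center_prev) {
  u <- center_end - center_prev
  u <- u / sqrt(sum(u^2))                              # unit direction
  s <- as.numeric(sweep(X_end, 2, center_end) %*% u)   # signed projections
  center_end + max(0, s) * u
}

# Endpoint cluster with three cells; the previous cluster center is at the origin.
X_end <- rbind(c(2, 0), c(3, 1), c(4, -1))
center_end <- colMeans(X_end)     # (3, 0)
center_prev <- c(0, 0)

new_end <- extend_endpoint(X_end, center_end, center_prev)
```

With extend = 'n' the initial curve would stop at center_end; in this sketch the extended endpoint moves one unit further along the segment direction.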

When shrink == TRUE, we compute a shrinkage curve, w_l(t), for each lineage, a non-increasing function of pseudotime that determines how much that lineage should be shrunk toward a shared average curve. We set w_l(0) = 1, so that the curves will perfectly overlap the average curve at pseudotime 0. The weighting curve decreases from 1 to 0 over the non-outlying pseudotime values of shared cells (where outliers are defined by the 1.5*IQR rule). The exact shape of the curve in this region is controlled by shrink.method, and can follow the shape of any standard kernel function's cumulative density curve (or more precisely, survival curve, since we require a decreasing function). Different choices of shrink.method seem to have little impact on the final curves, in most cases.
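
A shrinkage curve of this kind can be sketched with base R's density function, whose kernel argument accepts the same kernel names. The helper shrink_weights and the toy pseudotimes are hypothetical, and the sketch handles only kernels known to density (not "tricube" or "density"); it is meant to show the shape of w_l(t), not to reproduce the package's code.

```r
# Survival curve of a kernel, rescaled to the non-outlying pseudotime
# range of the shared cells, evaluated at every cell's pseudotime.
shrink_weights <- function(pst, shared, kernel = "cosine") {
  # Whisker ends of a boxplot implement the 1.5 * IQR outlier rule.
  box <- boxplot.stats(pst[shared])$stats
  lo <- box[1]
  hi <- box[5]
  if (lo == hi) return(rep(0, length(pst)))

  dens <- density(0, bw = 1, kernel = kernel)
  surv <- 1 - cumsum(dens$y) / sum(dens$y)   # decreases from 1 to 0

  # Map the kernel's grid onto [lo, hi]; rule = 2 holds the end values
  # (1 before lo, 0 after hi).
  x <- lo + (dens$x - min(dens$x)) / diff(range(dens$x)) * (hi - lo)
  approx(x, surv, xout = pst, rule = 2)$y
}

# Seven cells; cells 2 to 7 are shared, and pseudotime 50 is an outlier.
pst <- c(0, 1, 2, 3, 4, 5, 50)
w <- shrink_weights(pst, shared = 2:7)
```

The weight is 1 at pseudotime 0, falls smoothly across the non-outlying shared range (1 to 5 here), and is 0 afterwards, so shrinkage toward the average curve fades out as lineages diverge.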

When reweight = TRUE, weights for shared cells are based on the quantiles of their projection distances onto each curve. The distances are ranked and converted into quantiles between 0 and 1, which are then transformed by 1 - q^2. Each cell's weight along a given lineage is the ratio of this value to the maximum value for this cell across all lineages.
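
The reweighting rule can be written out in a few lines of base R. This sketch assumes that all distances are ranked together in one pool and that no cell's distances all tie for the largest rank; the function name reweight_shared and the distance matrix are invented for the example.

```r
# D: projection distances, one row per shared cell, one column per lineage.
reweight_shared <- function(D) {
  q <- matrix(rank(D) / length(D), nrow = nrow(D))  # quantiles in (0, 1]
  z <- 1 - q^2                                      # small distance -> large value
  z / apply(z, 1, max)                              # ratio to each cell's best lineage
}

D <- cbind(lineage1 = c(0.1, 0.5, 0.9),
           lineage2 = c(0.8, 0.4, 0.2))
W <- reweight_shared(D)
```

Each cell ends up with weight 1 on its closest lineage and a smaller weight on the other; the first cell, for example, gets weights 1 and 11/35.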

Value

An object of class SlingshotDataSet containing the arguments provided to slingshot as well as the following output:

References

Hastie, T., and Stuetzle, W. (1989). "Principal Curves." Journal of the American Statistical Association, 84:502–516.

Examples

data("slingshotExample")
rd <- slingshotExample$rd
cl <- slingshotExample$cl
sds <- slingshot(rd, cl, start.clus = '1')

plot(rd, col = cl, asp = 1)
lines(sds, lwd = 3)

slingshot documentation built on Nov. 8, 2020, 5:51 p.m.