findStates: Identify trajectory states


Description

Determines states using hierarchical spectral clustering with a post-hoc test.

Usage

findStates(sce, min_size = 0.01, min_feat = 5, max_pval = 1e-04, min_fc = 2)

Arguments

sce

A SingleCellExperiment object

min_size

The initial cluster dendrogram is cut at a height such that the minimum cluster size is at least min_size; if min_size < 1, it is interpreted as a fraction of the total number of samples, otherwise it is used as an absolute count. (default: 0.01)

min_feat

Minimum number of differentially expressed features between siblings. If this number is not reached, two neighboring clusters (siblings) in the pruned dendrogram get joined. (default: 5)

max_pval

Maximum P-value for differential expression computation. (default: 1e-4)

min_fc

Minimum fold-change for differential expression computation. (default: 2)
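
The two readings of min_size described above can be sketched in a few lines of base R. The helper name resolve_min_size is hypothetical and not part of CellTrails, and rounding up with ceiling() is an assumption; the package may round differently.

```r
# Hypothetical helper (not part of CellTrails): turn min_size into a sample count.
resolve_min_size <- function(min_size, n_samples) {
  if (min_size < 1) {
    # Values below 1 are read as a fraction of all samples.
    # Rounding up is an assumption made for this sketch.
    ceiling(min_size * n_samples)
  } else {
    # Values of 1 or more are read as an absolute number of samples.
    as.integer(min_size)
  }
}

resolve_min_size(0.5, 100)   # fraction of 100 samples
resolve_min_size(25, 1000)   # absolute count
```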

Details

To identify cellular subpopulations, CellTrails performs hierarchical clustering via minimization of a square error criterion (Ward, 1963) in the lower-dimensional space. To determine the cardinality of the clustering, CellTrails conducts an unsupervised post-hoc analysis. Here, it is assumed that differential expression of assayed features determines distinct cellular stages. First, CellTrails identifies the maximal fragmentation of the data space, i.e., the lowest cutting height in the clustering dendrogram that ensures that the resulting clusters contain at least a certain fraction of samples (min_size). Then, processing from this height towards the root, CellTrails iteratively joins siblings if they do not have at least a certain number of differentially expressed features (min_feat). Statistical significance is tested by means of a two-sample non-parametric linear rank test accounting for censored values (Peto & Peto, 1972). The null hypothesis is rejected using the Benjamini-Hochberg procedure (Benjamini & Hochberg, 1995) for a given significance level (max_pval).
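
The first step of this procedure, finding the lowest cut at which every cluster still has a minimum size, can be illustrated with base R alone. This is a minimal sketch on made-up data, not the CellTrails implementation: the toy matrix X, the threshold min_count, and the choice of method = "ward.D2" on Euclidean distances are all assumptions.

```r
set.seed(1)
# Toy "latent space": 60 samples in 3 dimensions (made-up data)
X <- rbind(matrix(rnorm(90, mean = 0), ncol = 3),
           matrix(rnorm(90, mean = 5), ncol = 3))

# Ward clustering on Euclidean distances
hc <- hclust(dist(X), method = "ward.D2")

# Search for the largest number of clusters (the lowest cut) at which
# every cluster still holds at least min_count samples. Each further cut
# only splits one cluster, so the smallest cluster size can never grow
# and the search can stop at the first violation.
min_count <- 5
k_max <- 1
for (k in 2:nrow(X)) {
  sizes <- table(cutree(hc, k = k))
  if (min(sizes) < min_count) break
  k_max <- k
}

# Cluster labels at the maximal fragmentation
initial <- cutree(hc, k = k_max)
table(initial)
```

Starting from these initial clusters, the method described above would then walk towards the root and join siblings that are not sufficiently different.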
Since this method performs pairwise comparisons, the fold-change threshold is applied in both directions: a feature may be higher or lower expressed by a factor of at least min_fc. Input values < 0 are interpreted as a fold-change of 0. For example, min_fc=2 checks for features that are 2-fold differentially expressed between two given states (e.g., S1 and S2). A feature is thus validated as differentially expressed if it is either 2-fold higher or 2-fold lower expressed in state S1 than in state S2.
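
How max_pval, min_fc, and min_feat interact when two sibling clusters are compared can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the data are simulated, the helper passes_fc is hypothetical, the fold change is taken as a ratio of means, and the Wilcoxon rank-sum test is used as a stand-in for the Peto-Peto test named above.

```r
# Hypothetical helper: symmetric fold-change check on the log2 scale
passes_fc <- function(fc, min_fc) abs(log2(fc)) >= log2(min_fc)

set.seed(42)
# Made-up expression values: 20 features x 15 samples per state
n_feat <- 20
s1 <- matrix(rexp(n_feat * 15, rate = 1), nrow = n_feat)
s2 <- matrix(rexp(n_feat * 15, rate = 1), nrow = n_feat)
s2[1:6, ] <- s2[1:6, ] * 8 + 4   # features 1-6 are shifted upwards in S2

min_fc <- 2; max_pval <- 1e-4; min_feat <- 5

# Stand-in test: Wilcoxon rank-sum (NOT the Peto-Peto test used by the method)
pvals <- vapply(seq_len(n_feat),
                function(i) wilcox.test(s1[i, ], s2[i, ])$p.value,
                numeric(1))

# Benjamini-Hochberg adjustment across all tested features
padj <- p.adjust(pvals, method = "BH")

# A feature counts as differentially expressed if it is significant
# and changes by at least min_fc in either direction
fc <- rowMeans(s1) / rowMeans(s2)
is_de <- padj <= max_pval & passes_fc(fc, min_fc)

# Siblings are joined when too few features separate them
n_de <- sum(is_de)
join_siblings <- n_de < min_feat
```

Working on the log2 scale makes the threshold symmetric: a ratio of 2 and a ratio of 0.5 are treated alike.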
Please note that this method only uses the set of defined trajectory features in a SingleCellExperiment object; spike-in controls are ignored and are not listed as trajectory features.

Diagnostic messages

An error is thrown if the samples stored in the SingleCellExperiment object have not been embedded yet (i.e., the SingleCellExperiment object does not contain a latent space matrix object; latentSpace(object) is NULL).

Value

A factor vector

Author(s)

Daniel C. Ellwanger

References

Ward, J.H. (1963). Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, 58, 236-244.

Peto, R., and Peto, J. (1972). Asymptotically Efficient Rank Invariant Test Procedures (with Discussion). Journal of the Royal Statistical Society, Series A 135, 185–206.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289–300.

See Also

latentSpace, trajectoryFeatureNames

Examples

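# Load the package that provides findStates and the example data
library(CellTrails)

# Example data
data(exSCE)

# Find states; siblings are joined unless they differ in at least 2 features
cl <- findStates(exSCE, min_feat=2)
head(cl)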

elldc/CellTrails documentation built on May 16, 2020, 4:40 a.m.