Non-linear learning of a data representation that captures the intrinsic geometry of the trajectory. This function performs spectral decomposition of a graph encoding conditional entropy-based sample-to-sample similarities.
embedSamples(x, design = NULL)
## S4 method for signature 'matrix'
embedSamples(x, design = NULL)
x | A
design | A numeric matrix describing the factors that should be blocked
Single-cell gene expression measurements comprise high-dimensional
data of large volume, i.e., many features (e.g., genes) are measured in many
samples (e.g., cells); more formally, m samples can be described
by the expression of n features (i.e., n dimensions). The
cells' expression profiles are shaped by many distinct unobserved biological
causes related to each cell's genotype and phenotype, such as developmental
age, tissue region of origin, and cell cycle stage, by extrinsic sources
such as the status of signaling receptors and environmental stressors, and by
technical noise. In other words, a single dimension, despite containing only
gene expression information, represents an underlying combination of multiple
dependent and independent, relevant and non-relevant factors, where each
factor's individual contribution is non-uniform. To obtain a better
resolution and to extract underlying information, CellTrails aims to find a
meaningful low-dimensional structure (a manifold) that represents cells
mainly by their temporal relation along a biological process.
This method assumes that the expression vectors lie on or near a
manifold with dimensionality d that is embedded in the
n-dimensional space. By using spectral embedding, CellTrails aims to
amplify latent temporal information; it reduces noise (i.e., truncates
non-relevant dimensions) by transforming the expression matrix into a new
dataset while retaining the geometry of the original dataset as much as
possible. CellTrails captures overall cell-to-cell relations based on the
statistical mutual dependency between any two data vectors. A high
dependency between two samples should be represented by their close
proximity in the lower-dimensional space.
First, the mutual dependency between samples is scored using mutual
information. This entropy framework naturally requires discretization
of data vectors by an indicator function, which assigns each continuous
data point (expression value) to exactly one discrete interval (e.g., low,
mid or high). However, measurement points located close to the interval
borders may be wrongly assigned due to noise-induced fluctuations.
Therefore, CellTrails fuzzifies the indicator function by using a piecewise
polynomial function, i.e., the domain of each sample expression vector is
divided into contiguous intervals (based on Daub et al., 2004).
Second, the computed mutual information matrix, whose entries are measured
in bits and bounded below by zero, is scaled to a generalized correlation
coefficient. Third, CellTrails constructs a simple complete graph with m
nodes, one for each data vector (i.e., sample), and weights each edge between
two nodes by a heat kernel function applied to the generalized correlation
coefficient. Finally, nonlinear spectral embedding (i.e., spectral
decomposition of the graph's adjacency matrix) is performed
(Belkin & Niyogi, 2003; Sussman et al., 2012), unfolding the manifold.
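The four steps above can be illustrated with a minimal base R sketch. It is not the CellTrails implementation: it uses hard equal-width bins instead of the fuzzy B-spline indicator function, and the scaling formula, the distance `1 - r`, and the kernel width `t_width` are assumptions chosen for illustration. The helper `mutual_info_bits` and all variable names are hypothetical, and samples are assumed to be in columns.

```r
# Hypothetical helper: mutual information (in bits) between two numeric
# vectors after HARD equal-width binning (the package uses fuzzy binning).
mutual_info_bits <- function(a, b, bins = 3) {
  joint <- table(cut(a, bins), cut(b, bins)) / length(a)  # joint probabilities
  pa <- rowSums(joint)                                    # marginal of a
  pb <- colSums(joint)                                    # marginal of b
  nz <- joint > 0                                         # treat 0 * log(0) as 0
  sum(joint[nz] * log2(joint[nz] / outer(pa, pb)[nz]))
}

# Toy data: features in rows, samples in columns (assumed orientation).
set.seed(1)
n_features <- 40
m_samples <- 20
pseudotime <- seq(0, 1, length.out = m_samples)
x <- outer(runif(n_features, -2, 2), pseudotime) +
  matrix(rnorm(n_features * m_samples, sd = 0.2), nrow = n_features)

# Step 1: pairwise mutual information between samples
mi <- matrix(0, m_samples, m_samples)
for (i in seq_len(m_samples)) {
  for (j in seq_len(m_samples)) {
    mi[i, j] <- mutual_info_bits(x[, i], x[, j])
  }
}

# Step 2: scale to a coefficient in [0, 1]
# (assumed form: sqrt(1 - 2^(-2 * MI)) for MI in bits)
r <- sqrt(pmax(0, 1 - 2^(-2 * mi)))

# Step 3: heat kernel edge weights on an assumed distance 1 - r;
# a simple graph has no self-loops, so the diagonal is zeroed
t_width <- 1
w <- exp(-(1 - r)^2 / t_width)
diag(w) <- 0

# Step 4: spectral decomposition of the weighted adjacency matrix
eig <- eigen(w, symmetric = TRUE)
embedding <- eig$vectors[, 1:2]  # first two latent components
```

Here `embedding` holds one row per sample, and `eig$values` indicates how much each latent component contributes. How many components to keep is a separate decision that this sketch does not address.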
Please note that this method only uses the set of defined trajectory
features in a SingleCellExperiment object; spike-in controls are
ignored and are not listed as trajectory features.
To account for systematic bias in the expression data
(e.g., cell cycle effects), a design matrix can be
provided for the learning process. It should list the factors to be
blocked and their values per sample. It is suggested to construct the
design matrix with model.matrix.
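As a small illustration of building such a design matrix with model.matrix, consider the sketch below. The annotation table `sample_info`, its column `cell_cycle`, and the factor levels are hypothetical. Whether the intercept column should be kept is not stated on this page, so the default formula is used as-is.

```r
# Hypothetical per-sample annotation: one row per sample
sample_info <- data.frame(
  cell_cycle = factor(c("G1", "S", "G2M", "G1", "S", "G2M"))
)

# Numeric design matrix encoding the factor to be blocked
design <- model.matrix(~ cell_cycle, data = sample_info)
dim(design)  # one row per sample

# The matrix would then be passed via the 'design' argument, e.g.:
# res <- embedSamples(x, design = design)
```

The number of rows in `design` should match the number of samples in `x`, since the matrix gives the factor values per sample.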
Diagnostic messages
The method throws an error if the expression matrix contains samples
with zero entropy (e.g., samples that exclusively contain non-detects, that
is, all expression values are zero).
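To avoid this error, samples consisting only of zeros can be identified and removed before embedding. The following sketch uses a toy matrix and assumes samples are in columns and features in rows; the object names are hypothetical.

```r
# Toy expression matrix (assumed: features in rows, samples in columns)
x <- matrix(c(0, 0, 0,
              1, 2, 0,
              0, 3, 5),
            nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Flag samples in which every expression value is zero (all non-detects)
all_zero <- colSums(x != 0) == 0
names(which(all_zero))

# Keep only samples with at least one detected feature
x_clean <- x[, !all_zero, drop = FALSE]
```

This covers only the all-zero case named above; other ways a sample could end up with zero entropy are not checked here.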
A list containing the following components:
Ordered components of latent space
Information content of latent components
Daniel C. Ellwanger
Daub, C.O., Steuer, R., Selbig, J., and Kloska, S. (2004). Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data. BMC Bioinformatics 5, 118.
Belkin, M., and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15, 1373-1396.
Sussman, D.L., Tang, M., Fishkind, D.E., and Priebe, C.E. (2012). A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs. J Am Stat Assoc 107, 1119-1128.
SingleCellExperiment
trajectoryFeatureNames
model.matrix
# Example data
data(exSCE)
# Embed samples
res <- embedSamples(exSCE)