dreval  R Documentation 
Calculates a collection of metrics comparing one or more reduced dimension
representations to a reference representation. The function takes a
SingleCellExperiment object as input. The reference representation can
be either one of the included assays or one of the reduced dimension
representations. If an assay is used, reference distances can be calculated
based on all or a subset of the features (rows). These distances are then
compared to distances calculated from the specified reduced dimension
representations, and several scores are returned. The execution time of the
function depends strongly on both the number of retained variables (which
affects the distance calculation in the reference space) and the number of
samples that are randomly selected to use as the basis for the comparison.
Since subsampling of the columns (via the nSamples argument) is
random, setting the random seed is recommended to obtain reproducible
results.
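The effect of fixing the seed can be illustrated with base R alone. This is a minimal sketch: subsample_columns is a hypothetical helper, not part of dreval, and it is not meant to reproduce how dreval draws its subsample internally.

```r
# Hypothetical helper (not part of dreval): draw a reproducible subsample
# of column indices by fixing the RNG state before sampling.
subsample_columns <- function(n_total, n_samples, seed) {
    set.seed(seed)
    sort(sample.int(n_total, n_samples))
}

idx1 <- subsample_columns(n_total = 1000, n_samples = 150, seed = 42)
idx2 <- subsample_columns(n_total = 1000, n_samples = 150, seed = 42)
identical(idx1, idx2)  # TRUE: same seed, same subsample

# The number of pairwise distances grows quadratically with the number
# of retained samples, which is one reason run time depends on nSamples.
n_pairs <- choose(150, 2)  # 11175
```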
dreval(
sce,
dimReds = NULL,
refType = "assay",
refAssay = "logcounts",
refDimRed = NULL,
features = NULL,
nSamples = NULL,
distNorm = "none",
refDistMethod = "euclidean",
kTM = c(10, 100),
labelColumn = NULL,
verbose = FALSE
)
sce 
A 
dimReds 
A character vector with the names of the reduced dimension
representations from 
refType 
A character scalar, either "assay" or "dimred", specifying
whether to use an assay or a reduced dimension representation of 
refAssay 
A character scalar giving the name of the assay from

refDimRed 
A character scalar specifying the reduced dimension
representation to use as the reference data representation if

features 
A character vector giving the IDs of the features to use for
distance calculations from the chosen assay. Will be matched to the row
names of 
nSamples 
A numeric scalar, giving the number of columns to subsample
(randomly) from 
distNorm 
A character scalar, indicating how the distance vectors in the reference and low-dimensional spaces should be normalized before they are compared. If set to "l2", the vectors are L2-normalized; if set to "median", they are divided by the median value times the square root of their length; if set to any other value, they are divided by the square root of their length, so that the metrics do not scale with the number of retained samples. 
refDistMethod 
A character scalar defining the distance measure to use in the reference space. Must be one of "euclidean", "manhattan", "maximum", "canberra" or "cosine". The distance in the low-dimensional representation will always be Euclidean. 
kTM 
An integer vector giving the number of neighbors to use for trustworthiness, continuity and Jaccard index calculations. 
labelColumn 
A character scalar defining a column of

verbose 
A logical scalar, indicating whether to print out progress messages. 
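The three cases described for distNorm can be sketched in base R as follows. normalize_dist is a hypothetical helper written from the argument description above, not the package's own code; in particular, reading "the median value times the square root of their length" as a single divisor is an assumption.

```r
# Hypothetical helper (not part of dreval), following the distNorm description.
normalize_dist <- function(d, distNorm = "none") {
    d <- as.numeric(d)  # vector of pairwise distances
    if (distNorm == "l2") {
        d / sqrt(sum(d^2))                          # unit L2 norm
    } else if (distNorm == "median") {
        d / (stats::median(d) * sqrt(length(d)))    # assumed reading of "median"
    } else {
        d / sqrt(length(d))                         # any other value, e.g. "none"
    }
}

set.seed(1)
x <- matrix(rnorm(50 * 5), nrow = 50)   # toy data: 50 samples, 5 variables
d <- stats::dist(x)

l2_norm <- sqrt(sum(normalize_dist(d, "l2")^2))  # equals 1 up to rounding
```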
The following metrics are calculated:
SpearmanCorrDist - The Spearman correlation between the reference distances and the Euclidean distances in the low-dimensional representation. Ranges from -1 to 1, higher values are better.
PearsonCorrDist - The Pearson correlation between the reference distances and the Euclidean distances in the low-dimensional representation. Ranges from -1 to 1, higher values are better.
KSstatDist - The Kolmogorov-Smirnov statistic comparing the distribution of distances in the reference space and in the low-dimensional representation. Ranges from 0 to 1, lower values are better.
EuclDistBetweenDists - The Euclidean distance between the vector of
distances in the reference space and those in the low-dimensional
representation. Depending on the value of distNorm, distances are
scaled before they are compared. Lower values are better.
SammonStress - The Sammon stress (Sammon 1969). Depending on the
value of distNorm, distances are scaled before they are compared.
Lower values are better.
Trustworthiness_kNN - The trustworthiness score (Venna & Kaski 2001), using NN nearest neighbors. The trustworthiness indicates to which degree we can trust that the points placed closest to a given sample in the low-dimensional representation are really close to the sample also in the reference space. Ranges from 0 to 1, higher values are better.
Continuity_kNN - The continuity score (Venna & Kaski 2001), using NN nearest neighbors. The continuity indicates to which degree we can trust that the points closest to a given sample in the reference space are placed close to the sample also in the low-dimensional representation. Ranges from 0 to 1, higher values are better.
MeanJaccard_kNN - The mean Jaccard index (over all samples), comparing the set of NN nearest neighbors in the reference space and those in the low-dimensional representation. Ranges from 0 to 1, higher values are better.
MeanSilhouette_X - If a labelColumn X is supplied, the mean
silhouette score (Rousseeuw 1987) across all samples, with the grouping
given by this column and the distances obtained from the low-dimensional
representation. Ranges from -1 to 1, higher values are better.
coRankingQlocal - Q_local, defined as the average LCMC over the values to the left of the maximum, following the dimRed/coRanking package implementations (Kraemer et al 2018, Lee and Verleysen 2009, Chen and Buja 2009). Measures the preservation of local distances, higher values are better.
coRankingQglobal - Q_global, defined as the average LCMC over the values to the right of the maximum, following the dimRed/coRanking package implementations (Kraemer et al 2018, Lee and Verleysen 2009, Chen and Buja 2009). Measures the preservation of global distances, higher values are better.
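Several of the metrics listed above can be illustrated on toy data with base R. This is a rough sketch under stated assumptions, not dreval's implementation: the toy matrices, the helper names knn_sets and mean_jaccard, and the choice to compare unnormalized distances are all assumptions made for illustration.

```r
set.seed(1)
ref <- matrix(rnorm(100 * 20), nrow = 100)   # toy reference space: 100 samples x 20 features
low <- stats::prcomp(ref)$x[, 1:2]           # toy 2-D representation (first two PCs)

d_ref <- as.numeric(stats::dist(ref))        # reference distances (Euclidean here)
d_low <- as.numeric(stats::dist(low))        # Euclidean distances in the low-dimensional space

# Correlation-type and distribution-type comparisons of the two distance vectors
spearman <- stats::cor(d_ref, d_low, method = "spearman")
pearson  <- stats::cor(d_ref, d_low, method = "pearson")
ks       <- unname(stats::ks.test(d_ref, d_low)$statistic)

# Hypothetical helper: indices of the k nearest neighbors of each sample,
# excluding the sample itself.
knn_sets <- function(dmat, k) {
    lapply(seq_len(nrow(dmat)), function(i) {
        setdiff(order(dmat[i, ]), i)[seq_len(k)]
    })
}

# Hypothetical helper: mean Jaccard index between the neighbor sets
# found in the reference space and in the low-dimensional space.
mean_jaccard <- function(d_ref_mat, d_low_mat, k) {
    nn_ref <- knn_sets(d_ref_mat, k)
    nn_low <- knn_sets(d_low_mat, k)
    mean(mapply(function(a, b) length(intersect(a, b)) / length(union(a, b)),
                nn_ref, nn_low))
}

D_ref <- as.matrix(stats::dist(ref))
D_low <- as.matrix(stats::dist(low))
mj <- mean_jaccard(D_ref, D_low, k = 10)
```

Comparing a space with itself, as in mean_jaccard(D_ref, D_ref, k = 10), gives the upper bound of 1, which is a quick sanity check for the helper.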
A list with two elements:
scores - A data.frame with values of all evaluation metrics,
across the dimension reduction methods. In addition to the metrics, it
contains the dimensionality of the respective reduced dimension
representations, and the value of K giving the highest value of LCMC (used
for the calculations of Qlocal and Qglobal, see Kraemer et al 2018, Lee and
Verleysen 2009, Chen and Buja 2009).
plots - A list of ggplot objects, representing diagnostic plots.
Charlotte Soneson
Venna J., Kaski S. (2001). Neighborhood preservation in nonlinear projection methods: An experimental study. In Dorffner G., Bischof H., Hornik K., editors, Proceedings of ICANN 2001, pp 485-491. Springer, Berlin.
Lee J.A., Verleysen M. (2009). Quality assessment of dimensionality reduction: Rank-based criteria. Neurocomputing 72(7-9):1431-1443.
Chen L., Buja A. (2009). Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. Journal of the American Statistical Association 104:209-219.
Kraemer G., Reichstein M., Mahecha M.D. (2018). dimRed and coRanking - Unifying dimensionality reduction in R. The R Journal 10(1):342-358.
Sammon J.W. Jr (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers C-18(5):401-409.
Rousseeuw P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20:53-65.
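library(dreval)
data(pbmc3ksub)
set.seed(1)  # the nSamples subsampling is random; fix the seed for reproducibility
dre <- dreval(sce = pbmc3ksub, nSamples = 150)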