Setup

library('BiocStyle')
knitr::opts_chunk$set( autodep=TRUE, cache=FALSE, dev='png' )
library('SampleQC')
library('patchwork')
# define what to use and annotate
group_list  = metadata(qc_obj)$group_list
n_groups    = metadata(qc_obj)$n_groups

Checking for outliers and QC batches via MMD

Maximum mean discrepancy (MMD) is a measure of dissimilarity of empirical (possibly multivariate) distributions. If $X$ and $Y$ are sampled from distributions $D_x$ and $D_y$, then $E(MMD(X,Y)) = 0$ if and only if $D_x = D_y$. SampleQC uses MMD to estimate similarities between the QC matrices of samples in a experiment. Viewed as equivalent to a distance, SampleQC uses the MMD values as input to multiple non-linear embedding approaches, and for clustering. This then allows users to identify possible batch effects in the samples, and groupings of samples which have similar distributions of QC metrics.

Plot MMD dissimilarity matrix

Heatmap of all pairwise dissimilarities between samples (values close to 0 indicate similar samples; values of 1 and higher indicate extremely dissimilar samples).

(plot_mmd_heatmap(qc_obj))

Plot over UMAP embedding with annotations{.tabset}

UMAP embedding of dissimilarity matrix, annotated with selected discrete and continuous values for each sample.

plot_embeddings(qc_obj, "discrete", "UMAP")
plot_embeddings(qc_obj, "continuous", "UMAP")

Plot over MDS embedding with annotations{.tabset}

Multidimensional scaling (MDS) embedding of dissimilarity matrix, annotated with selected discrete and continuous values for each sample.

plot_embeddings(qc_obj, "discrete", "MDS")
plot_embeddings(qc_obj, "continuous", "MDS")

Plot SampleQC model fits and outliers over QC biaxials

These plots show biaxial distributions of each sample, annotated with both the fitted mean and covariance matrices, and the cells which are then identified as outliers. You can use this to check that you have the correct number of components for each sample grouping, and to check that the fitting procedure has worked properly. The means and covariances of the components should match up to the densest parts of the biaxial plots.

for (g in group_list) {
    cat('## ', g, '{.tabset}\n')
    # which samples?
    samples_g   = sort(colData(qc_obj)$sample_id[ colData(qc_obj)$group_id == g ])
    for (s in samples_g) {
        cat('### ', s, ' \n')
        g_fit   = plot_fit_over_biaxials(qc_obj, s)
        g_out   = plot_outliers(qc_obj, s)
        g       = g_fit / g_out
        print(g)
        cat('\n\n')
    }
}

Plot parameters

These plots show the fitted parameters for each sample and each mixture component. There are two sets of parameters: $\alpha_j$, the mean shift for each sample; and $(\mu_k, \Sigma_k)$, the relative means and covariances for each mixture component.

$\alpha_j$ values{.tabset}

Values of $\alpha_j$ which are extreme relative to those for most other samples indicate samples which are either outliers in terms of their QC statistics, or have been allocated to the wrong sample grouping.

# plot likelihoods
for (g in group_list) {
    cat('### ', g, '\n')
    print(plot_alpha_js_likelihoods(qc_obj, g))
    cat('\n\n')
}

These plots show the same $\alpha_j$ values, but as biaxials, and equivalently for PCA projections.

$\alpha_j$ PCA values{.tabset}

for (g in group_list) {
    cat('### ', g, ' feats\n')
    print(plot_alpha_js(qc_obj, g, qc_idx=1:2, pc_idx=1:2))
    cat('\n\n')
    cat('### ', g, ' mito\n')
    print(plot_alpha_js(qc_obj, g, qc_idx=c(1,3), pc_idx=c(1,3)))
    cat('\n\n')
}

These plots show the composition of each sample in terms of the $K$ different mixture components, plus outliers.

Splits across components{.tabset}

# plot likelihoods
for (g in group_list) {
    cat('### ', g, '\n')
    print(plot_beta_ks(qc_obj, g))
    cat('\n\n')
}


wmacnair/SampleQC documentation built on Nov. 18, 2022, 5 p.m.