Nuclei Analysis Toolkit

library(DT)
library(utils)
library(UpSetR)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE)
knitr::knit_hooks$set(wrap = function(before, options, envir) {
  if (before) {
    paste0("<", options$wrap, ' align="center">')
  } else {
    paste0("</", options$wrap, ">")
  }
})

metadata <- qs::qread(params$metadata_path)

Dataset Integration

Datasets were integrated using the method 'Linked Inference of Genomic Experimental Relationships (LIGER)' [@Welch2019]. This involved four pre-processing steps: (1) normalization for UMIs per cell, (2) subsetting the most variable genes for each dataset, (3) scaling by root-mean-square across cells, and (4) filtering of non-expressive genes.

Key pre-processing parameters:

Number of variable genes per dataset (individual) selected for integration: r metadata$liger_params$liger_preprocess$num_genes
Total number of variable genes used for integration (the union across all individuals): r length(metadata$liger_var.genes)

Note: The length of the union across datasets (individuals) varies. The UpSet plots below may reveal outlying dataset/s.

knitr::opts_chunk$set(echo = FALSE)
print(metadata$dataset_integration$var.genes_plots$upset)

LIGER Factorization

An integrative non-negative matrix factorization was performed in order to identify shared and distinct metagenes (factors) across the datasets. The corresponding factor/metagene loadings were calculated for each cell.

Key factorization parameters:

Number of factors (inner dimension of factorization; k): r metadata$liger_params$liger_reduce_dims$k
Penalty parameter which limits the dataset-specific component of the factorization (lambda): r metadata$liger_params$liger_reduce_dims$lambda
Resolution parameter which controls the number of communities detected: r metadata$liger_params$liger_reduce_dims$resolution

Batch Effect Correction by LIGER

The performance of LIGER in batch effect correction was evaluated by comparison with dataset without a data integration algorithm applied (i.e. PCA input for dimensionality reduction). For each of the categorical covariates specified by the user two side-by-side comparisons have been represented: (1) visualisation of the batch effect using tSNE plots, and (2) quantification of the batch effect based on kBET test results [@Büttner2019].

In each kBET plot, the rejection rate represents the fraction of neighbourhoods with a label composition different from the global composition of batch labels. A significantly different observed vs. expected rejection rate opposes the well-mixedness of the data.

Categorical covariates

knitr::opts_chunk$set(echo = FALSE)

show_plot <- function(plot_object) {
  div(style="margin:auto;text-align:center", plot_object)
}

out <- NULL
for(variable in names(metadata$dataset_integration$batch_correction_plots[[1]])) {
    out <- c(out, knitr::knit_child("categorical_covariates.Rmd"))
}
paste(out, collapse='\n')