`r sampleId`

knitr::opts_chunk$set(echo = showCode, warning = FALSE, message = FALSE)
knitr::opts_knit$set(progress = FALSE, verbose = FALSE)
## Read input files
if (!quiet) message("Reading Alevin output files...")
alevin <- readAlevinQC(baseDir = baseDir, customCBList = customCBList)

Version info for alevin run

suppressWarnings({
  knitr::kable(
    alevin$versionTable
  )
})

Summary tables

Full set of cell barcodes

if (!quiet) message("Generating summary tables...")
suppressWarnings({
  knitr::kable(
    alevin$summaryTables$fullDataset
  )
})

Initial whitelist

suppressWarnings({
  knitr::kable(
    alevin$summaryTables$initialWhitelist
  )
})

Final whitelist

suppressWarnings({
  knitr::kable(
    alevin$summaryTables$finalWhitelist
  )
})
cat("\n\n## Custom cell barcode set(s)\n\n")
for (cbl in names(customCBList)) {
  cat(paste0("\n\n### ", cbl, "\n\n"))
  suppressWarnings({
    print(knitr::kable(
      alevin$summaryTables[[paste0("customCB__", cbl)]]
    ))
  })
  cat("\n")
}

Knee plot

The knee plot displays the number of times each cell barcode is observed, in decreasing order. By finding a 'knee' in this plot, Alevin determines a threshold (indicated in the plot) that defines an initial 'whitelist' - a set of cell barcodes that likely represent non-empty droplets - and distinguishes them from the background. The initial whitelisting is only performed if no external whitelist is provided when running alevin. In the figure below, red indicates cell barcodes in the initial whitelist, black indicates all other cell barcodes.

if (!quiet) message("Generating knee plot...")
plotAlevinKneeRaw(alevin$cbTable)
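
As a rough illustration of what finding a 'knee' means, the sketch below simulates barcode frequencies and picks the rank that lies farthest from the straight line joining the first and last points of the log-log curve. This is a generic heuristic and is not claimed to be Alevin's actual thresholding procedure; `findKneeRank` and the simulated data are hypothetical and not part of alevinQC.

```r
# Hypothetical helper (not part of Alevin or alevinQC): return the rank of
# the point farthest from the chord of the log-log rank/frequency curve.
findKneeRank <- function(freq) {
  freq <- sort(freq[freq > 0], decreasing = TRUE)
  x <- log10(seq_along(freq))
  y <- log10(freq)
  n <- length(freq)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # Perpendicular distance of each point from the line through the
  # first and last points of the curve
  d <- abs(dy * (x - x[1]) - dx * (y - y[1])) / sqrt(dx^2 + dy^2)
  which.max(d)
}

set.seed(1)
# Simulated data: 200 'cell' barcodes with high read counts,
# followed by 5000 background barcodes with low read counts
freq <- c(rpois(200, 5000), rpois(5000, 5))
knee <- findKneeRank(freq)
knee
```

Barcodes ranked at or above `knee` would form the initial whitelist in this toy setting; the estimated rank should fall close to the 200 simulated cells.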

Cell barcode error correction and merging with initial whitelist

Once the initial set of whitelisted cell barcodes is defined, Alevin goes through the remaining cell barcodes. If a cell barcode is similar enough to a whitelisted cell barcode, it is corrected to that barcode and its reads are added to those of the whitelisted one. Reads from cell barcodes that cannot be corrected to a whitelisted barcode are discarded. The figure below shows the original frequency of the whitelisted barcodes vs the frequency after this correction.

if (!quiet) message("Generating barcode collapsing plot...")
plotAlevinBarcodeCollapse(alevin$cbTable)
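
The collapsing step described above can be sketched on toy data. The example assumes that 'similar enough' means a Hamming distance of exactly 1 to a single whitelisted barcode; Alevin's real similarity rule and its handling of ambiguous matches may differ. `hammingDist` and `collapseBarcodes` are hypothetical helpers, not alevinQC functions.

```r
# Hypothetical helper: number of mismatching positions between two
# equal-length barcodes
hammingDist <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: add the reads of each non-whitelisted barcode to the
# unique whitelisted barcode at Hamming distance 1 (assumed rule);
# reads from barcodes without such a match are discarded
collapseBarcodes <- function(counts, whitelist) {
  collapsed <- counts[whitelist]
  discarded <- 0
  for (cb in setdiff(names(counts), whitelist)) {
    d <- vapply(whitelist, hammingDist, integer(1), b = cb)
    hit <- whitelist[d == 1]
    if (length(hit) == 1) {
      collapsed[[hit]] <- collapsed[[hit]] + counts[[cb]]
    } else {
      discarded <- discarded + counts[[cb]]
    }
  }
  list(collapsed = collapsed, discarded = discarded)
}

# Toy read counts per observed barcode
counts <- c(AAAA = 100, CCCC = 80, AAAT = 5, CCGC = 3, GTGT = 2)
res <- collapseBarcodes(counts, whitelist = c("AAAA", "CCCC"))
res$collapsed  # AAAA = 105, CCCC = 83
res$discarded  # 2
```

Comparing `counts[c("AAAA", "CCCC")]` with `res$collapsed` corresponds to the original vs. collapsed frequencies shown in the figure.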

Quantification

After cell barcode collapsing, Alevin estimates the UMI count for each cell and gene. Following quantification, an additional cell barcode whitelisting is performed with the aim of extracting good quality cells, using not only the barcode frequency but also other features such as the fraction of mapped reads, the duplication rate and the average gene count. The plots below show the association between the cell barcode frequency (the number of observed reads corresponding to a cell barcode), the total UMI count and the number of detected genes. The cell barcodes are colored by whether or not they are included in the final whitelist.

These figures can give an indication of whether the sequenced reads actually align to genes, as well as of the duplication rate and the degree of saturation. For many droplet data sets, the association between the barcode frequency and the total UMI count is roughly linear, while the association of either of these with the number of detected genes often deviates from linearity when a small subset of the genes is assigned a large fraction of the UMI counts.

if (!quiet) message("Generating quantification summary plot...")
plotAlevinQuant(alevin$cbTable, colName = "inFinalWhiteList",
                cbName = "final whitelist")
cat("\n\n## Custom cell barcode set(s)\n\n")
for (cbl in names(customCBList)) {
  print(plotAlevinQuant(alevin$cbTable, colName = paste0("customCB__", cbl),
                        cbName = cbl))
}
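
To make the plotted quantities concrete, the sketch below derives the total UMI count and the number of detected genes per cell from a small simulated gene-by-cell UMI count matrix. The matrix `umi` and the variable names are illustrative assumptions; they do not reflect how alevinQC stores these values internally.

```r
set.seed(42)
# Simulated UMI counts: 5 genes (rows) x 4 cells (columns)
umi <- matrix(
  rpois(5 * 4, lambda = 2), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Total UMI count per cell: sum over all genes
totalUMICount <- colSums(umi)

# Number of detected genes per cell: genes with at least one UMI
nbrGenes <- colSums(umi > 0)

data.frame(totalUMICount, nbrGenes)
```

Because every detected gene contributes at least one UMI, the total UMI count of a cell is never smaller than its number of detected genes, and the latter is capped by the number of genes. This cap is one reason the relationship between the two can flatten out.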

Knee plot, number of detected genes

Similarly to the knee plot that was used to select the initial cell barcode whitelist, the plot below shows the number of detected genes for each cell barcode included in the initial whitelist, in decreasing order.

if (!quiet) message("Generating knee plot for nbr genes...")
plotAlevinKneeNbrGenes(alevin$cbTable)

Selected summary distributions

The histograms below show the distributions of the deduplication rates (number of deduplicated UMI counts/number of mapped reads) and the mapping rates, across the cells retained in the initial whitelist.

if (!quiet) message("Generating summary distribution plots...")
cowplot::plot_grid(
  plotAlevinHistogram(alevin$cbTable, plotVar = "dedupRate",
                      axisLabel = "Deduplication rate",
                      colName = "inFinalWhiteList",
                      cbName = "final whitelist"),
  plotAlevinHistogram(alevin$cbTable, plotVar = "mappingRate",
                      axisLabel = "Mapping rate",
                      colName = "inFinalWhiteList",
                      cbName = "final whitelist"),
  nrow = 1
)
cat("\n\n## Custom cell barcode set(s)\n\n")
for (cbl in names(customCBList)) {
  print(cowplot::plot_grid(
    plotAlevinHistogram(alevin$cbTable, plotVar = "dedupRate",
                        axisLabel = "Deduplication rate",
                        colName = paste0("customCB__", cbl),
                        cbName = cbl),
    plotAlevinHistogram(alevin$cbTable, plotVar = "mappingRate",
                        axisLabel = "Mapping rate",
                        colName = paste0("customCB__", cbl),
                        cbName = cbl),
    nrow = 1
  ))
}
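
The two rates shown in the histograms can be computed by hand on toy numbers. The deduplication rate follows the definition given in the text above (deduplicated UMI counts divided by mapped reads). The mapping rate is assumed here to be mapped reads divided by total reads for the barcode, which may not match Alevin's exact definition. All column names are hypothetical.

```r
# Toy per-cell summary; values and column names are made up for illustration
cells <- data.frame(
  cb = c("AAAA", "CCCC", "GGGG"),
  totalReads = c(1000, 800, 500),
  mappedReads = c(900, 600, 100),
  dedupUMIs = c(300, 450, 90)
)

# Deduplication rate as defined in the text:
# deduplicated UMI counts / mapped reads
cells$dedupRate <- cells$dedupUMIs / cells$mappedReads

# Mapping rate (assumed definition): mapped reads / total reads
cells$mappingRate <- cells$mappedReads / cells$totalReads

cells
```

Under these definitions, the third cell combines a low mapping rate (0.2) with a high deduplication rate (0.9): few of its reads map, but those that do are rarely duplicates.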

Session info

sessionInfo()



alevinQC documentation built on Feb. 4, 2021, 2:01 a.m.