ggplot2::theme_set(ggplot2::theme_bw(base_size = 14) + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)))
library(dropestr)
data('lq_cells_data')
pipeline_data <- lq_cells_data$pipeline
mit_genes <- lq_cells_data$genes
The simplest way to score cells for data obtained with the pipeline:
scores <- ScorePipelineCells(pipeline_data)
PlotCellScores(scores, y.threshold=0.9)
Cells with a high mitochondrial fraction are probably dead, so it is reasonable to filter them out. There are two ways to identify mitochondrial reads: by chromosome name and by gene set. The first approach estimates the fraction of reads, while the second works with UMIs. In practice, the results are quite similar.
scores_chromosome_filt <- ScorePipelineCells(pipeline_data, mit.chromosome.name='chrM')
scores_geneset_filt <- ScorePipelineCells(pipeline_data, mitochondrion.genes=mit_genes)
PlotCellScores(scores_chromosome_filt, y.threshold=0.9, main='Chromosome')
PlotCellScores(scores_geneset_filt, y.threshold=0.9, main='Geneset')
The two approaches give the same answer for `r round(mean((scores_chromosome_filt > 0.9) == (scores_geneset_filt > 0.9)) * 100, 2)`% of cells.
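To make the agreement figure concrete, here is a minimal sketch on made-up score vectors. The names `scores_a` and `scores_b` are hypothetical and are not output of `ScorePipelineCells`. Comparing two logical vectors element-wise and taking the mean gives the fraction of cells on which both filters make the same call at the 0.9 threshold.

```r
# Hypothetical quality scores for five cells from two scoring runs
scores_a <- c(0.95, 0.20, 0.91, 0.85, 0.99)
scores_b <- c(0.97, 0.10, 0.88, 0.80, 0.93)

threshold <- 0.9

# TRUE where both runs put the cell on the same side of the threshold
same_call <- (scores_a > threshold) == (scores_b > threshold)

# Percentage of cells with identical calls
agreement_pct <- round(mean(same_call) * 100, 2)
print(agreement_pct)
```

Note that this measures agreement of the thresholded calls only. Two scorings can agree on every call while still assigning quite different raw scores.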
This filtering can also be done manually, in a more flexible way. The first step is to extract features from the existing data.
lq_cells_df <- PrepareLqCellsDataPipeline(pipeline_data, mitochondrion.genes=mit_genes)
Next, we need to estimate the approximate number of real cells. This can be done using one of the following plots, each of which shows the expected number of cells. Depending on the dataset, some of them can give a more precise result than others:
PlotCellsNumberLogLog(pipeline_data$aligned_umis_per_cell, estimate.cells.number=TRUE)
PlotCellsNumberLine(pipeline_data$aligned_umis_per_cell, estimate.cells.number=TRUE)
PlotCellsNumberHist(pipeline_data$aligned_umis_per_cell, estimate.cells.number=TRUE)
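As rough intuition for what a log-log rank plot shows, the sketch below sorts simulated barcode sizes and looks for the sharpest drop between neighbouring ranks. This is a crude heuristic chosen for illustration, not the estimation procedure used by dropestr, and all data and variable names here are made up. In droplet data, barcodes from real cells typically carry far more UMIs than background barcodes, so the sorted curve falls steeply near the number of real cells.

```r
set.seed(1)

# Simulated UMIs per barcode: 50 "real" cells and 500 background barcodes
umis_per_cell <- c(rpois(50, lambda = 2000), rpois(500, lambda = 20))

# Sort in decreasing order so that rank 1 is the largest barcode
sorted_umis <- sort(umis_per_cell, decreasing = TRUE)
ranks <- seq_along(sorted_umis)

# Drop between consecutive barcodes on the log scale (+1 guards against log(0))
log_drop <- -diff(log10(sorted_umis + 1))

# Rank after which the largest drop occurs: a crude estimate of the cell count
knee_rank <- which.max(log_drop)
print(knee_rank)
```

Plotting `ranks` against `sorted_umis` with `plot(..., log = "xy")` would show the same step visually. On real data the transition is rarely this clean, which is why comparing several views of the distribution is useful.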
Let's look at all the features we use. Please keep in mind that all features are scaled to the [0, 1] interval.
for (n in names(lq_cells_df)) {
  smoothScatter(lq_cells_df[[n]], xlab = "Cell rank", ylab = n, main = n)
}
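For intuition about features living on a [0, 1] scale, here is a minimal sketch of min-max scaling applied to a made-up feature table. This is an assumption for illustration only: the text does not say which scaling `PrepareLqCellsDataPipeline` applies, and the column names below are hypothetical.

```r
# Made-up feature table; column names are hypothetical
features <- data.frame(
  ReadsPerUmi = c(1.2, 3.4, 2.0, 5.0),
  UmiPerGene  = c(1.0, 1.5, 1.1, 2.0)
)

# Min-max scaling: map the smallest value to 0 and the largest to 1
min_max_scale <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply the scaling to every column and rebuild the data frame
scaled <- as.data.frame(lapply(features, min_max_scale))

# Each column now spans exactly [0, 1]
print(sapply(scaled, range))
```

Putting features on a common scale makes them directly comparable in plots like the ones above, regardless of their original units.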
We can see that a high mitochondrial fraction doesn't help to distinguish the right tail from the left, so we can filter on it manually and then remove this feature:
lq_cells_df <- lq_cells_df[lq_cells_df$MitochondrionFraction < 0.1, ]
lq_cells_df$MitochondrionFraction <- NULL
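The same two steps can be tried on a small made-up table to see exactly what they do. In this sketch only the column name `MitochondrionFraction` comes from the text; `toy_df`, `OtherFeature` and the values are hypothetical.

```r
# Made-up feature table with one row per cell
toy_df <- data.frame(
  MitochondrionFraction = c(0.02, 0.35, 0.08, 0.50),
  OtherFeature          = c(0.9, 0.4, 0.7, 0.1),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# Keep only cells below the mitochondrial threshold
toy_df <- toy_df[toy_df$MitochondrionFraction < 0.1, ]

# Drop the column so it is no longer used as a feature
toy_df$MitochondrionFraction <- NULL

print(toy_df)
```

The order matters: the rows are filtered while the column still exists, and only then is the column removed. If a table were already down to a single column, row subsetting with `[rows, ]` would return a plain vector unless `drop = FALSE` is passed.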
Also, based on what we see, we can decide to move the border of "definitely low-quality" cells further to the right. The algorithm is fairly robust to border selection, though.
cells_number_manual <- list(min=450, max=800)
Finally, we're ready to get the final score:
scores <- ScoreQualityData(pipeline_data$aligned_umis_per_cell, lq_cells_df, cells_number_manual)
PlotCellScores(scores, y.threshold=0.9)