ExpDesign2021: A helper to make sure requirements are installed for ExpDesign2021

title: "Project 2: Tim's stab at it" author: "Tim Triche" date: "November 10th, 2021" output: html_document: keep_md: true vignette: > %\VignetteIndexEntry{Project2_Tim} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc}

Tim's stab at classification and plotting

Recall that we are seeking to identify the influence of two factors on our ability to classify cells: the experimental vial and the library prep protocol. Since we are going to give in to dichotomania and label cells as "classifiable" or "not classifiable" at a given set of thresholds, we've got a logistic regression on our hands (i.e., we will be modeling the logit-transformed probability of classification, p, as the underlying response to conditions). Despite looking different from a linear model (which in the two-group case, you will recall, is tested for significance using a marginal t-test), we can rely upon a generalized version of residual variance called deviance to let us answer ANOVA-style questions: "how much variability is explained by a particular factor in the model?" First we need to set up our response variable, i.e., the ability to classify a cell confidently. For that, we need cutoffs.

bus format

The barcode, UMI, set or BUS format neatly encapsulates the process of transforming reads resulting from a single-cell RNAseq experiment into counts of molecular barcodes describing a cell, a molecule, and the likely genomic origin of that molecule. It's one of the most compact representations possible for raw scRNAseq outputs, and the figure shows how it facilitates direct comparisons. The Broad used their scumi tool instead, but the basic notion ("make everything comparable") and the resulting sparse matrix of counts is similar in spirit.

With the release of Bioconductor 3.14, the project includes a tidy single cell experiment (data structure) package, which is great since all other single cell data structures kind of suck. (No, seriously, you'll find out why eventually.) The package is, not coincidentally, called tidySingleCellExperiment:

Click for details


if (!require("SingleCellExperiment")) {
  BiocManager::install("SingleCellExperiment")
  library(SingleCellExperiment)
}
if (!require("tidySingleCellExperiment")) {
  BiocManager::install("tidySingleCellExperiment")
  library(tidySingleCellExperiment)
}

Sort-of-bonus: resampling

Above, I stated that maybe we don't need thousands of cells per method. You could always adjust the ideal argument to the function below to resample:

Click for sample_umis() function code


# adapted from a SingleCellExperiment-centric method for CITEseq
sample_umis <- function(umis, meta, block, ideal=300) {

  stopifnot(nrow(meta) == ncol(umis))
  stopifnot(length(block) == nrow(meta))

  pops <- sort(table(block))
  samplesets <- split(seq_len(nrow(meta)), block)

  keep <- integer()
  for (set in names(samplesets)) {
    sset <- samplesets[[set]]
    cells <- length(sset)
    if (cells <= ideal) {
      pct <- 100
      keep <- c(keep, sset)
      message("Kept ", cells, " cells (", pct, "%) of type ", set, ".")
    } else {
      kept <- sample(sset, size=ideal)
      pct <- round((ideal / cells) * 100)
      keep <- c(keep, kept)
      message("Kept ", ideal, " cells (", pct, "%) of type ", set, ".")
    }
  }

  pct <- round((length(keep) / ncol(umis)) * 100, 1)
  message("Kept ", length(keep), " (", pct, "%) of ", ncol(umis),
          " cells in ", length(samplesets), " blocks.")
  umis[, keep]

}

Load the data:


library(SingleCellExperiment) 
library(tidySingleCellExperiment)

if (!exists("tidybarnyard"))  
tidybarnyard <- tidy(readRDS(url("https://ttriche.github.io/RDS/barnyard.rds")))
rowGenome <- rowData(tidybarnyard)$genome

Let's set up some thresholds for classifying cells as mouse or human. First, we need to determine what fraction of transcripts from each genome each cell is expressing. This is a job for a little function.


genomes <- c(human="hg19", mouse="mm10") 
fracExpressed <- function(g, object) {
  subs <- subset(object, rowData(object)$genome == g)
  colSums(counts(subs) > 0) / nrow(subs)
}

fracByCell <- do.call(cbind, 
                      lapply(genomes, fracExpressed, object=tidybarnyard))
smoothScatter(fracByCell)

plot of chunk bygenome

Question: Can you do this again but for each method?

Question: Can you do this again but for each mixture?

VanAndelInstitute/ExpDesign2021 documentation built on Dec. 18, 2021, 6:15 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

VanAndelInstitute/ExpDesign2021
A helper to make sure requirements are installed for ExpDesign2021

vignettes/project2_chunks/project2_tim.md
In VanAndelInstitute/ExpDesign2021: A helper to make sure requirements are installed for ExpDesign2021

Tim's stab at classification and plotting

Sort-of-bonus: resampling

R Package Documentation

Browse R Packages

We want your feedback!

VanAndelInstitute/ExpDesign2021 A helper to make sure requirements are installed for ExpDesign2021

vignettes/project2_chunks/project2_tim.md In VanAndelInstitute/ExpDesign2021: A helper to make sure requirements are installed for ExpDesign2021

Tim's stab at classification and plotting

Sort-of-bonus: resampling

R Package Documentation

Browse R Packages

We want your feedback!

VanAndelInstitute/ExpDesign2021
A helper to make sure requirements are installed for ExpDesign2021

vignettes/project2_chunks/project2_tim.md
In VanAndelInstitute/ExpDesign2021: A helper to make sure requirements are installed for ExpDesign2021