knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(SingleCellExperiment)
library(ggplot2)
load("../data/tiny_sce.rda")
devtools::load_all()

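SingleCellExperiment objects contain:

- assays: matrices of expression values (genes in rows, cells in columns)
- colData: cell-level metadata
- rowData: gene-level metadata
- reducedDims: low-dimensional representations such as PCA and tSNE coordinates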

In principle, all of these values (e.g., individual values from assay, colData, rowData) will be of interest for plotting. The coordinates from reducedDims will be the basis for the plot (defining X and Y). At this point, every dot will represent a cell (in the future, we would like to do the same types of analyses and plots for the genes, too, of course!). Categorical data can be used to facet the plots; continuous data is better suited for coloring.
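A minimal sketch of that per-cell layout, using plain base-R objects as stand-ins for what `reducedDim()`, `colData()` and `assay()` would return. The helpers `build_plot_df()` and `suggest_role()` are hypothetical (not part of the package), the toy data are made up, and the `x_axs`/`y_axs` column names simply mirror `drp$plot_data` used further down. It assumes cells are in the same order in all three inputs.

```r
# Hypothetical helper: one row per cell, combining reduced-dimension
# coordinates, cell-level metadata and, optionally, one gene's expression.
# Assumes cells appear in the same order in coords, cell_info and expr.
build_plot_df <- function(coords, cell_info, expr = NULL, gene = NULL,
                          dims = 1:2) {
  stopifnot(nrow(coords) == nrow(cell_info))
  df <- data.frame(x_axs = coords[, dims[1]],
                   y_axs = coords[, dims[2]],
                   cell_info,
                   check.names = FALSE, stringsAsFactors = FALSE)
  if (!is.null(gene)) {
    # assay-style matrix: genes in rows, cells in columns
    df[[gene]] <- as.numeric(expr[gene, ])
  }
  df
}

# Continuous columns suit colour; categorical ones suit facets (or shapes)
suggest_role <- function(df) {
  vapply(df, function(col) if (is.numeric(col)) "colour" else "facet",
         character(1))
}

# Toy stand-ins for reducedDim(sce, "PCA"), colData(sce) and assay(sce)
set.seed(1)
cells <- paste0("cell", 1:4)
coords <- matrix(rnorm(12), nrow = 4,
                 dimnames = list(cells, c("PC1", "PC2", "PC3")))
cell_info <- data.frame(condition = c("ctrl", "ctrl", "trt", "trt"),
                        total_features = c(1200, 950, 1100, 1300))
expr <- matrix(c(0, 1.5, 2, 0.3), nrow = 1,
               dimnames = list("geneA", cells))

pd <- build_plot_df(coords, cell_info, expr, gene = "geneA", dims = c(2, 3))
suggest_role(pd[setdiff(names(pd), c("x_axs", "y_axs"))])
```

The type check is deliberately crude: a numeric column that really encodes categories (e.g. a cluster number) would need to be converted to a factor first.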

head(colData(tiny_sce))
head(rowData(tiny_sce))
str(reducedDims(tiny_sce))

names(reducedDims(tiny_sce))

Different features of tSNE and PCA results

The most obvious difference is that we usually retain more PCs than we compute tSNE components. This means we cannot simply take the first two columns of whatever is stored in reducedDims; a user may, for example, want to plot PC2 against PC3.

head(reducedDim(tiny_sce, "PCA"))
head(reducedDim(tiny_sce, "TSNE"))
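One way to avoid relying on "the first two columns" is to always select the requested components explicitly and fail loudly if they do not exist. `pick_dims()` below is a hypothetical helper, and the random matrices only stand in for the PCA and TSNE entries of reducedDims.

```r
# Hypothetical helper: select two named or numbered components from a
# reduced-dimension matrix instead of silently assuming columns 1 and 2.
pick_dims <- function(mat, dims = 1:2) {
  if (length(dims) != 2) stop("exactly two dimensions are needed for an x/y plot")
  if (is.character(dims)) {
    absent <- setdiff(dims, colnames(mat))
    if (length(absent) > 0) {
      stop("dimension(s) not found: ", paste(absent, collapse = ", "))
    }
  } else if (any(dims < 1 | dims > ncol(mat))) {
    stop("requested dimensions ", paste(dims, collapse = " and "),
         ", but the matrix has only ", ncol(mat), " column(s)")
  }
  out <- mat[, dims, drop = FALSE]
  colnames(out) <- c("x_axs", "y_axs")
  out
}

set.seed(42)
pca  <- matrix(rnorm(4 * 10), nrow = 4,
               dimnames = list(NULL, paste0("PC", 1:10)))  # 10 retained PCs
tsne <- matrix(rnorm(4 * 2), nrow = 4)                     # 2 tSNE components
pick_dims(pca, c("PC2", "PC3"))
try(pick_dims(tsne, 2:3))  # errors instead of plotting the wrong axes
```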

Also, I'm currently storing the percentage of variance explained as an attribute of the PCA results. These values should be shown in the x/y axis labels, so they must not get lost along the way.

attr(reducedDim(tiny_sce, "PCA"), "percentVar")
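A small sketch of turning that attribute into axis labels. The helper name is hypothetical, and so is the assumption that the attribute holds fractions between 0 and 1; if it already stores percentages, `scale = 1` should be used. The second half shows why the attribute is easy to lose: ordinary matrix subsetting keeps only `dim`/`dimnames`, so the values have to be read before selecting columns.

```r
# Hypothetical helper: format the stored variance values as axis labels.
# Assumption: percent_var holds fractions (0-1); use scale = 1 for percentages.
pc_axis_labels <- function(percent_var, pcs = 1:2, scale = 100) {
  sprintf("PC%d (%.1f%%)", pcs, scale * percent_var[pcs])
}

pv <- c(0.5, 0.25, 0.125, 0.0625)   # toy values, not taken from tiny_sce
pc_axis_labels(pv, pcs = 2:3)
#> [1] "PC2 (25.0%)" "PC3 (12.5%)"

# Custom attributes do not survive ordinary subsetting, so read them first
pca <- matrix(rnorm(8), nrow = 2)
attr(pca, "percentVar") <- pv
is.null(attr(pca[, 2:3], "percentVar"))
#> [1] TRUE
```

With ggplot2, the resulting strings could be passed to `labs(x = ..., y = ...)`.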

Current routine

There is one wrapper function that generates a (somewhat clunky) object which, in principle, contains everything needed for plotting with ggplot2.

drp <- get_reducedDimPlot.sce(tiny_sce, which_reddim = "PCA",
                              which_pcs = 2:3,
                              color_by = "ENSMUSG00000051579",
                              dim_red_type = "PCA",
                              add_cell_info = names(colData(tiny_sce)))

It's really just a list, with drp$plot_data being the most important part, i.e. the data.frame with the actual values to be plotted.

head(drp$plot_data)
## very basic example; note that aes_string() is a must because the Shiny application further down the road passes column names as strings
ggplot(drp$plot_data, aes_string(x = "x_axs", y = "y_axs")) +
  geom_point(aes_string(color = "log10_total_features"), size = 4)
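Since a Shiny app hands over column names as strings, one alternative to `aes_string()` (which newer ggplot2 releases discourage in favour of tidy evaluation) is the `.data` pronoun, usable inside `aes()` since ggplot2 3.0.0. The wrapper `plot_reddim()` and the toy data.frame are hypothetical; only the column names are borrowed from the example above.

```r
# Sketch: string-based aesthetic mapping via the .data pronoun.
# Column names arrive as character strings, e.g. from a Shiny input.
plot_reddim <- function(plot_data, color_col,
                        x_col = "x_axs", y_col = "y_axs") {
  ggplot2::ggplot(plot_data,
                  ggplot2::aes(x = .data[[x_col]], y = .data[[y_col]])) +
    ggplot2::geom_point(ggplot2::aes(color = .data[[color_col]]), size = 4)
}

toy <- data.frame(x_axs = c(1, 2, 3), y_axs = c(3, 1, 2),
                  log10_total_features = c(3.1, 2.9, 3.4))
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- plot_reddim(toy, color_col = "log10_total_features")
  print(p)
}
```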

The wrapper function for the plotting is this one:

plt.DimRedPlot(drp, color_by = "barcode", ignore_drp_labels = "exprs_val_type", circle_by = "condition")

Some parts are clunky, e.g., the ignore_drp_labels argument. The rationale was that I wanted to enforce that the type of expression value used for coloring is shown in the legend, e.g., when a gene's expression is used. That doesn't always make sense, though. Neither does the "factors" part of that drp object, I think.
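One possible simplification of that legend logic, sketched as a hypothetical helper: append the expression-value type only when the coloring variable is a gene, and let the caller switch it off with a single flag instead of a list of labels to ignore. The default `"logcounts"` is an assumed value type, not necessarily what the package uses.

```r
# Hypothetical helper: build the colour legend title.
# Genes get the expression-value type appended; cell metadata keeps its name.
legend_title <- function(color_by, gene_ids, exprs_val_type = "logcounts",
                         show_exprs_type = TRUE) {
  if (show_exprs_type && color_by %in% gene_ids) {
    paste0(color_by, " (", exprs_val_type, ")")
  } else {
    color_by
  }
}

genes <- c("ENSMUSG00000051579")
legend_title("ENSMUSG00000051579", genes)
#> [1] "ENSMUSG00000051579 (logcounts)"
legend_title("condition", genes)
#> [1] "condition"
```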

plt.DimRedPlot(drp, color_by = "ENSMUSG00000051579", shape_by = "condition")

Some quirks of my code

The main functions to start delving into are plt.DimRedPlot and get_reducedDimPlot.sce. Ideally, there should also be a function that works with plain matrices and with SingleCellExperiment objects that don't have the PCA/tSNE coordinates stored within them. I originally thought that creating a single object type (DimRedPlot) would let me maintain just one big plotting function, which is really what I want: multiple ways to handle different types of input, but only one function for the plotting.
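The "many inputs, one plotting function" idea maps naturally onto S3 dispatch: one constructor generic with a method per input type, all returning the same structure. Everything below is a hypothetical sketch (the class name reuses "DimRedPlot" only as a stand-in); base graphics are used solely to keep it dependency-free, and a SingleCellExperiment input could be handled the same way by extracting `reducedDim()` and `colData()` and passing them to the shared constructor.

```r
# Sketch: several constructors, one plotting function (all names hypothetical).
as_drp <- function(x, ...) UseMethod("as_drp")

# Shared constructor: every input type ends up in the same structure
new_drp <- function(coords, dims, cell_info, axis_labels) {
  plot_data <- data.frame(x_axs = coords[, dims[1]],
                          y_axs = coords[, dims[2]])
  if (!is.null(cell_info)) plot_data <- cbind(plot_data, cell_info)
  structure(list(plot_data = plot_data, axis_labels = axis_labels),
            class = "DimRedPlot")
}

# Plain matrix of coordinates (cells in rows)
as_drp.matrix <- function(x, dims = 1:2, cell_info = NULL, ...) {
  labs <- if (is.null(colnames(x))) paste0("Dim", dims) else colnames(x)[dims]
  new_drp(x, dims, cell_info, labs)
}

# Base R PCA result: percent variance derived from the component SDs
as_drp.prcomp <- function(x, dims = 1:2, cell_info = NULL, ...) {
  pv <- 100 * x$sdev^2 / sum(x$sdev^2)
  labs <- sprintf("PC%d (%.1f%%)", dims, pv[dims])
  new_drp(x$x, dims, cell_info, labs)
}

# The single plotting function only ever sees a DimRedPlot
plot_drp <- function(drp, ...) {
  plot(drp$plot_data$x_axs, drp$plot_data$y_axs,
       xlab = drp$axis_labels[1], ylab = drp$axis_labels[2], ...)
}

drp_pca <- as_drp(prcomp(USArrests, scale. = TRUE), dims = 2:3)
drp_mat <- as_drp(matrix(rnorm(20), ncol = 2))
plot_drp(drp_pca)
```

Adding a new input type then means writing one more `as_drp.<class>()` method, while the plotting code stays untouched.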

Ideally, the data.frame for ggplot (e.g., drp$plot_data) would not need to be rebuilt, but I'm not yet sure how to efficiently add just a single column, e.g., when the user wants to color by a different gene. Going via data.table and its efficient merging routines might be worth a try.
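Adding one column does not necessarily require a merge: matching on cell IDs is enough and leaves the existing columns alone. A base-R sketch, assuming a hypothetical `cell_id` column in the plot data and a named vector of new values:

```r
# Sketch: add (or replace) one per-cell column in an existing plot data.frame.
# Cells are matched by ID, so the new values may come in any order.
add_cell_column <- function(plot_data, values, col_name,
                            id_col = "cell_id") {
  idx <- match(plot_data[[id_col]], names(values))
  if (anyNA(idx)) stop("some cells have no value for ", col_name)
  plot_data[[col_name]] <- unname(values)[idx]
  plot_data
}

plot_data <- data.frame(cell_id = c("c1", "c2", "c3"),
                        x_axs = c(0.1, 0.5, 0.9),
                        y_axs = c(1.2, 0.3, 0.8))
# e.g. expression of a newly selected gene, in a different cell order
new_gene <- c(c3 = 2.0, c1 = 0.0, c2 = 1.1)
plot_data <- add_cell_column(plot_data, new_gene, "ENSMUSG00000051579")
plot_data$ENSMUSG00000051579
#> [1] 0.0 1.1 2.0
```

With data.table, the same lookup can modify the table by reference instead of returning a modified copy, e.g. `dt[, (col_name) := values[match(cell_id, names(values))]]`.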

Definitely things to look into/change

Additional things to think about

Ideally, this should all be done within the realm of SingleCellExperiment, but currently I'm actually supplying the reduced-dimension information separately, and given the constraints of SingleCellExperiment, I'm open to keeping it that way.

Session info {.unnumbered}

sessionInfo()


evanbiederstedt/sandbox documentation built on May 26, 2019, 12:31 p.m.