Interaction with the tidyverse and ggplot2

The tidyverse, ggplot2, and destiny are a great fit!

suppressPackageStartupMessages({
    library(destiny)
    library(tidyverse)
    library(forcats)  # attached by recent tidyverse versions; loaded explicitly to be safe
})

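ggplot2 has a peculiar way to set default scales: you just define a variable named like the default scale function. When a plot maps a continuous variable to colour without adding a colour scale, ggplot2 first looks up scale_colour_continuous in the plot's environment:

scale_colour_continuous <- scale_color_viridis_c  # viridis for all continuous colour mappings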
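The same lookup should work for other aesthetics. Here is a minimal sketch, assuming ggplot2 resolves scale_fill_continuous the same way it resolves scale_colour_continuous; the toy data frame toy is hypothetical. It checks that a plot without an explicit fill scale gets the same colours as one with scale_fill_viridis_c() added by hand:

```r
library(ggplot2)

# Default continuous fill scale, defined like the colour scale above:
# ggplot2 looks for a function named scale_fill_continuous in the plot's
# environment before falling back to its own default.
scale_fill_continuous <- scale_fill_viridis_c

# Hypothetical toy data with a continuous variable mapped to fill
toy <- data.frame(x = 1:3, y = 1, z = c(0.1, 0.5, 0.9))

p_default  <- ggplot(toy, aes(x, y, fill = z)) + geom_tile()
p_explicit <- ggplot(toy, aes(x, y, fill = z)) + geom_tile() +
    scale_fill_viridis_c()

# Both plots should end up with the same (viridis) fill colours
same_fill <- identical(layer_data(p_default)$fill, layer_data(p_explicit)$fill)
same_fill
```

Since the lookup goes through the plot's environment, the default only applies to plots created where the variable is visible. ggplot2 3.3.0 and later also offer the options ggplot2.continuous.colour and ggplot2.continuous.fill, e.g. options(ggplot2.continuous.colour = 'viridis'), as a global alternative.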

When working mainly with dimension reductions, whose axis values are rarely meaningful, I suggest hiding the ticks and their labels:

theme_set(theme_gray() + theme(
    axis.ticks = element_blank(),   # no tick marks
    axis.text  = element_blank()))  # no tick labels

Let’s load our dataset, an ExpressionSet that ships with destiny:

data(guo_norm)

Of course you could use tidyr::gather() to tidy or transform the data now, but the data is already in the right form for destiny, and R for Data Science covers tidying far better than this vignette could. For illustration, the long form of a single cell ExpressionSet would look like this:

guo_norm %>%
    as('data.frame') %>%  # one row per cell, genes and covariates as columns
    gather(Gene, Expression, one_of(featureNames(guo_norm)))  # one row per cell/gene pair

But destiny doesn’t take long-form data as input, since single cell data always has a more compact structure: a genes × cells matrix plus a number of per-cell covariates (the structure of an ExpressionSet).
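To make the difference between the two layouts concrete, here is a minimal sketch on a hypothetical three-cell data frame (wide, with made-up values for two genes) standing in for as(guo_norm, 'data.frame'). It uses pivot_longer() and pivot_wider(), which supersede gather() and spread() in tidyr 1.0.0 and later:

```r
library(tidyr)

# Hypothetical stand-in for as(guo_norm, 'data.frame'): one row per cell,
# one column per gene, plus per-cell covariates (cell, num_cells).
genes <- c("Klf2", "Actb")
wide <- data.frame(
    cell      = c("c1", "c2", "c3"),
    num_cells = c(1L, 2L, 4L),
    Klf2      = c(0.5, 1.2, -0.3),
    Actb      = c(2.0, 1.8, 2.2)
)

# Long form: one row per cell/gene pair; the covariates are repeated
# for every gene.
long <- pivot_longer(wide, cols = tidyselect::all_of(genes),
                     names_to = "Gene", values_to = "Expression")

# Back to the compact cells x genes layout
wide_again <- pivot_wider(long, names_from = Gene, values_from = Expression)
```

Note how num_cells is repeated for every gene in long, while the wide layout stores each covariate once per cell.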

dm <- DiffusionMap(guo_norm)

names(dm) shows what names can be used in dm$<name>, as.data.frame(dm)$<name>, or ggplot(dm, aes(<name>)):

names(dm)  # namely: Diffusion Components, Genes, and Covariates

Because a fortify method (which here is just as.data.frame) is defined for DiffusionMap objects, ggplot accepts them directly:

ggplot(dm, aes(DC1, DC2, colour = Klf2)) +
    geom_point()
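The mechanism is plain S3 dispatch: ggplot() calls fortify() on whatever it receives as data. The sketch below illustrates it with a hypothetical S3 class toy_embedding (a coordinate matrix plus covariates); it is not destiny's implementation, only an analogous setup:

```r
library(ggplot2)

# Hypothetical S3 class standing in for a DiffusionMap: a coordinate
# matrix plus a data frame of per-cell covariates.
toy_embedding <- function(coords, covariates) {
    structure(list(coords = coords, covariates = covariates),
              class = "toy_embedding")
}

# Flatten coordinates and covariates into one data frame
as.data.frame.toy_embedding <- function(x, row.names = NULL, optional = FALSE, ...) {
    cbind(as.data.frame(x$coords), x$covariates)
}

# ggplot() calls fortify() on its data argument; delegating to
# as.data.frame() lets ggplot() accept the object directly.
fortify.toy_embedding <- function(model, data, ...) as.data.frame(model)

set.seed(1)
emb <- toy_embedding(
    coords     = matrix(rnorm(20), ncol = 2,
                        dimnames = list(NULL, c("DC1", "DC2"))),
    covariates = data.frame(num_cells = rep(c(1L, 2L), each = 5))
)

p <- ggplot(emb, aes(DC1, DC2, colour = factor(num_cells))) + geom_point()
```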

dplyr verbs, however, do not call fortify, so to use a Diffusion Map in a dplyr pipeline you need to call fortify() or as.data.frame() explicitly:

fortify(dm) %>%
    mutate(
        # relabel the levels "1", "2", ... as "1 cell state", "2 cell state", ...
        EmbryoState = factor(num_cells) %>%
            lvls_revalue(paste(levels(.), 'cell state'))
    ) %>%
    ggplot(aes(DC1, DC2, colour = EmbryoState)) +
        geom_point()
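To see what the lvls_revalue() step does in isolation, here is a sketch on a hypothetical four-cell tibble (cells), with the equivalent forcats::fct_relabel() call alongside for comparison:

```r
library(dplyr)
library(forcats)

# Hypothetical stand-in for fortify(dm): only the covariate matters here
cells <- tibble(num_cells = c(1L, 2L, 2L, 4L))

labelled <- cells %>%
    mutate(
        # magrittr inserts factor(num_cells) as lvls_revalue()'s first
        # argument, and `.` inside levels(.) refers to that same factor
        EmbryoState = factor(num_cells) %>%
            lvls_revalue(paste(levels(.), 'cell state')),
        # the same relabelling with fct_relabel(), which applies a
        # function to the current levels
        EmbryoState2 = fct_relabel(factor(num_cells), ~ paste(.x, 'cell state'))
    )

levels(labelled$EmbryoState)
# "1 cell state" "2 cell state" "4 cell state"
```

The `.` trick relies on the magrittr pipe; with R's native |> pipe, fct_relabel() is the simpler choice.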

In a converted Diffusion Map, the Diffusion Components, like the genes in the input ExpressionSet, are separate columns rather than key/value pairs of a long-form data frame. Sometimes it is useful to “tidy” them, e.g. to plot DC1 against each of DC2 to DC5 in separate facets:

fortify(dm) %>%
    gather(DC, OtherDC, num_range('DC', 2:5)) %>%  # one row per cell and DC in DC2..DC5
    ggplot(aes(DC1, OtherDC, colour = factor(num_cells))) +
        geom_point() +
        facet_wrap(~ DC)
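The same reshaping with pivot_longer() (tidyr 1.0.0 and later) looks like this; the data frame dm_df is a hypothetical stand-in for fortify(dm) with random values for 20 cells:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for fortify(dm): 20 cells, five DCs, one covariate
set.seed(1)
n <- 20
dm_df <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                              dimnames = list(NULL, paste0("DC", 1:5))))
dm_df$num_cells <- sample(c(1L, 2L, 4L), n, replace = TRUE)

# Equivalent to gather(DC, OtherDC, num_range('DC', 2:5)): DC1 and
# num_cells stay as columns, DC2..DC5 become one row per cell and component.
long_dcs <- dm_df %>%
    pivot_longer(num_range("DC", 2:5), names_to = "DC", values_to = "OtherDC")

p <- ggplot(long_dcs, aes(DC1, OtherDC, colour = factor(num_cells))) +
    geom_point() +
    facet_wrap(~ DC)
```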

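Another tip: points drawn later cover those drawn earlier, so if the rows are sorted by group (e.g. by num_cells), one group can hide the others. Shuffling the rows in a pipeline with sample_frac() avoids this bias; its defaults (size = 1, replace = FALSE) return every row exactly once, in random order.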

Adding a constant alpha (transparency) helps even more and also makes point density visible:

fortify(dm) %>%
    sample_frac() %>%
    ggplot(aes(DC1, DC2, colour = factor(num_cells))) +
        geom_point(alpha = .3)
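In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(). A minimal sketch of the same shuffle, using a hypothetical tibble toy that is sorted by group:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data sorted by group: drawn in this order, the
# num_cells == 4 points would sit on top of all the others.
set.seed(42)
toy <- tibble(
    DC1       = rnorm(300),
    DC2       = rnorm(300),
    num_cells = rep(c(1L, 2L, 4L), each = 100)
)

# Shuffle: every row exactly once, in random order
shuffled <- toy %>% slice_sample(prop = 1)

p <- ggplot(shuffled, aes(DC1, DC2, colour = factor(num_cells))) +
    geom_point(alpha = .3)
```

Shuffling keeps all rows and only changes the drawing order, so no group is systematically favoured when points overlap.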


theislab/destiny documentation built on Jan. 27, 2024, 9:57 p.m.