Overview

The dataset we will be working with concerns a single-cell RNA-sequencing dataset consisting of three cancer models under a KRAS(G12C) inhibition [@Xue2020]. Those types of molecules are currently in early-phase clinical trials and a large group of lung adenocarcinoma seem to be able to at least partially evade those treatments. The authors of the original study therefore look at the impact of KRAS(G12C) inhibitors on three models of tumors and show that cells evolve along three lineages.

While the authors used the three models to validate their results, we can also look for differences between the three types. We therefore have a three-lineages trajectory, with three conditions (i.e. the three tumor models).

We will use this dataset as an example of how to perform the first two steps of the condiments workflow under this setting. + We first check if we can fit a single trajectory, which we call differential topology. + We then look for large-scale changes, indicative of differential progression and differential differentiation.

Load data

libs <- c("here", "dplyr", "tradeSeq", "SingleCellExperiment", "slingshot",
           "condiments", "scater", "RColorBrewer", "pheatmap", "cowplot",
          "tidyr", "ggplot2")
suppressMessages(
  suppressWarnings(sapply(libs, require, character.only = TRUE))
)
rm(libs)
theme_set(theme_classic())
kras <- bioc2021trajectories::import_KRAS()
kras <- kras[1:100, ]

We rely on normalization conducted in the original paper, and will use the reduced dimension coordinates that are available. We provide a function to access both the raw counts, the reduced dimension coordinates and other cell-level info. Since we will not be performing differential expression, we do not need to retain the genes and we only keep a lightweight version of the object.

data("kras", package = "bioc2021trajectories")
kras

EDA

We can visualize all the single cells in a shared reduced dimensional space, according to the tumor model or the clusters from the original publication.

cols <- c(brewer.pal(5, "Blues")[2:5],
          brewer.pal(3, "Greens")[2:3],
          brewer.pal(3, "Reds")[2:3],
          brewer.pal(3, "Oranges")[2], "Grey")
names(cols) <- c(3, 5, 4, 1, 8, 2, 9, 10, 6, 7)
df <- colData(kras)[, -97] %>% as.data.frame() %>%
  sample_frac(1)
ggplot(df, aes(x = tSNE1, y = tSNE2, col = Batch)) +
  geom_point(size = .7) +
  scale_color_brewer(palette = "Accent") +
  labs(col = "Type")
ggplot(df, aes(x = tSNE1, y = tSNE2, fill = Cluster)) +
  geom_point(size = 1, alpha = .65, col = "grey70", shape = 21) +
  scale_fill_manual(values = cols) +
  labs(fill = "Cell Clusters")

Differential Topology

The first question we ask is whether we can fit a common trajectory. If the differences between the models are limited, then a common trajectory is possible. If there are large changes, then it is better to fit one trajectory per tumor model. In the latter case, depending on the scale of those changes, it may still be possible to reconcile the trajectories manually.

To assess this, we first use a visual tool called the imbalance_score.

Imbalance score

kras <- imbalance_score(Object = kras, conditions = "Batch", dimred = "TSNE")
df$scores <- kras[, df$cells]$scores$scaled_scores
ggplot(df, aes(x = tSNE1, y = tSNE2, col = scores)) +
  geom_point(size = .7) +
  scale_color_viridis_c(option = "C") +
  labs(col = "Score")

There are some clear regions of imbalance (especially in cluster 10), although overall the three conditions are well mixed.

To assess whether a common trajectory can be fitted in a more quantitative manner, we will rely on the topologyTest. To do this, we fit a common trajectory (under the null) which will serve as a guide for condition-level trajectories. To estimate the trajectory, we use slingshot [@Street2018a].

Fit slingshot and visualize the trajectory skeleton.

kras <- slingshot(kras, reducedDim = "TSNE", clusterLabels = kras$Cluster,
                 start.clus = 7, extend = "n", reweight = FALSE, reassign = FALSE)
mst <- slingMST(kras, as.df = TRUE)
ggplot(df, aes(x = tSNE1, y = tSNE2)) +
  geom_point(size = 1, alpha = .65, col = "grey70", shape = 21,
             aes(fill = Cluster)) +
  scale_fill_manual(values = cols) +
  labs(fill = "Cell Clusters") +
  geom_point(size = 3, data = mst) +
  geom_path(size = 1.5, data = mst, aes(group = Lineage))

The skeleton of the trajectory is this tree structure, that tracks changes, consistently with the original publication. The topologyTest utilizes this tree structure when fitting curves to the cells of each condition, as well as to random sets of the data, and compares whether the two distributions are similar.

Topology Test

set.seed(23)
topologyTest(kras, conditions = "Batch", rep = 50)

Here, we clearly reject the topologyTest: we should fit a separate trajectory per condition.

Individual curves

Fit

We therefore fit one trajectory per tumor model, or batch, using the slingshot_conditions function.

sdss <- slingshot_conditions(kras, kras$Batch, approx_points = FALSE,
                             extend = "n", reweight = FALSE, reassign = FALSE)

Plot skeletons

We can plot the skeleton of each trajectory. They are clearly very similar and we can visually map the trajectories: the lineages map well onto one another.

msts <- lapply(sdss, slingMST, as.df = TRUE) %>%
  bind_rows(.id = "Batch") %>%
  arrange(Batch)

ggplot(df, aes(x = tSNE1, y = tSNE2, col = Batch)) +
  geom_point(size = .7, alpha = .1) +
  scale_color_brewer(palette = "Accent") +
  geom_point(data = msts, size = 3) +
  geom_path(data = msts, aes(group = interaction(Lineage, Batch)),
            size = 2)

Plot curves

Similarly with the curves, we see that the three lineages of each trajectory map to one another across conditions.

lineages <- lapply(sdss, slingCurves, as.df = TRUE) %>%
  bind_rows(.id = "Batch") %>%
  arrange(Order)
position <- data.frame(
  "tSNE1" = c(40, -30, 45),
  "tSNE2" = c(50, -50, -50),
  "Batch" = "H2122A",
  "text" = paste0("Lineage ", 1:3)
)

ggplot(df, aes(x = tSNE1, y = tSNE2, col = Batch)) +
  geom_point(size = .7, alpha = .2) +
  scale_color_brewer(palette = "Accent") + 
  geom_path(data = lineages, size = 1.5, aes(group = interaction(Lineage, Batch))) + 
  geom_text(data = position, col = "black", aes(label = text), size = 5)

Manual mapping

Although we have fitted a different trajectory per condition, we can therefore proceed to step 2 and look at differential progression and differentiation, by manually mapping the trajectories.

mapping <- matrix(rep(1:3, each = 3), nrow = 3, ncol = 3, byrow = TRUE)
mapping
sds <- merge_sds(sdss[[1]], sdss[[2]], sdss[[3]], 
                 condition_id = names(sdss), mapping = mapping)

Differential Progression

Similarly to the TGFB dataset, we can look at large-scale differences between conditions. Specifically, we want to see if the pseudotime distributions are similar across conditions.

Plot

df <- full_join(
  df %>% 
    select(cells, tSNE1, tSNE2, Cluster, Batch),
  slingPseudotime(sds) %>%
    as.data.frame() %>%
    mutate(cells = rownames(.))
) %>%
  pivot_longer(starts_with("Lineage"), names_to = "Curve", values_to = "pst")

ggplot(df, aes(x = pst)) +
  geom_density(alpha = .4, aes(fill = Batch), col = "transparent") +
  geom_density(aes(col = Batch), fill = "transparent", size = 1.5) +
  guides(col = FALSE) +
  scale_fill_brewer(palette = "Accent") +
  scale_color_brewer(palette = "Accent") +
  labs(x = "Pseudotime", fill = "Type") +
  facet_wrap(~ Curve, scales = "free_x")

Visually, we can see three phenomenons:

Test

We can also test this in a more principled manner using the progressionTest. Since we have three conditions, we perform a classifier test (see [@Lopez-Paz2016]). By setting lineages=TRUE, we perform the test for individual lineages and for the full trajectory.

The test statistic reflects the accuracy of the classifier on a balanced test data, so under the null, the expected value is $1/3$.

progressionTest(sds, conditions = kras$Batch, lineages = TRUE)

Differential Differentiation

Each cell in a trajectory is characterized with two components: its pseudotime values in each lineage, and its weights with regard to each lineage. Differential progression looks at the distribution of pseudotime between conditions while differential differentiation looks at the distribution of weights values between conditions.

It is slightly harder to interpret but we can start with a few plots.

Plot

df <- bioc2021trajectories::sling_reassign(sds) %>% 
  as.data.frame() %>%
  mutate(cells = rownames(.)) %>%
  dplyr::rename("Lineage1" = V1, "Lineage2" = V2, "Lineage3" = V3) %>%
  pivot_longer(starts_with("Lineage"), names_to = "Curve", values_to = "weights") %>%
  full_join(df) %>%
  group_by(cells) %>%
  select(-pst) %>% 
  mutate(weights = weights / sum(weights)) %>% 
  ungroup()

We can first look at the total weight of each lineage in each condition. Two clear conclusions can be drawn here: + Lineage 1, the longest lineage, indeed captures more cells in all lineages + There are fewer H2122A cells in lineage 2, compared to the other conditions.

ggplot(df %>% group_by(Batch, Curve) %>% 
         summarise(weights = mean(weights), .groups = NULL),
       aes(x = Curve, fill = Batch, y = weights)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Accent") +
  theme(legend.position = c(.7, .7)) +
  labs(x = "", y = "Mean weight")
ggplot(df %>% pivot_wider(names_from = "Curve", values_from = "weights"),
       aes(x = Lineage1, y = Lineage3)) +
  geom_hex() +
  scale_fill_viridis_c(direction = -1) +
  facet_wrap(~Batch, scales = "free") +
  geom_abline(slope = -1, intercept = 1, linetype = "dotted") +
  geom_abline(slope = -1, intercept = 2/3, linetype = "dotted") +
  geom_abline(slope = -1, intercept = 1/3, linetype = "dotted") +
  annotate("text", x = .53, y = .53, label = "w3 = 0", angle = -52) +
  annotate("text", x = .62, y = .1, label = "w3 = 1/3", angle = -52) +
  annotate("text", x = .14, y = .14, label = "w3 = 2/3", angle = -52) +
  theme(legend.position = "bottom") +
  labs(x = "Weights for Lineage 1 (w1)", y = "Weights for Lineage 2 (w2)",
       fill = "counts per hexagon")

Test

differentiationTest(sds, conditions = kras$Batch, pairwise = TRUE)

Session info

sessionInfo()

References



HectorRDB/bioc2021trajectories documentation built on Dec. 17, 2021, 10:32 p.m.