In HectorRDB/bioc2021trajectories: Trajectory inference across conditions: differential topology, progression and expression

Overview

The dataset we will be working with concerns a single-cell RNA-sequencing dataset consisting of three cancer models under a KRAS(G12C) inhibition [@Xue2020]. Those types of molecules are currently in early-phase clinical trials and a large group of lung adenocarcinoma seem to be able to at least partially evade those treatments. The authors of the original study therefore look at the impact of KRAS(G12C) inhibitors on three models of tumors and show that cells evolve along three lineages.

While the authors used the three models to validate their results, we can also look for differences between the three types. We therefore have a three-lineages trajectory, with three conditions (i.e. the three tumor models).

We will use this dataset as an example of how to perform the first two steps of the condiments workflow under this setting. + We first check if we can fit a single trajectory, which we call differential topology. + We then look for large-scale changes, indicative of differential progression and differential differentiation.

Load data

libs <- c("here", "dplyr", "tradeSeq", "SingleCellExperiment", "slingshot",
           "condiments", "scater", "RColorBrewer", "pheatmap", "cowplot",
          "tidyr", "ggplot2")
suppressMessages(
  suppressWarnings(sapply(libs, require, character.only = TRUE))
)
rm(libs)
theme_set(theme_classic())

kras <- bioc2021trajectories::import_KRAS()
kras <- kras[1:100, ]

We rely on normalization conducted in the original paper, and will use the reduced dimension coordinates that are available. We provide a function to access both the raw counts, the reduced dimension coordinates and other cell-level info. Since we will not be performing differential expression, we do not need to retain the genes and we only keep a lightweight version of the object.

data("kras", package = "bioc2021trajectories")

kras

EDA

We can visualize all the single cells in a shared reduced dimensional space, according to the tumor model or the clusters from the original publication.

cols <- c(brewer.pal(5, "Blues")[2:5],
          brewer.pal(3, "Greens")[2:3],
          brewer.pal(3, "Reds")[2:3],
          brewer.pal(3, "Oranges")[2], "Grey")
names(cols) <- c(3, 5, 4, 1, 8, 2, 9, 10, 6, 7)
df <- colData(kras)[, -97] %>% as.data.frame() %>%
  sample_frac(1)
ggplot(df, aes(x = tSNE1, y = tSNE2, col = Batch)) +
  geom_point(size = .7) +
  scale_color_brewer(palette = "Accent") +
  labs(col = "Type")

ggplot(df, aes(x = tSNE1, y = tSNE2, fill = Cluster)) +
  geom_point(size = 1, alpha = .65, col = "grey70", shape = 21) +
  scale_fill_manual(values = cols) +
  labs(fill = "Cell Clusters")

Differential Topology

The first question we ask is whether we can fit a common trajectory. If the differences between the models are limited, then a common trajectory is possible. If there are large changes, then it is better to fit one trajectory per tumor model. In the latter case, depending on the scale of those changes, it may still be possible to reconcile the trajectories manually.

To assess this, we first use a visual tool called the imbalance_score.

Imbalance score

kras <- imbalance_score(Object = kras, conditions = "Batch", dimred = "TSNE")
df$scores <- kras[, df$cells]$scores$scaled_scores
ggplot(df, aes(x = tSNE1, y = tSNE2, col = scores)) +
  geom_point(size = .7) +
  scale_color_viridis_c(option = "C") +
  labs(col = "Score")

There are some clear regions of imbalance (especially in cluster 10), although overall the three conditions are well mixed.

To assess whether a common trajectory can be fitted in a more quantitative manner, we will rely on the topologyTest. To do this, we fit a common trajectory (under the null) which will serve as a guide for condition-level trajectories. To estimate the trajectory, we use slingshot [@Street2018a].

Fit slingshot and visualize the trajectory skeleton.

kras <- slingshot(kras, reducedDim = "TSNE", clusterLabels = kras$Cluster,
                 start.clus = 7, extend = "n", reweight = FALSE, reassign = FALSE)

mst <- slingMST(kras, as.df = TRUE)
ggplot(df, aes(x = tSNE1, y = tSNE2)) +
  geom_point(size = 1, alpha = .65, col = "grey70", shape = 21,
             aes(fill = Cluster)) +
  scale_fill_manual(values = cols) +
  labs(fill = "Cell Clusters") +
  geom_point(size = 3, data = mst) +
  geom_path(size = 1.5, data = mst, aes(group = Lineage))

The skeleton of the trajectory is this tree structure, that tracks changes, consistently with the original publication. The topologyTest utilizes this tree structure when fitting curves to the cells of each condition, as well as to random sets of the data, and compares whether the two distributions are similar.

Topology Test

set.seed(23)
topologyTest(kras, conditions = "Batch", rep = 50)

Here, we clearly reject the topologyTest: we should fit a separate trajectory per condition.

Individual curves

Fit

We therefore fit one trajectory per tumor model, or batch, using the slingshot_conditions function.

sdss <- slingshot_conditions(kras, kras$Batch, approx_points = FALSE,
                             extend = "n", reweight = FALSE, reassign = FALSE)

Plot skeletons

We can plot the skeleton of each trajectory. They are clearly very similar and we can visually map the trajectories: the lineages map well onto one another.

msts <- lapply(sdss, slingMST, as.df = TRUE) %>%
  bind_rows(.id = "Batch") %>%
  arrange(Batch)

ggplot(df, aes(x = tSNE1, y = tSNE2, col = Batch)) +
  geom_point(size = .7, alpha = .1) +
  scale_color_brewer(palette = "Accent") +
  geom_point(data = msts, size = 3) +
  geom_path(data = msts, aes(group = interaction(Lineage, Batch)),
            size = 2)

Plot curves

Similarly with the curves, we see that the three lineages of each trajectory map to one another across conditions.

lineages <- lapply(sdss, slingCurves, as.df = TRUE) %>%
  bind_rows(.id = "Batch") %>%
  arrange(Order)

position <- data.frame(
  "tSNE1" = c(40, -30, 45),
  "tSNE2" = c(50, -50, -50),
  "Batch" = "H2122A",
  "text" = paste0("Lineage ", 1:3)
)

ggplot(df, aes(x = tSNE1, y = tSNE2, col = Batch)) +
  geom_point(size = .7, alpha = .2) +
  scale_color_brewer(palette = "Accent") + 
  geom_path(data = lineages, size = 1.5, aes(group = interaction(Lineage, Batch))) + 
  geom_text(data = position, col = "black", aes(label = text), size = 5)

Manual mapping

Although we have fitted a different trajectory per condition, we can therefore proceed to step 2 and look at differential progression and differentiation, by manually mapping the trajectories.

mapping <- matrix(rep(1:3, each = 3), nrow = 3, ncol = 3, byrow = TRUE)
mapping
sds <- merge_sds(sdss[[1]], sdss[[2]], sdss[[3]], 
                 condition_id = names(sdss), mapping = mapping)

Differential Progression

Similarly to the TGFB dataset, we can look at large-scale differences between conditions. Specifically, we want to see if the pseudotime distributions are similar across conditions.

Plot

df <- full_join(
  df %>% 
    select(cells, tSNE1, tSNE2, Cluster, Batch),
  slingPseudotime(sds) %>%
    as.data.frame() %>%
    mutate(cells = rownames(.))
) %>%
  pivot_longer(starts_with("Lineage"), names_to = "Curve", values_to = "pst")

ggplot(df, aes(x = pst)) +
  geom_density(alpha = .4, aes(fill = Batch), col = "transparent") +
  geom_density(aes(col = Batch), fill = "transparent", size = 1.5) +
  guides(col = FALSE) +
  scale_fill_brewer(palette = "Accent") +
  scale_color_brewer(palette = "Accent") +
  labs(x = "Pseudotime", fill = "Type") +
  facet_wrap(~ Curve, scales = "free_x")

Visually, we can see three phenomenons:

In lineage 1, there is a cluster of cells from H2122A around pseudotime 100, and more cells at the end of the lineage for condition H358A.
In lineage 2, there are many more cells from H2122A for earlier pseudotime values and far fewer for later time points.
In lineage 3, there are many more cells from SW1573A for earlier pseudotime values and far fewer for later time points.

Test

We can also test this in a more principled manner using the progressionTest. Since we have three conditions, we perform a classifier test (see [@Lopez-Paz2016]). By setting lineages=TRUE, we perform the test for individual lineages and for the full trajectory.

The test statistic reflects the accuracy of the classifier on a balanced test data, so under the null, the expected value is $1/3$.

progressionTest(sds, conditions = kras$Batch, lineages = TRUE)

Differential Differentiation

Each cell in a trajectory is characterized with two components: its pseudotime values in each lineage, and its weights with regard to each lineage. Differential progression looks at the distribution of pseudotime between conditions while differential differentiation looks at the distribution of weights values between conditions.

It is slightly harder to interpret but we can start with a few plots.

Plot

df <- bioc2021trajectories::sling_reassign(sds) %>% 
  as.data.frame() %>%
  mutate(cells = rownames(.)) %>%
  dplyr::rename("Lineage1" = V1, "Lineage2" = V2, "Lineage3" = V3) %>%
  pivot_longer(starts_with("Lineage"), names_to = "Curve", values_to = "weights") %>%
  full_join(df) %>%
  group_by(cells) %>%
  select(-pst) %>% 
  mutate(weights = weights / sum(weights)) %>% 
  ungroup()

We can first look at the total weight of each lineage in each condition. Two clear conclusions can be drawn here: + Lineage 1, the longest lineage, indeed captures more cells in all lineages + There are fewer H2122A cells in lineage 2, compared to the other conditions.

ggplot(df %>% group_by(Batch, Curve) %>% 
         summarise(weights = mean(weights), .groups = NULL),
       aes(x = Curve, fill = Batch, y = weights)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Accent") +
  theme(legend.position = c(.7, .7)) +
  labs(x = "", y = "Mean weight")

ggplot(df %>% pivot_wider(names_from = "Curve", values_from = "weights"),
       aes(x = Lineage1, y = Lineage3)) +
  geom_hex() +
  scale_fill_viridis_c(direction = -1) +
  facet_wrap(~Batch, scales = "free") +
  geom_abline(slope = -1, intercept = 1, linetype = "dotted") +
  geom_abline(slope = -1, intercept = 2/3, linetype = "dotted") +
  geom_abline(slope = -1, intercept = 1/3, linetype = "dotted") +
  annotate("text", x = .53, y = .53, label = "w3 = 0", angle = -52) +
  annotate("text", x = .62, y = .1, label = "w3 = 1/3", angle = -52) +
  annotate("text", x = .14, y = .14, label = "w3 = 2/3", angle = -52) +
  theme(legend.position = "bottom") +
  labs(x = "Weights for Lineage 1 (w1)", y = "Weights for Lineage 2 (w2)",
       fill = "counts per hexagon")

Test

differentiationTest(sds, conditions = kras$Batch, pairwise = TRUE)

Session info

sessionInfo()

References

HectorRDB/bioc2021trajectories documentation built on Dec. 17, 2021, 10:32 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

HectorRDB/bioc2021trajectories
Trajectory inference across conditions: differential topology, progression and expression

In HectorRDB/bioc2021trajectories: Trajectory inference across conditions: differential topology, progression and expression

Overview

Load data

EDA

Differential Topology

Imbalance score

Fit slingshot and visualize the trajectory skeleton.

Topology Test

Individual curves

Fit

Plot skeletons

Plot curves

Manual mapping

Differential Progression

Plot

Test

Differential Differentiation

Plot

Test

Session info

References

R Package Documentation

Browse R Packages

We want your feedback!

HectorRDB/bioc2021trajectories Trajectory inference across conditions: differential topology, progression and expression

In HectorRDB/bioc2021trajectories: Trajectory inference across conditions: differential topology, progression and expression

Overview

Load data

EDA

Differential Topology

Imbalance score

Fit slingshot and visualize the trajectory skeleton.

Topology Test

Individual curves

Fit

Plot skeletons

Plot curves

Manual mapping

Differential Progression

Plot

Test

Differential Differentiation

Plot

Test

Session info

References

R Package Documentation

Browse R Packages

We want your feedback!

HectorRDB/bioc2021trajectories
Trajectory inference across conditions: differential topology, progression and expression