Started on r format(Sys.time(), "%Y-%m-%d %H:%M:%S")

knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, knitr.table.format = "html")

#to test
# library(HDF5Array)
# library(dyno)
# library(ezRun, lib.loc = "~/myRpackages")
# require(tidyverse)
# setwd("/scratch/SCTrajectoryInference_2022-02-18--10-07-08_Pax7Cre_temp7637/Pax7Cre_SCTrajectoryInference")
# model = readRDS("model.rds")
# dyno_dataset = readRDS("dyno_dataset.rds")
# #Seurat objects only
# object = readRDS("/srv/gstore/projects/p23755/o26637_SCOneSampleSeurat_2022-02-10--12-42-40/Pax7Cre_SCReport/scData.rds")
# param = readRDS("/srv/gstore/projects/p23755/o26637_SCOneSampleSeurat_2022-02-10--12-42-40/Pax7Cre_SCReport/param.rds")

library(tidyverse)
expression = dyno_dataset$expression
clusters=object$ident

Trajectory analysis {.tabset}


Cells differentiation was modeled using trajectory inference (TI) methods (also known as pseudotemporal ordering methods), which use single-cell profiles from a population in which the cells are at different unknown points in the dynamic process. These methods computationally order the cells along a trajectory topology, which can be linear, bifurcating, or a more complex tree or graph structure. Because TI methods offer an unbiased and transcriptome-wide understanding of a dynamic process,they allow the objective identification of new subsets of cells, delineation of a differentiation tree, and inference of regulatory interaction responsible for one or more bifurcations.

A key characteristic of TI methods is the selection of prior information that a method requires or can optionally exploit. Prior information can be supplied as a starting cell from which the trajectory will originate, a set of important marker genes, or even a grouping of cells into cell states. Providing prior information to a TI method can be both a blessing and a curse. In one way, prior information can help the method to find the correct trajectory among many, equally likely, alternatives. On the other hand,incorrect or noisy prior information can bias the trajectory towards current knowledge.

Method selection


As the performance of a method most heavily depends on the topology of the trajectory,the choice of TI method will be primarily influenced by the user’s existing knowledge about the expected topology in the data. We therefore show a set of practical guidelines, which combines the method’s performance, user friendliness and the number of assumptions a user is willing to make about the topology of the trajectory. The guidelines are based on an evaluation study ^[ A comparison of single-cell trajectory inference methods Helena Todorov, Yvan Saeys.Nat Biotech (Apr. 2019) doi:10.1038/s41587-019-0071-9] that compared 45 methods on four aspects:

This interactive shiny app can help you to explore the results and select an optimal set of methods for your analysis [https://zouter.shinyapps.io/server/]

When choosing a method, it is important to take two further points into account. First, it is important that a trajectory and the downstream results and/or hypotheses originating from it are confirmed by multiple TI methods. This to make sure the model is not biased towards the particular model underlying a TI method. Second,even if the expected topology is known, it can be beneficial to also try out methods which make the less assumptions about the trajectory topology. When the expected topology is confirmed using such a method, it provides extra evidence to the user’s knowledge of the data. When a more complex topology is produced, this could indicate the presence of a more complex trajectory in the data than was expected by the user.



Practical guidelines

guidelines

root_genes = strsplit(param$root_expression, ",")[[1]]
root_genes = root_genes[root_genes %in% colnames(expression)]
for(i in 1:nrow(model)) {
  tryCatch({
  model[i,]$model[[1]] = model[i,]$model[[1]] %>% add_root_using_expression(root_genes, expression)
 },  error = function(e){message("error adding root genes")})
}

Dimensionality reduction

cat("The performance of a method depends on many factors, mainly the dimensions of the data and the kind of trajectory present in the data. Based on this, we selected the 2 most optimal set of methods for the analysis.")

The most common way to visualize a trajectory is to plot it on a dimensionality reduction of the cells. The following plots color the cells according to

for(i in 1:nrow(model)) {
  cat("\n")
  cat(paste0("#### ", model[i,]$method_name))
  cat("\n")
  cat("\n Dimension reduction method: MDS \n")
  current_model = model[i,]$model[[1]]
  plots <- patchwork::wrap_plots(
  plot_dimred(current_model, label_milestones = TRUE) + ggtitle("Cell ordering"),
  plot_dimred(current_model, grouping = clusters, color_density = "grouping", label_milestones = TRUE) + ggtitle("Cell grouping"),
  plot_dimred(current_model, "pseudotime", pseudotime = calculate_pseudotime(current_model), label_milestones = TRUE) + ggtitle("Pseudotime")
)
  print(plots)
  cat("\n")
  cat("Dimension reduction method: UMAP \n")
  current_model <- current_model  %>% add_dimred(dyndimred::dimred_umap, expression_source = expression)
  plots <- patchwork::wrap_plots(
  plot_dimred(current_model, label_milestones = TRUE) + ggtitle("Cell ordering"),
  plot_dimred(current_model, grouping = clusters, color_density = "grouping", label_milestones = TRUE) + ggtitle("Cell grouping"),
  plot_dimred(current_model, "pseudotime", pseudotime = calculate_pseudotime(current_model), label_milestones = TRUE) + ggtitle("Pseudotime")
)
  print(plots)
  cat("\n")
}
genes = strsplit(param$show_genes, ",")[[1]]
genes = genes[genes %in% colnames(expression)]
cat("### Expression of selected genes")
cat("\n")
for(i in 1:nrow(model)){
  cat("\n")
  cat(paste0("#### ", model[i,]$method_name))
  cat("\n")
  cat("\n Dimension reduction method: MDS \n")
  current_model = model[i,]$model[[1]]
  tryCatch({
 for(gene in genes) {
    plot = plot_dimred(current_model, expression_source = expression, feature_oi = gene, label_milestones = FALSE) +
       theme(legend.position = "none") +
       ggtitle(gene)
    print(plot)
  }
  cat("\n")}, 
  error=function(e){
     message("The genes were not found in the expression matrix.")})
  cat("\n Dimension reduction method: UMAP \n")
  current_model <- current_model  %>% add_dimred(dyndimred::dimred_umap, expression_source = expression)

  tryCatch({
 for(gene in genes) {
    plot = plot_dimred(current_model, expression_source = expression, feature_oi = gene, label_milestones = FALSE) +
       theme(legend.position = "none") +
       ggtitle(gene)
    print(plot)
  }
  cat("\n")}, 
  error=function(e){
     message("The genes were not found in the expression matrix.")})
}

Differential expression

Different kinds of differential expression on the trajectories were calculated:

To find these different kinds of differential expression in a trajectory, we used an algorithm that first defines a particular variable that needs to be predicted (for example, whether a cell is present in a branch or not), and tries to predict this variable based on the expression in different cells. It then ranks each feature based on their predictive capability, and based on this ranking you can select differentially expressed genes. Depending on what variable is predicted, you get a different ranking. This simply depends on what kind of features you are interested in.

Genes that change anywhere in the trajectory


plot_list=list()
for(i in 1:nrow(model)) {
  current_model = model[i,]$model[[1]]
  overall_feature_importances <- dynfeature::calculate_overall_feature_importance(current_model, expression_source = expression)
  features <- overall_feature_importances %>%
  top_n(40, importance) %>%
  pull(feature_id)
  plot = plot_heatmap(
    current_model,
    expression_source = expression,
    features_oi = features) + ggtitle(paste0(model[i,]$method_name, ": Overall important features"))
   plot_list[[i]] = plot
}
 patchwork::wrap_plots(plot_list)

write_tsv(overall_feature_importances, "overall_feature_importances.tsv")



Genes that change in a specific branch (only calculated if the Branch name was supplied)


method_name = strsplit(param$diff_Branch, ",")[[1]][1]
branch = strsplit(param$diff_Branch, ",")[[1]][2]
current_model = model[which(model$method_name == method_name),]$model[[1]]

branch_feature_importance <- calculate_branch_feature_importance(current_model, expression_source = expression)
features <- branch_feature_importance %>% filter(to == branch) %>% top_n(20, importance) %>% pull(feature_id)

tryCatch({
plot= plot_heatmap(
    current_model,
    expression_source = expression,
    features_oi = features) + ggtitle(paste0("Important features to branch ", branch))
plot

write_tsv(branch_feature_importance, "branch_feature_importance.tsv")}
, error=function(e){
  message("No genes were obtained for this branch")})



Genes that change at a branching point (only calculated if the Branch name was supplied)


method_name = strsplit(param$diff_Branch_Point, ",")[[1]][1]
branch_point = strsplit(param$diff_Branch_Point, ",")[[1]][2]
current_model = model[which(model$method_name == method_name),]$model[[1]]

branch_point_feature_importance <- calculate_branching_point_feature_importance(current_model, milestones_oi = branch_point, expression_source = expression)

if(nrow(branch_point_feature_importance) > 0) {
  features <- branch_point_feature_importance %>% top_n(20, importance) %>% pull(feature_id)
  plot=plot_heatmap(
    current_model,
    expression_source = expression,
    features_oi = features) + ggtitle(paste0("Important features at branching point ", branch_point))
  plot
  write_tsv(branch_point_feature_importance, "branch_point_feature_importance.stv")
} else {
  cat(paste0("The heatmap can't be plot because there were not significant genes for the branching point ", branch_point))
}

Data availability

Genes that change anywhere in the trajectory

overall_feature_importances

Genes that change in a specific branch

branch_feature_importance

Genes that change at a branching point

branch_point_feature_importance

Parameters

param[c("start_id", "end_id", "start_n","end_n", "TI_method", "root_expression", "show_genes", "diff_Branch", "diff_Branch_Point")]

SessionInfo

ezSessionInfo()


uzh/ezRun documentation built on May 4, 2024, 3:23 p.m.