Started on r format(Sys.time(), "%Y-%m-%d %H:%M:%S")
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, knitr.table.format = "html") #to test # library(HDF5Array) # library(dyno) # library(ezRun, lib.loc = "~/myRpackages") # require(tidyverse) # setwd("/scratch/SCTrajectoryInference_2022-02-18--10-07-08_Pax7Cre_temp7637/Pax7Cre_SCTrajectoryInference") # model = readRDS("model.rds") # dyno_dataset = readRDS("dyno_dataset.rds") # #Seurat objects only # object = readRDS("/srv/gstore/projects/p23755/o26637_SCOneSampleSeurat_2022-02-10--12-42-40/Pax7Cre_SCReport/scData.rds") # param = readRDS("/srv/gstore/projects/p23755/o26637_SCOneSampleSeurat_2022-02-10--12-42-40/Pax7Cre_SCReport/param.rds") library(tidyverse) expression = dyno_dataset$expression clusters=object$ident
Cells differentiation was modeled using trajectory inference (TI) methods (also known as pseudotemporal ordering methods), which use single-cell profiles from a population in which the cells are at different unknown points in the dynamic process. These methods computationally order the cells along a trajectory topology, which can be linear, bifurcating, or a more complex tree or graph structure. Because TI methods offer an unbiased and transcriptome-wide understanding of a dynamic process,they allow the objective identification of new subsets of cells, delineation of a differentiation tree, and inference of regulatory interaction responsible for one or more bifurcations.
A key characteristic of TI methods is the selection of prior information that a method requires or can optionally exploit. Prior information can be supplied as a starting cell from which the trajectory will originate, a set of important marker genes, or even a grouping of cells into cell states. Providing prior information to a TI method can be both a blessing and a curse. In one way, prior information can help the method to find the correct trajectory among many, equally likely, alternatives. On the other hand,incorrect or noisy prior information can bias the trajectory towards current knowledge.
As the performance of a method most heavily depends on the topology of the trajectory,the choice of TI method will be primarily influenced by the user’s existing knowledge about the expected topology in the data. We therefore show a set of practical guidelines, which combines the method’s performance, user friendliness and the number of assumptions a user is willing to make about the topology of the trajectory. The guidelines are based on an evaluation study ^[ A comparison of single-cell trajectory inference methods Helena Todorov, Yvan Saeys.Nat Biotech (Apr. 2019) doi:10.1038/s41587-019-0071-9] that compared 45 methods on four aspects:
This interactive shiny app can help you to explore the results and select an optimal set of methods for your analysis [https://zouter.shinyapps.io/server/]
When choosing a method, it is important to take two further points into account. First, it is important that a trajectory and the downstream results and/or hypotheses originating from it are confirmed by multiple TI methods. This to make sure the model is not biased towards the particular model underlying a TI method. Second,even if the expected topology is known, it can be beneficial to also try out methods which make the less assumptions about the trajectory topology. When the expected topology is confirmed using such a method, it provides extra evidence to the user’s knowledge of the data. When a more complex topology is produced, this could indicate the presence of a more complex trajectory in the data than was expected by the user.
root_genes = strsplit(param$root_expression, ",")[[1]] root_genes = root_genes[root_genes %in% colnames(expression)] for(i in 1:nrow(model)) { tryCatch({ model[i,]$model[[1]] = model[i,]$model[[1]] %>% add_root_using_expression(root_genes, expression) }, error = function(e){message("error adding root genes")}) }
cat("The performance of a method depends on many factors, mainly the dimensions of the data and the kind of trajectory present in the data. Based on this, we selected the 2 most optimal set of methods for the analysis.")
The most common way to visualize a trajectory is to plot it on a dimensionality reduction of the cells. The following plots color the cells according to
for(i in 1:nrow(model)) { cat("\n") cat(paste0("#### ", model[i,]$method_name)) cat("\n") cat("\n Dimension reduction method: MDS \n") current_model = model[i,]$model[[1]] plots <- patchwork::wrap_plots( plot_dimred(current_model, label_milestones = TRUE) + ggtitle("Cell ordering"), plot_dimred(current_model, grouping = clusters, color_density = "grouping", label_milestones = TRUE) + ggtitle("Cell grouping"), plot_dimred(current_model, "pseudotime", pseudotime = calculate_pseudotime(current_model), label_milestones = TRUE) + ggtitle("Pseudotime") ) print(plots) cat("\n") cat("Dimension reduction method: UMAP \n") current_model <- current_model %>% add_dimred(dyndimred::dimred_umap, expression_source = expression) plots <- patchwork::wrap_plots( plot_dimred(current_model, label_milestones = TRUE) + ggtitle("Cell ordering"), plot_dimred(current_model, grouping = clusters, color_density = "grouping", label_milestones = TRUE) + ggtitle("Cell grouping"), plot_dimred(current_model, "pseudotime", pseudotime = calculate_pseudotime(current_model), label_milestones = TRUE) + ggtitle("Pseudotime") ) print(plots) cat("\n") }
genes = strsplit(param$show_genes, ",")[[1]] genes = genes[genes %in% colnames(expression)] cat("### Expression of selected genes") cat("\n") for(i in 1:nrow(model)){ cat("\n") cat(paste0("#### ", model[i,]$method_name)) cat("\n") cat("\n Dimension reduction method: MDS \n") current_model = model[i,]$model[[1]] tryCatch({ for(gene in genes) { plot = plot_dimred(current_model, expression_source = expression, feature_oi = gene, label_milestones = FALSE) + theme(legend.position = "none") + ggtitle(gene) print(plot) } cat("\n")}, error=function(e){ message("The genes were not found in the expression matrix.")}) cat("\n Dimension reduction method: UMAP \n") current_model <- current_model %>% add_dimred(dyndimred::dimred_umap, expression_source = expression) tryCatch({ for(gene in genes) { plot = plot_dimred(current_model, expression_source = expression, feature_oi = gene, label_milestones = FALSE) + theme(legend.position = "none") + ggtitle(gene) print(plot) } cat("\n")}, error=function(e){ message("The genes were not found in the expression matrix.")}) }
Different kinds of differential expression on the trajectories were calculated:
To find these different kinds of differential expression in a trajectory, we used an algorithm that first defines a particular variable that needs to be predicted (for example, whether a cell is present in a branch or not), and tries to predict this variable based on the expression in different cells. It then ranks each feature based on their predictive capability, and based on this ranking you can select differentially expressed genes.
Depending on what variable is predicted, you get a different ranking. This simply depends on what kind of features you are interested in.
plot_list=list() for(i in 1:nrow(model)) { current_model = model[i,]$model[[1]] overall_feature_importances <- dynfeature::calculate_overall_feature_importance(current_model, expression_source = expression) features <- overall_feature_importances %>% top_n(40, importance) %>% pull(feature_id) plot = plot_heatmap( current_model, expression_source = expression, features_oi = features) + ggtitle(paste0(model[i,]$method_name, ": Overall important features")) plot_list[[i]] = plot } patchwork::wrap_plots(plot_list) write_tsv(overall_feature_importances, "overall_feature_importances.tsv")
method_name = strsplit(param$diff_Branch, ",")[[1]][1] branch = strsplit(param$diff_Branch, ",")[[1]][2] current_model = model[which(model$method_name == method_name),]$model[[1]] branch_feature_importance <- calculate_branch_feature_importance(current_model, expression_source = expression) features <- branch_feature_importance %>% filter(to == branch) %>% top_n(20, importance) %>% pull(feature_id) tryCatch({ plot= plot_heatmap( current_model, expression_source = expression, features_oi = features) + ggtitle(paste0("Important features to branch ", branch)) plot write_tsv(branch_feature_importance, "branch_feature_importance.tsv")} , error=function(e){ message("No genes were obtained for this branch")})
method_name = strsplit(param$diff_Branch_Point, ",")[[1]][1] branch_point = strsplit(param$diff_Branch_Point, ",")[[1]][2] current_model = model[which(model$method_name == method_name),]$model[[1]] branch_point_feature_importance <- calculate_branching_point_feature_importance(current_model, milestones_oi = branch_point, expression_source = expression) if(nrow(branch_point_feature_importance) > 0) { features <- branch_point_feature_importance %>% top_n(20, importance) %>% pull(feature_id) plot=plot_heatmap( current_model, expression_source = expression, features_oi = features) + ggtitle(paste0("Important features at branching point ", branch_point)) plot write_tsv(branch_point_feature_importance, "branch_point_feature_importance.stv") } else { cat(paste0("The heatmap can't be plot because there were not significant genes for the branching point ", branch_point)) }
branch_point_feature_importance
param[c("start_id", "end_id", "start_n","end_n", "TI_method", "root_expression", "show_genes", "diff_Branch", "diff_Branch_Point")]
ezSessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.