10X Visium captures spatially resolved transcriptomic profiles in spots containing multiple cells. In this vignette, we will use the gene expression information from Visium data to infer pathway and transcription factor activity and separately investigate spatial relationships between them and the cell-type composition. In addition, we will examine spatial relationships of ligands and receptors.
Load the necessary R packages:
# MISTy library(mistyR) # For using Python library(reticulate) # Seurat library(Seurat) # Data manipulation library(tidyverse) # Pathways library(decoupleR) #Cleaning names library(janitor)
We will use some functions in python since the computation time is significantly shorter than in R. Python chunks start with a #In Python. Install and load the necessary package for Python:
py_install(c("decoupler","omnipath"), pip =TRUE)
# In Python: import decoupler as dc
For this showcase, we use a 10X Visium spatial slide from Kuppe et al., 2022, where they created a spatial multi-omic map of human myocardial infarction. The tissue example data comes from the human heart of patient 14, which is in a chronic state following myocardial infarction. The Seurat object contains, among other things, the normalized and raw gene counts. First, we have to download and extract the file:
download.file("https://zenodo.org/records/6580069/files/10X_Visium_ACH005.tar.gz?download=1", destfile = "10X_Visium_ACH005.tar.gz", method = "curl") untar("10X_Visium_ACH005.tar.gz")
The next step is to load the data, extract the normalized gene counts of genes expressed in at least 5% of the spots, and pixel coordinates. It is recommended to use pixel coordinates instead of row and column numbers since the rows are shifted and therefore do not express the real distance between the spots.
seurat <- readRDS("ACH005/ACH005.rds") expression_raw <- as.matrix(GetAssayData(seurat, layer = "counts", assay = "SCT")) geometry <- GetTissueCoordinates(seurat, scale = NULL) # Only take genes that expressed in at least 5% of the spots expression <- expression_raw[rownames(expression_raw[(rowSums(expression_raw > 0) / ncol(expression_raw)) >= 0.05,]),]
Let's take a look at the slide itself and some of the cell-type niches defined by Kuppe et al.:
SpatialPlot(seurat, alpha = 0) SpatialPlot(seurat, group.by = "celltype_niche")
The Seurat Object of the tissue slide also contains the estimated cell type proportions from cell2location. We extract them into a separate object we will later use with MISTy and visualize some of the cell types:
# Rename to more informative names rownames(seurat@assays$c2l_props@data) <- rownames(seurat@assays$c2l_props@data) %>% recode('Adipo' = 'Adipocytes', 'CM' = 'Cardiomyocytes', 'Endo' = 'Endothelial', 'Fib' = 'Fibroblasts', 'PC' = 'Pericytes', 'prolif' = 'Proliferating', 'vSMCs' = 'Vascular-SMCs') # Extract into a separate object composition <- as_tibble(t(seurat[["c2l_props"]]$data)) # Visualize cell types DefaultAssay(seurat) <- "c2l_props" SpatialFeaturePlot(seurat, keep.scale = NULL, features = c('Vascular-SMCs', "Cardiomyocytes", "Endothelial", "Fibroblasts"), ncol = 2)
Let's investigate the relationship between the cell-type compositions and pathway activities in our example slide. But before we create the views, we need to estimate the pathway activities. For this we will take pathway gene sets from PROGENy
and estimate the activity with decoupleR
:
# Obtain genesets model <- get_progeny(organism = "human", top = 500) # Use multivariate linear model to estimate activity est_path_act <- run_mlm(expression, model,.mor = NULL)
We add the result to the Seurat Object and plot the estimated activities to see the distribution over the slide:
# Delete progeny assay from Kuppe et al. seurat[['progeny']] <- NULL # Put estimated pathway activities object into the correct format est_path_act_wide <- est_path_act %>% pivot_wider(id_cols = condition, names_from = source, values_from = score) %>% column_to_rownames("condition") # Clean names colnames(est_path_act_wide) <- est_path_act_wide %>% clean_names(parsing_option = 0) %>% colnames(.) # Add seurat[['progeny']] <- CreateAssayObject(counts = t(est_path_act_wide)) SpatialFeaturePlot(seurat, features = c("jak.stat", "hypoxia"), image.alpha = 0)
For the MISTy view, we will use cell type compositions per spot as the intraview and add the estimated PROGENy
pathway activities as juxta and paraviews. The size of the neighborhood and the kernel, as well as the kernel family, should be chosen depending on the experiment. Here both distances were chosen to enclose only a small number of neighboring spots.
# Clean names colnames(composition) <- composition %>% clean_names(parsing_option = 0) %>% colnames(.) # create intra from cell-type composition comp_views <- create_initial_view(composition) # juxta & para from pathway activity path_act_views <- create_initial_view(est_path_act_wide) %>% add_juxtaview(geometry, neighbor.thr = 130) %>% add_paraview(geometry, l= 200, family = "gaussian") # Combine views com_path_act_views <- comp_views %>% add_views(create_view("juxtaview.path.130", path_act_views[["juxtaview.130"]]$data, "juxta.path.130"))%>% add_views(create_view("paraview.path.200", path_act_views[["paraview.200"]]$data, "para.path.200"))
Then run MISTy and collect the results:
run_misty(com_path_act_views, "result/comp_path_act") misty_results_com_path_act <- collect_results("result/comp_path_act/")
With the collected results, we can now answer the following questions:
Here we can look at two different statistics: intra.R2
shows the variance explained by the intraview alone, and gain.R2
shows the increase in explainable variance when we additionally consider the other views (here juxta and para).
misty_results_com_path_act %>% plot_improvement_stats("intra.R2")%>% plot_improvement_stats("gain.R2")
The juxta and paraview particularly increase the explained variance for mast cells and adipocytes.
In general, the significant gain in R2 can be interpreted as the following:
"We can better explain the expression of marker X when we consider additional views other than the intrinsic view."
To see the individual contributions of the views we can use:
misty_results_com_path_act %>% plot_view_contributions()
We see, that the intraview explains the most variance for nearly all cell types (as expected).
We can individually show the importance of the markers from each viewpoint as predictors of the spot intrinsic cell-type composition to explain the contributions.
Let's look at the juxtaview:
misty_results_com_path_act %>% plot_interaction_heatmap("juxta.path.130", clean = TRUE)
We observe that TNFa is a significant predictor for adipocytes. We can compare their distributions:
SpatialFeaturePlot(seurat, features = "tnfa", image.alpha = 0) DefaultAssay(seurat) <- "c2l_props" SpatialFeaturePlot(seurat, features = "Adipocytes", image.alpha = 0)
We observe similar distributions for both.
The default model used by MISTy to model each view is the random forest. However, there are different models to choose from, like the faster and more interpretable linear model.
Another option we haven't used yet is bypass.intra
. With this, we bypass training the baseline model that predicts the intraview with features from the intraview itself. We will still be able to see how the other views explain the intraview. We will use the same view composition as before:
run_misty(com_path_act_views, "result/comp_path_act_linear", model.function = linear_model, bypass.intra = TRUE) misty_results_com_path_act_linear <- collect_results("result/comp_path_act_linear")
Let's check again the gain.R2
and view contributions:
misty_results_com_path_act_linear %>% plot_improvement_stats("gain.R2") %>% plot_view_contributions()
For the specific target-predictor interaction, we look again at the juxtaview:
misty_results_com_path_act_linear %>% plot_interaction_heatmap("juxta.path.130", clean = TRUE)
Visualize the activity of the JAK-STAT pathway and myeloid distribution:
SpatialFeaturePlot(seurat, features = "jak.stat", image.alpha = 0) DefaultAssay(seurat) <- "c2l_props" SpatialFeaturePlot(seurat, features = "Myeloid", image.alpha = 0)
In addition to the estimated pathway activities, we can also add a view to examine the relationship between cell-type composition and TF activity. First, we need to estimate the TF activity with decoupler. It is recommended to compute it with Python, as it is significantly faster:
expression_df <- as.data.frame(t(expression))
# In Python: net = dc.get_collectri() acts_tfs= dc.run_ulm( mat = r.expression_df, net = net, verbose = True, use_raw = False, )
The object with the estimation contains two elements: The first are the estimates and their respective p-values can be found in the second element.
est_TF <- py$acts_tfs
To speed up the following model training, we calculate the 1000 most variable genes expressed. We then extract the TF from the highly variable genes to create a MISTy view.
# Highly variable genes hvg <- FindVariableFeatures(expression, selection.method = "vst", nfeatures = 1000) %>% filter(variable == TRUE) hvg_expr <- expression[rownames(hvg), ] # Extract TF from the highly variable genes hvg_TF<- est_TF[[1]][, colnames(est_TF[[1]]) %in% rownames(hvg_expr)]
We will combine the intraview from the cell-type composition and paraviews from the estimated pathway and TF activities:
TF_view <- create_initial_view(hvg_TF) %>% add_paraview(geometry, l = 200) # This may still take some time # Combine Views comp_TF_path_views <- comp_views %>% add_views(create_view("paraview.TF.200", TF_view[["paraview.200"]]$data, "para.TF.200")) %>% add_views(create_view("paraview.path.200", path_act_views[["paraview.200"]]$data, "para.path.200")) # Run Misty run_misty(comp_TF_path_views, "result/comp_TF_path", model.function = linear_model, bypass.intra = TRUE) misty_results_comp_TF_pathway <- collect_results("result/comp_TF_path")
misty_results_comp_TF_pathway %>% plot_improvement_stats("gain.R2") %>% plot_view_contributions()
When plotting the interaction heatmap, we can restrict the result by applying a trim
, that only shows targets above a defined value for a chosen metric like gain.R2
.
misty_results_comp_TF_pathway %>% plot_interaction_heatmap("para.TF.200", clean = TRUE, trim.measure = "gain.R2", trim = 20)
The TF MYC is an important predictor of fibroblasts:
DefaultAssay(seurat) <- "SCT" SpatialFeaturePlot(seurat, features = "MYC", image.alpha = 0) DefaultAssay(seurat) <- "c2l_props" SpatialFeaturePlot(seurat, features = "Fibroblasts", image.alpha = 0)
Indeed, we can see a similar distribution.
Finally, we want to learn about the spatial relationship of receptors and ligands on the tissue slide. We will access the consensus resource from LIANA
after downloading it from Github, pulling out the ligands and receptors from the before-determined highly variable genes:
download.file("https://raw.githubusercontent.com/saezlab/liana-py/main/liana/resource/omni_resource.csv", destfile = "omni_resource.csv", method = "curl") # Ligand Receptor Resource omni_resource <- read_csv("omni_resource.csv")%>% filter(resource == "consensus") # Get highly variable ligands ligands <- omni_resource %>% pull(source_genesymbol) %>% unique() hvg_lig <- hvg_expr[rownames(hvg_expr) %in% ligands,] # Get highly variable receptors receptors <- omni_resource %>% pull(target_genesymbol) %>% unique() hvg_recep <- hvg_expr[rownames(hvg_expr) %in% receptors,] # Clean names rownames(hvg_lig) <- hvg_lig %>% clean_names(parsing_option = 0) %>% rownames(.) rownames(hvg_recep) <- hvg_recep %>% clean_names(parsing_option = 0) %>% rownames(.)
We are going to create a combined view with the receptors in the intraview as targets and the ligands in the paraview as predictors:
# Create views and combine them receptor_view <- create_initial_view(as.data.frame(t(hvg_recep))) ligand_view <- create_initial_view(as.data.frame(t(hvg_lig))) %>% add_paraview(geometry, l = 200, family = "gaussian") lig_recep_view <- receptor_view %>% add_views(create_view("paraview.ligand.200", ligand_view[["paraview.200"]]$data, "para.lig.200")) run_misty(lig_recep_view, "results/lig_recep", bypass.intra = TRUE) misty_results_lig_recep <- collect_results("results/lig_recep")
Let's look at important interactions. An additional way to reduce the number of interactions shown in the heatmap is applying a cutoff
, that introduces an importance threshold:
misty_results_lig_recep %>% plot_interaction_heatmap("para.lig.200", clean = TRUE, cutoff = 2, trim.measure ="gain.R2", trim = 10)
Remember that MISTy does not only infer interactions between ligands and their respective receptor, but rather all possible interactions between ligands and receptors. We can visualize one of the interactions with high importance:
DefaultAssay(seurat) <- "SCT" SpatialFeaturePlot(seurat, features = "CRLF1", image.alpha = 0) SpatialFeaturePlot(seurat, features = "COMP", image.alpha = 0)
The plots show a co-occurrence of the ligand and receptor, although they are not an annotated receptor-ligand pair.
browseVignettes("mistyR")
Here is the output of sessionInfo()
at the point when this document was compiled.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.