knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE) library(rgl) options(rgl.useNULL = TRUE) knitr::knit_hooks$set(webgl = hook_webgl)
Here we demonstrate integrating a scRNA (Dropviz) dataset of the mouse frontal cortex and a spatial transcriptomics (STARmap) dataset of the same region. For this tutorial, we prepared the two datasets at a Dropbox folder
Dropviz_starmap_vig.RDS
is 28,366 genes by 71,639 cells.STARmap_vig.RDS
is 28 genes by 31,294 cells. Dropviz_general_annotations.RDS
.STARmap_3D_Annotations.RDS
The datasets provided are presented in "dgCMatrix" class, which is a common form of sparse matrix in R. We can then create a liger object with a named list of the matrices. The unshared features should not be subsetted out, or submitted separately. Rather, they should be included in the matrix submitted for that dataset. This helps ensure proper normalization. Different from other UINMF tutorials, we set the modality of each dataset with argument modal
, in this way, a slot for spatial coordinates is preserved for the STARmap dataset.
library(rliger) dropviz <- readRDS("starmap_data/Dropviz_starmap_vig.RDS") starmap <- readRDS("starmap_data/STARmap_vig.RDS") lig <- createLiger(list(starmap = starmap, dropviz = dropviz), modal = c("spatial", "rna"))
Liger simply normalizes the matrices by library size, without multiplying a scale factor or applying "log1p" transformation.
lig <- normalize(lig)
Given that the STARmap dataset has only 28 genes available and these are also presented in the Dropviz dataset, we directly set all of them as shared variable genes. Please use varFeatures<-()
method for accessing the shared variable genes. For the Dropviz dataset, we further select variable genes from the unshared part of it. To enable selection of unshared genes, users need to specify the name(s) of dataset(s) where unshared genes should be chosen from to useUnsharedDatasets
.
lig <- selectGenes(lig, thresh = 0.3, useUnsharedDatasets = "dropviz", unsharedThresh = 0.3) varFeatures(lig) <- rownames(dataset(lig, "starmap"))
Then we scale the dataset. Three new matrices will be created under the hook, two containing the 28 shared variable genes for both datasets, and one for the unshared variable genes of the Dropviz data. Note that LIGER does not center the data before scaling because iNMF/UINMF methods require non-negative input.
lig <- scaleNotCenter(lig)
Unshared Integrative Non-negative Matrix Factorization (UINMF) can be applied with runIntegration(..., method = "UINMF")
. A standalone function runUINMF()
is also provided with more detailed documentation and initialization setup. This step produces factor gene loading matrices of the shared genes: $W$ for shared information across datasets, and $V$ for dataset specific information. Specific to UINMF method, additional factor gene loading matrix of unshared features, $U$ is also produced. $H$ matrices, the cell factor loading matrices are produced for each dataset and can be interpreted as low-rank representation of the cells.
In this tutorial, we set dataset specific lambda (regularization parameter) values to penalize the dataset specific effect differently.
Another noteworthy advantage of UINMF is that we are able to use a larger number of factors than there are shared features. We captilize on this by changing the default value of k
to 40.
lig <- runUINMF(lig, k = 40, lambda = c(10, 1))
The default reference dataset for quantile normalization is the larger dataset, but users should select the higher quality dataset as the reference dataset, even if it is the smaller dataset. In this case, the Dropviz dataset is considered higher quality than the STARmap dataset, so we set the Dropviz dataset to be the reference dataset.
After this step, the low-rank cell factor loading matrices, $H$, are aligned and ready for cluster definition.
lig <- quantileNorm(lig, reference = "dropviz")
With the aligned cell factor loading information, we next apply Leiden community detection algorithm to identify cell clusters.
lig <- runCluster(lig)
We create a UMAP with the quantile normalized cell factor loading.
lig <- runUMAP(lig)
Next, we can visualize our factorized object by dataset to check the integration between datasets, and also the cluster labeling on the global population.
plotDatasetDimRed(lig) plotClusterDimRed(lig, legendNCol = 2)
We can also use the Dropviz labels to help us annotate the clusters. Link to the downloading page of the labeling data is provided at top of this page.
mouse_annies <- readRDS("starmap_data/Dropviz_general_annotations.RDS") dropviz_id <- paste0("dropviz_", names(mouse_annies)) cellMeta(lig, "dropviz_ann", cellIdx = dropviz_id) <- factor(mouse_annies) plotClusterDimRed(lig, "dropviz_ann", legendNCol = 1)
cellMeta()<-
method has the feature for partial insertion implemented and is great for adding dataset specific metadata to a new variable. If you are unsure about the order between given variable and the cells in the object, argument cellIdx
can be used for precise locating. Users might use different strategies when creating the index depending on different situation.
Here we demonstrate how to use the annotation labels derived from the above analysis within the context of 3D space. We provide the annotation labels for the sake of simplicity. The link to the download page is provided at the top of this page. These labels were generated using the high quality Dropviz annotations to re-annotate the STARmap cells after completing the above analysis. We have provided the exact annotations and colors used in the publication such that the interested user may captilize on the 3D dimensional sample space.
starmap_annies <- readRDS("starmap_data/STARmap_3D_Annotations.RDS") dplyr::glimpse(starmap_annies)
Recall that we set the "modal" of the STARmap dataset to "spatial" when we created the liger object, here we can insert the coordinate information into the object.
coords <- as.matrix(starmap_annies[, c("Coord1", "Coord2", "Coord3")]) rownames(coords) <- starmap_annies$Cell_Barcode coordinate(lig, "starmap") <- coords
LIGER does not yet have any analysis method that incorporate the spatial coordinate information. Given that most of the spatial transcriptomics technologies only support 2D spatial information, LIGER only has 2D plotting (See plotSpatial2D()
) method implemented for the spatial coordinate at this point.
The STARmap technology we work with in this tutorial is special for 3D information. LIGER does not provide any function for the visualization of 3D coordinates, but here we provide a simple guide on it using package rgl.
The code chunk below shows how to create the 3D view of all cells.
library(rgl) # Set the perspective parameters rgl.viewpoint(theta = 25, phi = 5, zoom = 0.7) # Generate scatter plot in 3D space plot3d(x = starmap_annies$Coord1, y = starmap_annies$Coord2, z = starmap_annies$Coord3, xlab = "", ylab = "", zlab = "", col = starmap_annies$Cell_Type_Color, size = 2, axes= FALSE, labels = FALSE) aspect3d(1.7, 1.4, 0.1) first = c(0) second = c(0) # Set background color to black bg3d("black", labels = FALSE) axes3d(edges = "bbox",col = 'white', labels = FALSE, tick = FALSE)
As we can see above, cells of different types can be located at different depth (z-axis) and can interfere the observation of each other. We next show how to visualize with only neuronal cells.
# Subset annotation data to only Neurons neuronal_cells <- starmap_annies[starmap_annies$General_Class == "Neuron",] # Create 3D scatter plot with the subset plot3d(x = neuronal_cells$Coord1, y = neuronal_cells$Coord2, z = neuronal_cells$Coord3, xlab = "", ylab = "", zlab = "", col = neuronal_cells$Sub_Color, size = 3, axes = FALSE, labels = FALSE) aspect3d(1.7, 1.4, 0.1) first = c(0) second = c(0) bg3d("black", labels = FALSE) axes3d(edges = "bbox", col = 'white', labels = FALSE, tick = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.