knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
Unshared integrative non-negative matrix factorization (UINMF) [Kriebel and Welch, 2022] is an extensive algorithm of our original iNMF algorithm [Welch J D, 2019]. It allows integrating datasets using shared and unshared features at the same time. We demonstrated in the publication that:
For this tutorial, we prepared example input data which can be found in the Dropbox folder
OSMFISH.vin.RDS
, 33 genes by 5,219 cells, in dense matrix form, rom the mouse somatosensory cortexDROPVIZ.vin.RDS
, 29,463 genes by 15,000 cells, from the frontal cortex. The prepared RDS files can be directly loaded into the R environment. They are presented in dense matrix (R default) form. We recommend converting them to sparse matrix form ("dgCMatrix" class supported by "Matrix" package) in order to reduce memory usage and speed up downstream analysis.
Next, create your Liger object, submitting the datasets in list format. The unshared features should not be subsetted out, or submitted separately. Rather, they should be included in the matrix submitted for that dataset. For example,the scRNA-seq data is submitted in its entirety, the unshared features are not submitted separately. This helps ensure proper normalization.
library(rliger) library(Matrix) options(ligerDotSize = 0.5) osmFISH <- readRDS("uinmf_vignette_data/OSMFISH.vin.RDS") osmFISH <- as(osmFISH, "CsparseMatrix") rna <- readRDS("uinmf_vignette_data/DROPVIZ.vin.RDS") lig <- createLiger(list(osmFISH = osmFISH, rna = rna))
The normalization is applied to the datasets in their entirety.
lig <- normalize(lig)
The normalized data for each dataset can be accessed with normData()
.
Next, we select variable genes to be used for integration. To include unshared features in your analysis, set the useUnsharedDatasets
with a vector of names of the datasets where unshared features should be included. For instance, in this case, we wish to include the unshared features from the RNA dataset, the second dataset, in our analysis. We provide an individual tuning threshold for unshared features. If a single value is submitted, that threshold is applied to all datasets' unshared features. If multiple datasets include unshared features, the user can specify an individual thresholds for each dataset by submitting a list of thresholds the same length as the number of datasets with unshared datasets.
lig <- selectGenes(lig, useUnsharedDatasets = "rna", unsharedThresh = 0.4)
The selected shared features, all 33 osmFISH genes in this case, can be accessed with varFeatures(lig)
. The unshared variable features of a specific dataset, namingly "rna"
in this case, can be accessed with varUnsharedFeatures(lig, "rna")
.
We scale the selected gene expression matrix before the integration. The datasets are not centered before scaling so the non-negative constraint for the factorization is addressed.
lig <- scaleNotCenter(lig)
The scaled matrix for shared features can be accessed with scaleData()
. The scaled matrix of the unshared features of the RNA dataset is a separate matrix and can be accessed with scaleUnsharedData(lig, "rna")
.
We provide function runIntegration()
for a general wrapper of all integration method we developed. Users can do runIntegration(..., method = "UINMF")
to perform the analysis. Alternatively, stand-alone function runUINMF()
is also provided with more specific parameter setitng guide.
lig <- runIntegration(lig, k = 30, method = "UINMF")
After factorization, the resulting liger object can be used in all downstream LIGER functions without adjustment. The default reference dataset for quantile normalization is the larger dataset, but the user should select the higher quality dataset as the reference dataset, even if it is the smaller dataset.
lig <- quantileNorm(lig, reference = "rna")
With the quantile normalized cell factor loading, we can then apply Leiden graph-based community detection method on it.
lig <- runCluster(lig)
lig <- runUMAP(lig)
Next, we can visualize our returned factorized object by dataset to check the alignment between datasets, as well as by cluster determined in the factorization.
plotDatasetDimRed(lig) plotClusterDimRed(lig, legendNCol = 2)
We can also examine features such as gene expression.
pdgfra <- plotGeneDimRed(lig, "Pdgfra", splitBy = "dataset", title = names(lig)) bmp4 <- plotGeneDimRed(lig, "Bmp4", splitBy = "dataset", title = names(lig)) cowplot::plot_grid( pdgfra$Pdgfra.rna, pdgfra$Pdgfra.osmFISH, bmp4$Bmp4.rna, bmp4$Bmp4.osmFISH )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.