knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Here we integrate the scATAC and scRNA reads from the dual-omics dataset SNARE-seq as an illustration of how UINMF can be used to integrate cross-modality data. For this tutorial, we will use three matrices, which can all be downloaded from our Dropbox folder

Step 1: Load data

library(rliger)

rna <- readRDS("SNAREseq_data/SNAREseq_RNA.RDS")
atac_shared <- readRDS("SNAREseq_data/SNAREseq_chromatin_accessibility_shared.RDS")
atac_unshared <- readRDS("SNAREseq_data/SNARE_seq_unshared_chromatin_features.RDS")

Step 2: Preprocessing

Integrative non-negative matrix factorization with unshared features (UINMF) performs factorization using shared feature matrix of all datasets and includes unshared features from dataset(s) with such information. In this tutorial, we plan to use the variable feature selected from the scRNA dataset. Since we prepared the gene-centric matrix from the scATAC dataset, these genes can be accounted as the shared features. For the unshared features, normally we select variable genes that are out of the intersection of gene sets from all datasets. In this tutorial, we directly use the peaks that are identified variable. Therefore, special processing will be needed compared to the other UINMF tutorials.

2.1: Create liger object for shared genes

lig <- createLiger(list(rna = rna, atac = atac_shared))

2.2: Normalize and select shared variable features

lig <- normalize(lig) %>%
    selectGenes(useDatasets = "rna", thresh = 0.1) %>%
    scaleNotCenter()

Step 3: Selecting the unshared features

When selecting unshared features for the UINMF integration, it is critical to consider the type of data you are working with. For unshared features that gene-centric, the user should follow the feature selection process outlined in the 'Integrating unshared features with UINMF' tutorial.

However, when dealing with features that are not gene-centric (as is the case with our binned intergenic peak counts), we must select the most variable features using an appropriate method. Here, we opt to use Seurat's FindVariableFeatures() function.

First, we normalize the unshared features:

unshareNormed <- normalize(atac_unshared)

Then we select the top 2,000 variable features with Seurat VST method:

library(Seurat)

se <- CreateSeuratObject(unshareNormed)
se <- FindVariableFeatures(se, selection.method = "vst", nfeatures = 2000)
top2000 <- VariableFeatures(se)

Then we scale, but do not center the unshared features

unshareScaled <- scaleNotCenter(unshareNormed[top2000,])

Add the unshared features that have been properly selected, such that they are added as a genes by cells matrix.

varUnsharedFeatures(lig, "atac") <- top2000
scaleUnsharedData(lig, "atac") <- unshareScaled

Step 4: Joint Matrix Factorization

To factorize the datasets and include the unshared datasets, set useUnshared = TRUE.

lig <- runUINMF(lig, k = 30, nIteration = 30)

Step 5: Quantile Normalization and Joint Clustering

After factorization, the resulting Liger object can be used in all downstream LIGER functions without adjustment. The default reference dataset for quantile normalization is the larger dataset, but the user should select the higher quality dataset as the reference dataset, even if it is the smaller dataset.

lig <- quantileNorm(lig)
lig <- runCluster(lig, resolution = 0.8, nNeighbors = 30)

Step 6: Visualizations and Downstream processing

lig <- runUMAP(lig, nNeighbors = 30)

Next, we can visualize our returned factorized object by dataset to check the alignment between datasets, as well as by cluster determined in the factorization.

options(ligerDotSize = 0.5)
plotDatasetDimRed(lig)
plotClusterDimRed(lig, legendNCol = 2)


MacoskoLab/liger documentation built on Jan. 19, 2025, 8:22 a.m.