In sqjin/scAI: integrative analysis of single-cell transcriptomic and epigenomic profiles

title: "Walkthrough with kidney dataset" author: "Suoqin Jin, Lihua Zhang" output: html_document mainfont: Arial vignette: > %\VignetteIndexEntry{Integrative analysis of single cell multi-omics data using scAI} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  root.dir = './'
)

This walkthrough will perform integrative analysis of paired single cell RNA-seq and ATAC-seq data of mouse kidney cells, including 8837 co-assayed cells. The data is downloaded from GEO (GSM3271044 and GSM3271045) that were co-assayed using the sci-CAR protocol (Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380-1385 (2018)).

Load the required libraries

library(scAI)
library(dplyr)
library(cowplot)

Load data

The algorithm takes a list of two digital data matrices as input. Genes/loci should be in rows and cells in columns. rownames and colnames should be included. Before running the scAI model, we need to normalize the data to account for library size and select highly variable features. Our preprocessed data matrices after qualify control, normalization and feature selection were provided for this walkthrough.

load("/Users/suoqinjin/Documents/scAI-master/data/data_kidney.Rdata")
X <- data_kidney$data # List of data matrix
labels <- data_kidney$labels # the prior labels of cells

Create a scAI object

scAI_outs <- create_scAIobject(raw.data = X, do.sparse = F)

Preprocess data

Perform quality control to remove low-quality cells and genes, and normalize the data. Since this is a preprocessed data, we set assay = NULL.

scAI_outs <- preprocessing(scAI_outs, assay = NULL)

Add cell information into pData slot of the object

scAI_outs <- addpData(scAI_outs, pdata = labels, pdata.name = "Cell types")

Run scAI model

As depending on the random initilization the results might differ, one run scAI multiple times (e.g. nrun = 5) and output the best result. User can also output results from all runs by setting keep_all = TRUE. One parameter here is the number of components. The selectK function can aid in selecting K. A suitable K is the one at which the magnitude of cophenetic correlation begins to fall. Here, we only run one time.

scAI_outs <- run_scAI(scAI_outs, K = 20, nrun = 1, do.fast = T) # Using Python implementation by setting do.fast = TRUE
fit <- scAI_outs@fit
save(fit, file = "results_scAI_fit.RData") # save the inferred low-dimensional representations

Identify cell clusters

We can also identify cell clusters based on the inferred cell loading matrix using Leiden algorithm.

scAI_outs <- identifyClusters(scAI_outs, resolution = 0.9) 
scAI_outs <- getAggregatedData(scAI_outs, group = scAI_outs@identity)

Visualize cells onto the low-dimensional space

We can visualize cells onto the low-dimensional space generated by t-SNE, FIt-sne or UMAP. Here, we perform UMAP dimension reduction. Cells are colored by the clustering inferred by scAI.

scAI_outs <- reducedDims(scAI_outs, method = "umap")
cellVisualization(scAI_outs, scAI_outs@embed$umap, color.by = "cluster")

Feature plot of the inferred patterns

We can overlay the expression of features, or the cell loading values onto the low-dimensional space, e.g., VscAI, tsne, umap

featureScoreVisualization(scAI_outs, feature.scores = t(scAI_outs@fit$H), feature.use = paste0("factor",c(12,10,5)),  method = "umap", nCol = 3, cell.size = 0.1, show.legend = T, show.legend.combined = F)

Ranking the features (genes/loci) and highlighting the important markers in each factor

feature_genes = c('Slc12a3','Trpm6','Klhl3','Wnk1','Tsc22d1','Cadps2',
                  'Egfem1','Calb1','Kl','Temem72','Dach1','Ptprm','Sgms2','Tox3','Frmpd4',
                  'Rbms3','Fxyd4','Dnm3','Cacnb2','Pde1c','Pde8b')

featureRankingPlot(scAI_outs,assay = 'RNA',  factor.show = c(12,10,5), ncol = 3, feature.show = feature_genes, top.p = 0.25, ylabel = "Gene score")

Identify differentially expressed features of each cell cluster

options(future.globals.maxSize = 10*1024 * 1024^2)
future::plan("multiprocess", workers = 4)
markers.ATAC.cluster <- identifyClusterMarkers(scAI_outs, assay = 'ATAC')

Generate a heatmap to show the expression patterns of top features in cell clusters

n.top = 10
markers.ATAC.clusterTop <- markers.ATAC.cluster %>% group_by(clusters) %>% top_n(n.top, logFC) %>% slice(1:n.top)
featureHeatmap(scAI_outs, assay = "ATAC", feature.use = markers.ATAC.clusterTop$features, group.by = "cluster", size.names = 6)

sqjin/scAI documentation built on Nov. 19, 2020, 4:04 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sqjin/scAI
integrative analysis of single-cell transcriptomic and epigenomic profiles

In sqjin/scAI: integrative analysis of single-cell transcriptomic and epigenomic profiles

Load data

Create a scAI object

Preprocess data

Run scAI model

Identify cell clusters

Visualize cells onto the low-dimensional space

Feature plot of the inferred patterns

Ranking the features (genes/loci) and highlighting the important markers in each factor

Identify differentially expressed features of each cell cluster

Generate a heatmap to show the expression patterns of top features in cell clusters

R Package Documentation

Browse R Packages

We want your feedback!

sqjin/scAI integrative analysis of single-cell transcriptomic and epigenomic profiles

In sqjin/scAI: integrative analysis of single-cell transcriptomic and epigenomic profiles

Load data

Create a scAI object

Preprocess data

Run scAI model

Identify cell clusters

Visualize cells onto the low-dimensional space

Feature plot of the inferred patterns

Ranking the features (genes/loci) and highlighting the important markers in each factor

Identify differentially expressed features of each cell cluster

Generate a heatmap to show the expression patterns of top features in cell clusters

R Package Documentation

Browse R Packages

We want your feedback!

sqjin/scAI
integrative analysis of single-cell transcriptomic and epigenomic profiles