ASURAT is a single-cell RNA sequencing (scRNA-seq) data analysis pipeline developed for simultaneous cell clustering and biological interpretation.
Introduction, documentation, and tutorial can be found at
https://keita-iida.github.io/ASURAT/index.html
Install ASURAT with the devtools::install_github() command:
devtools::install_github('johannesnicolaus/ASURAT_source')
Although the documentation at the above URL does not assume Seurat-based analyses, it is convenient to begin with a Seurat object obj that includes count data in obj@assays[["RNA"]]@counts.
Load a Seurat object for human scRNA-seq data (below is an example).
cerv_seurat <- readRDS(file = "backup/cerv_small_seurat.rds") # Seurat object
The following lines load a stopgap set of packages and functions. See Chapter 1 for all the requirements.
Note that users need to replace org.Hs.eg.db with the corresponding annotation package when analyzing scRNA-seq data from other species.
library(tidyverse) # For efficient handling of data.frame
library(org.Hs.eg.db) # For using human genome annotation package
library(Seurat) # For using Seurat
source("R/function_general.R") # ASURAT's function
Create an ASURAT object.
cerv <- make_asurat_obj(mat = cerv_seurat@assays[["RNA"]]@counts,
obj_name = "cerv_small")
Convert gene symbols into Entrez IDs by using the org.Hs.eg.db package.
dictionary <- AnnotationDbi::select(org.Hs.eg.db,
keys = cerv[["variable"]][["symbol"]],
columns = c("ENTREZID"), keytype = "SYMBOL")
dictionary <- dictionary[!duplicated(dictionary$SYMBOL), ]
names(dictionary) <- c("symbol", "entrez")
cerv[["variable"]] <- dictionary
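The de-duplication step above can be illustrated on a toy table. This is only a sketch: the symbols and IDs below are invented for illustration and stand in for the data frame returned by AnnotationDbi::select(), which may contain several rows for one symbol and NA for symbols without a match.

```r
# Toy stand-in for the table returned by AnnotationDbi::select();
# the symbols and IDs are made up for illustration only.
toy_dictionary <- data.frame(
  SYMBOL = c("GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  ENTREZID = c("101", "202", "203", NA),
  stringsAsFactors = FALSE
)

# Keep only the first row for each symbol, as in the step above.
toy_dictionary <- toy_dictionary[!duplicated(toy_dictionary$SYMBOL), ]
names(toy_dictionary) <- c("symbol", "entrez")

# Count the symbols that have no Entrez ID.
n_unmapped <- sum(is.na(toy_dictionary$entrez))
print(toy_dictionary)
```

Keeping the first match is a simple convention. Symbols left with NA in the entrez column may need attention before any downstream step that relies on Entrez IDs.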
The following function log1p_data() performs a log transform of the input data with a pseudocount eps.
log1p_data <- function(obj, eps){
obj[["history"]][["log1p_data"]][["eps"]] <- eps
mat <- as.matrix(obj[["data"]][["raw"]])
lmat <- log(mat + eps)
obj[["data"]][["log1p"]] <- as.data.frame(lmat)
return(obj)
}
cerv <- log1p_data(obj = cerv, eps = 1)
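As a quick check of what the transform does, the sketch below applies the same expression, log(mat + eps), to a small made-up count matrix (the object toy_counts is hypothetical). With eps = 1 the result coincides with base R's log1p(), and zero counts map to zero.

```r
# Hypothetical 2 genes x 3 cells count matrix.
toy_counts <- matrix(
  c(0, 1, 3, 7, 0, 15), nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("cell1", "cell2", "cell3"))
)

eps <- 1
# Same expression as in log1p_data(): natural log after adding a pseudocount.
toy_log <- log(toy_counts + eps)

# With eps = 1 this matches log1p(), and no entry is -Inf.
print(all.equal(toy_log, log1p(toy_counts)))
print(all(is.finite(toy_log)))
```

The pseudocount keeps zero counts finite. Without it, log(0) would give -Inf.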
The following function centralize_data() centers the input data on a gene-by-gene basis, that is, it subtracts each gene's mean across cells.
centralize_data <- function(obj){
mat <- as.matrix(obj[["data"]][["log1p"]])
cmat <- sweep(mat, 1, apply(mat, 1, mean), FUN = "-")
obj[["data"]][["centered"]] <- as.data.frame(cmat)
return(obj)
}
cerv <- centralize_data(obj = cerv)
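The centering step can be checked on a small made-up matrix (toy_log below is hypothetical). This sketch uses rowMeans() in place of apply(mat, 1, mean), which gives the same row means, and confirms that every gene has mean zero afterwards.

```r
# Hypothetical 2 genes x 3 cells matrix of log-transformed values.
toy_log <- matrix(
  c(0, 2, 1, 4, 2, 6), nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("cell1", "cell2", "cell3"))
)

# Subtract each row (gene) mean from that row.
toy_centered <- sweep(toy_log, 1, rowMeans(toy_log), FUN = "-")

print(toy_centered)
# Each gene now has mean zero across cells.
print(rowMeans(toy_centered))
```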
The following function do_cor_variables() computes a gene-by-gene correlation matrix from the input data.
Users can choose the correlation coefficient by setting method to pearson, spearman, or kendall (a vector of methods is also accepted but not recommended because of the resulting file size).
do_cor_variables <- function(obj, method){
res <- list()
tmat <- t(obj[["data"]][["log1p"]])
for(m in method){
res <- c(res, list(cor(tmat, method = m)))
}
names(res) <- method
return(res)
}
cerv_cor <- do_cor_variables(obj = cerv, method = c("spearman"))
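The sketch below shows, on a made-up genes-by-cells matrix, why the data are transposed before calling cor(): cor() correlates columns, so genes must be in columns to obtain a gene-by-gene matrix. The matrix toy and the gene count n_genes are hypothetical. The last lines estimate the size of one dense correlation matrix, which is why passing several methods at once can produce large files.

```r
# Hypothetical 3 genes x 4 cells matrix.
toy <- rbind(
  gene1 = c(1, 2, 3, 4),
  gene2 = c(2, 4, 6, 9),  # increases with gene1
  gene3 = c(4, 3, 2, 1)   # decreases as gene1 increases
)
colnames(toy) <- paste0("cell", 1:4)

# cor() works column-wise, so transpose to put genes in columns.
toy_cor <- cor(t(toy), method = "spearman")
print(toy_cor)

# Rough size of one dense correlation matrix of doubles (8 bytes each)
# for an assumed number of genes.
n_genes <- 20000
approx_gb <- n_genes^2 * 8 / 1e9
print(approx_gb)
```

Because correlation is unchanged by subtracting a constant from each gene, the result is the same for the log-transformed data and for the centered data.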
Save the objects. Please note that the numeric prefixes of the following filenames, such as 09 and 005, only identify the computational steps and have no special significance.
saveRDS(cerv, file = "backup/09_005_cerv_correlation.rds")
saveRDS(cerv_cor, file = "backup/09_006_cerv_correlation.rds")
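saveRDS() does not create missing directories, so the output directory must exist before saving. A minimal sketch, using a temporary directory and a made-up object (toy_obj and the file name are hypothetical), creates the directory first and checks that the object survives a save and reload.

```r
# Create an output directory under a temporary location for illustration.
out_dir <- file.path(tempdir(), "backup")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical small object standing in for an ASURAT object.
toy_obj <- list(history = list(log1p_data = list(eps = 1)))

out_file <- file.path(out_dir, "toy_object.rds")
saveRDS(toy_obj, file = out_file)

# Reload and confirm that nothing changed.
restored <- readRDS(out_file)
print(identical(toy_obj, restored))
```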
Go to Chapter 8 for analyses using the Disease Ontology database.
Go to Chapter 9 for analyses using the Cell Ontology database.
Go to Chapter 10 for analyses using the Gene Ontology database.
Go to Chapter 11 for analyses using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
Go to Chapter 12 for analyses using the Reactome database.