scCATCH (single cell Cluster-based auto-Annotation Toolkit for Cellular Heterogeneity) covers the workflow from identifying potential marker genes for each cluster to annotating the clusters. Annotation relies on an evidence-based score obtained by matching the potential marker genes against known cell markers in CellMatch, a tissue-specific cell taxonomy reference database.
scCATCH mainly provides two functions, findmarkergene() and findcelltype(), which together perform the automatic annotation of each identified cluster.
scCATCH can be used to annotate scRNA-seq data from both cancerous and non-cancerous tissue.
[1] For scRNA-seq data, we suggest revising the gene symbols with rev_gene(). geneinfo is the built-in data.frame containing human and mouse gene information from NCBI Gene (updated June 19, 2022). To use your own geneinfo data.frame for another species (e.g., rat, zebrafish, Drosophila, C. elegans), refer to demo_geneinfo() to build one.
library(scCATCH)
load(paste0(system.file(package = "scCATCH"), "/extdata/mouse_kidney_203.rda"))

# demo_geneinfo
demo_geneinfo()

# revise gene symbols
mouse_kidney_203 <- rev_gene(data = mouse_kidney_203, data_type = "data", species = "Mouse", geneinfo = geneinfo)
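To make the idea of revising gene symbols concrete, here is a minimal base R sketch that maps outdated symbols to official ones using a lookup table. The table `toy_geneinfo`, its column names, the gene names, and the helper `revise_symbols()` are all hypothetical and exist only for illustration; they do not reflect the actual structure of geneinfo or the internals of rev_gene().

```r
# Toy lookup table (hypothetical columns; not the real geneinfo layout)
toy_geneinfo <- data.frame(
  symbol  = c("GeneA", "GeneB", "GeneC"),
  synonym = c("OldA", "OldB", "OldC"),
  stringsAsFactors = FALSE
)

# Toy expression matrix: two rows carry outdated names, one is already official
expr <- matrix(
  1:9, nrow = 3,
  dimnames = list(c("OldA", "GeneB", "OldC"), paste0("Cell", 1:3))
)

# Hypothetical helper: replace known synonyms, leave all other names untouched
revise_symbols <- function(genes, info) {
  idx <- match(genes, info$synonym)
  ifelse(is.na(idx), genes, info$symbol[idx])
}

rownames(expr) <- revise_symbols(rownames(expr), toy_geneinfo)
rownames(expr)
```

Names that are not found among the synonyms are kept as they are, so symbols that are already official pass through unchanged.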
[2] Create a scCATCH object with createscCATCH(). Users need to provide the normalized data and the cluster assignment for each cell.
obj <- createscCATCH(data = mouse_kidney_203, cluster = mouse_kidney_203_cluster)
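Before creating the object, it can help to check that the two inputs line up. The sketch below builds toy inputs in base R and verifies them; it assumes that the data is a matrix with genes in rows and cells in columns, and that the cluster is a character vector with one label per cell. `toy_data` and `toy_cluster` are made-up names and values for illustration.

```r
set.seed(1)

# Toy normalized matrix: 5 genes (rows) x 6 cells (columns)
toy_data <- matrix(
  rexp(5 * 6), nrow = 5,
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:6))
)

# One cluster label per cell, stored as character
toy_cluster <- as.character(c(1, 1, 2, 2, 3, 3))

# Sanity checks (assumed requirements; see the lead-in)
stopifnot(
  is.matrix(toy_data),
  !is.null(rownames(toy_data)),
  is.character(toy_cluster),
  length(toy_cluster) == ncol(toy_data)
)

# Number of cells per cluster
table(toy_cluster)
```

If the checks pass, inputs of this shape could be passed as the `data` and `cluster` arguments shown above.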
[3] Find highly expressed genes with findmarkergene(). Users need to provide the species, tissue, or cancer information. cellmatch is the built-in data.frame containing the known markers of human and mouse. To use your own marker data.frame for another species (e.g., rat, zebrafish, Drosophila, C. elegans), refer to demo_marker() to build one.
# demo_marker
demo_marker()

# find highly expressed genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney")
[4] Evidence-based score and annotation for each cluster with findcelltype()
obj <- findcelltype(object = obj)

# Results are stored in obj
obj@celltype
Note: There are two methods to find marker genes. Set use_method to "1" to compare each cluster with every other cluster separately, or to "2" to compare each cluster with all other clusters pooled together, like the strategy in Seurat. In addition, when use_method is "1", users can set comp_cluster, which represents the number of clusters to compare. By default, each cluster is compared with all other clusters. Set it between 1 and the number of unique clusters. A smaller comp_cluster yields more marker genes.
# The most strict condition to identify marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1")

# The most loose condition to identify marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "2")

# Other conditions to identify marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1", comp_cluster = 1)
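The difference between the two comparison strategies can be sketched in base R on a single toy gene. This is an illustration only, not scCATCH's implementation: the Wilcoxon test, the 0.05 cutoff, and the reading of comp_cluster as "the minimum number of other clusters the gene must beat" are assumptions made for this sketch.

```r
set.seed(42)

# Toy data: one gene measured in 3 clusters of 20 cells each
cluster <- rep(c("1", "2", "3"), each = 20)
expr <- c(rnorm(20, mean = 3), rnorm(20, mean = 2.5), rnorm(20, mean = 0))

target <- "1"
others <- setdiff(unique(cluster), target)

# Strategy "1": test the target cluster against each other cluster separately
p_each <- sapply(others, function(k) {
  wilcox.test(expr[cluster == target], expr[cluster == k],
              alternative = "greater")$p.value
})

# Number of other clusters the gene is significantly higher than
n_pass <- sum(p_each < 0.05)

# Assumed reading of comp_cluster: require significance against at least
# that many other clusters (all of them in the strictest setting)
pass_strict <- n_pass >= length(others)
pass_relaxed <- n_pass >= 1

# Strategy "2": test the target cluster against all other cells pooled
p_pooled <- wilcox.test(expr[cluster == target], expr[cluster != target],
                        alternative = "greater")$p.value
pass_pooled <- p_pooled < 0.05
```

Because the pooled test mixes a similar cluster with a very different one, a gene can pass it while failing one of the pairwise tests, which is why the pairwise strategy with all clusters is the stricter condition.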
Moreover, users can adjust cell_min_pct, logfc, and pvalue to change which marker genes are identified.
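As a rough illustration of how such thresholds act, the base R sketch below filters a toy table of per-gene statistics. The table `toy_stats`, the helper `filter_markers()`, and the threshold values are hypothetical; they are not the package defaults and do not reproduce how findmarkergene() computes its statistics.

```r
# Toy per-gene statistics for one cluster (made-up values)
toy_stats <- data.frame(
  gene   = c("GeneA", "GeneB", "GeneC", "GeneD"),
  pct    = c(0.80, 0.10, 0.60, 0.40),    # fraction of cells expressing the gene
  logfc  = c(1.20, 2.00, 0.10, 0.50),    # log fold change vs. other cells
  pvalue = c(0.001, 0.001, 0.010, 0.200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep genes that satisfy all three thresholds
filter_markers <- function(stats, cell_min_pct, logfc, pvalue) {
  keep <- stats$pct >= cell_min_pct &
    stats$logfc >= logfc &
    stats$pvalue < pvalue
  stats[keep, ]
}

# Stricter thresholds keep fewer genes than looser ones
strict <- filter_markers(toy_stats, cell_min_pct = 0.5, logfc = 1, pvalue = 0.01)
loose <- filter_markers(toy_stats, cell_min_pct = 0.25, logfc = 0.25, pvalue = 0.5)

strict$gene
loose$gene
```

Raising any one threshold can only shrink the candidate set, so it is usually easiest to tune one parameter at a time.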
Users can supply a custom cellmatch for cell type prediction when they want to [1] select a different combination of tissues or cancers for annotation; [2] add more marker genes to cellmatch for annotation; or [3] use markers from species other than human and mouse.
In these cases, set if_use_custom_marker to TRUE in findmarkergene(); there is no need to set species, tissue, or cancer.
[1] Different combination of tissues or cancers
# Example: select several tissues
cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)

# Example: select several cancers
cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)

# Example: combine tissues and cancers
cellmatch_new <- cellmatch[cellmatch$species == "Mouse", ]
# subset the mouse rows (cellmatch_new), not the full cellmatch, so the species filter is kept
cellmatch_new <- cellmatch_new[cellmatch_new$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer") | cellmatch_new$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
[2] Add more marker genes to cellmatch for annotation
# Example
# cellmatch_new is provided by users
# cellmatch_new <- rbind(cellmatch, cellmatch_new)

# Then use the new cellmatch
# a. define the species, tissue, and cancer
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch_new, tissue = "Kidney")
obj <- findcelltype(obj)

# b. directly use custom cellmatch
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
[3] Use markers from different species
# Please refer to demo_marker to build a marker data.frame (cellmatch_new) for another species, e.g., rat
# Then use the new marker
obj <- findmarkergene(object = obj, species = "Rat", if_use_custom_marker = TRUE, marker = cellmatch_new, tissue = "Kidney")
obj <- findcelltype(obj)
Please refer to scCATCH on GitHub for more information. For the available tissues and cancers, see the wiki page.
Shao et al., scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data, iScience, Volume 23, Issue 3, 27 March 2020. doi: 10.1016/j.isci.2020.100882. PMID: 32062421