scCATCH tutorial

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

scCATCH (single-cell Cluster-based auto-Annotation Toolkit for Cellular Heterogeneity) annotates cell clusters automatically. It first identifies potential marker genes for each cluster, then assigns an annotation using an evidence-based score obtained by matching those genes against known cell markers in CellMatch, a tissue-specific cell taxonomy reference database.

scCATCH has two main functions, findmarkergene() and findcelltype(), which together perform the automatic annotation of each identified cluster.

scCATCH can annotate scRNA-seq data from both cancerous and non-cancerous tissue.

General usage

[1] For scRNA-seq data, we suggest revising the gene symbols with rev_gene(). geneinfo is the built-in data.frame containing human and mouse gene information from NCBI Gene (updated June 19, 2022). To use your own geneinfo data.frame for another species (e.g., rat, zebrafish, Drosophila, C. elegans), refer to demo_geneinfo() to build a new one.

library(scCATCH)
load(system.file("extdata", "mouse_kidney_203.rda", package = "scCATCH"))

# demo_geneinfo
demo_geneinfo()

# revise gene symbols
mouse_kidney_203 <- rev_gene(data = mouse_kidney_203, data_type = "data", species = "Mouse", geneinfo = geneinfo)

[2] Create an scCATCH object with createscCATCH(). Users need to provide the normalized data and the cluster assignment for each cell.

obj <- createscCATCH(data = mouse_kidney_203, cluster = mouse_kidney_203_cluster)
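
Before calling createscCATCH(), it can help to confirm that the expression matrix and the cluster vector line up. The sketch below uses a small made-up matrix; toy_data and toy_cluster are hypothetical stand-ins, not package data. It assumes one cluster label per cell, with genes in rows and cells in columns. The checks that scCATCH performs internally may differ.

```r
# Toy stand-ins for a normalized expression matrix (genes x cells)
# and its per-cell cluster labels; both are hypothetical.
set.seed(1)
toy_data <- matrix(
  runif(12),
  nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:4))
)
toy_cluster <- c("1", "1", "2", "2")

# One cluster label per cell (column), and no missing labels.
stopifnot(
  is.matrix(toy_data),
  length(toy_cluster) == ncol(toy_data),
  !anyNA(toy_cluster)
)

# Number of cells per cluster, as a quick sanity check.
cells_per_cluster <- table(toy_cluster)
print(cells_per_cluster)
```

The same three conditions can be checked on mouse_kidney_203 and mouse_kidney_203_cluster before building the object.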

[3] Find highly expressed genes with findmarkergene(). Users need to provide the species, tissue, or cancer information. cellmatch is the built-in data.frame containing the known markers of human and mouse. To use your own marker data.frame for another species (e.g., rat, zebrafish, Drosophila, C. elegans), refer to demo_marker() to build a new one.

# demo_marker
demo_marker()

# find highly expressed genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney")

[4] Compute the evidence-based score and annotate each cluster with findcelltype().

obj <- findcelltype(object = obj)

# Results are stored in obj
obj@celltype

Note: There are two methods to find marker genes. Set use_method to "1" to compare each cluster with every other cluster individually, or to "2" to compare each cluster with all other clusters pooled together, like the strategy in Seurat. With use_method "1", users can also set comp_cluster, which is the number of clusters to compare against. By default, each cluster is compared with all other clusters. Set it between 1 and the number of unique clusters. A smaller comp_cluster yields more marker genes.

# The strictest condition to identify marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1")

# The loosest condition to identify marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "2")

# Other conditions to identify marker genes
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1", comp_cluster = 1)

Users can also adjust cell_min_pct, logfc, and pvalue to change which genes are identified as markers.
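
To make the difference between the two comparison strategies concrete, here is a toy sketch in base R. It is not scCATCH's implementation: it uses a plain difference in means with a made-up threshold as a simplified stand-in for the logfc, cell_min_pct, and pvalue criteria, and all values are invented. The last line reflects one assumed reading of comp_cluster, namely that a gene needs to pass in at least that many pairwise comparisons.

```r
# Toy expression values for one gene across three clusters
# (hypothetical numbers, for illustration only).
expr <- c(A1 = 4.0, A2 = 5.0, B1 = 1.0, B2 = 1.5, C1 = 3.5, C2 = 4.0)
cluster <- c("A", "A", "B", "B", "C", "C")

target <- "A"
target_mean <- mean(expr[cluster == target])
others <- setdiff(unique(cluster), target)

# Strategy 1: compare the target with each other cluster separately.
pairwise_diff <- sapply(others, function(cl) {
  target_mean - mean(expr[cluster == cl])
})

# Strategy 2: compare the target with all other clusters pooled together.
pooled_diff <- target_mean - mean(expr[cluster != target])

threshold <- 1  # hypothetical cutoff on the mean difference

# Strategy 1 (strict): the gene must pass against every other cluster.
passes_pairwise <- all(pairwise_diff >= threshold)

# Strategy 2: the gene only has to pass against the pooled average.
passes_pooled <- pooled_diff >= threshold

# Assumed reading of comp_cluster = 1: passing one pairwise comparison is enough.
passes_relaxed <- sum(pairwise_diff >= threshold) >= 1

print(pairwise_diff)
print(c(pairwise = passes_pairwise, pooled = passes_pooled, relaxed = passes_relaxed))
```

In this toy case the gene is clearly higher in cluster A than in B but only slightly higher than in C, so it fails the all-pairs test while passing the pooled one. This is the sense in which the pairwise strategy is the stricter of the two.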

Custom usage

Users can supply a custom cellmatch for cell type prediction when they want to [1] select a different combination of tissues or cancers for annotation; [2] add more marker genes to cellmatch for annotation; or [3] use markers from a species other than human and mouse. In these cases, set if_use_custom_marker = TRUE in findmarkergene(); species, tissue, and cancer then do not need to be set.

[1] Different combination of tissues or cancers

# Example
cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)

# Example
cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)

# Example
cellmatch_new <- cellmatch[cellmatch$species == "Mouse", ]
# Filter the mouse subset (not the full cellmatch) so the species filter is kept
cellmatch_new <- cellmatch_new[cellmatch_new$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer") | cellmatch_new$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
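
A misspelled tissue or cancer name silently produces an empty subset, so it can be worth listing the available labels and checking the result before passing it on. The sketch below uses toy_cellmatch, a hypothetical stand-in with only the three columns used for filtering above and invented values; the real cellmatch has more columns.

```r
# Hypothetical stand-in for cellmatch with only the three columns
# used for filtering above; the values are made up.
toy_cellmatch <- data.frame(
  species = c("Mouse", "Mouse", "Human", "Mouse"),
  tissue  = c("Kidney", "Liver", "Kidney", "Lung"),
  cancer  = c("Normal", "Normal", "Normal", "Lung Cancer"),
  stringsAsFactors = FALSE
)

# List the tissue and cancer labels available for the species of interest.
mouse_rows <- toy_cellmatch[toy_cellmatch$species == "Mouse", ]
available_tissues <- sort(unique(mouse_rows$tissue))
available_cancers <- sort(unique(mouse_rows$cancer))
print(available_tissues)
print(available_cancers)

# Subset, then confirm that the filter matched something.
wanted_tissues <- c("Kidney", "Liver")
toy_new <- mouse_rows[mouse_rows$tissue %in% wanted_tissues, ]
if (nrow(toy_new) == 0) {
  stop("No markers matched; check the spelling of the tissue names.")
}
```

The same pattern applies to the real table by replacing toy_cellmatch with cellmatch.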

[2] Add more marker genes to cellmatch for annotation

# Example

# cellmatch_new is provided by users
# cellmatch_new <- rbind(cellmatch, cellmatch_new)

# Then use the new cellmatch
# a. define the species, tissue, and cancer
obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch_new, tissue = "Kidney")
obj <- findcelltype(obj)

# b. directly use custom cellmatch
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
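
Because rbind() on data frames requires matching column names, a quick check before combining can save a confusing error. This is a minimal sketch with hypothetical stand-ins (toy_cellmatch and user_markers) and illustrative column names; use demo_marker() to see the layout the package actually expects.

```r
# Hypothetical stand-ins: an existing marker table and user-supplied rows.
# Column names here are illustrative only.
toy_cellmatch <- data.frame(
  species = "Mouse", tissue = "Kidney", celltype = "Podocyte", gene = "Nphs1",
  stringsAsFactors = FALSE
)
user_markers <- data.frame(
  gene = "Nphs2", species = "Mouse", tissue = "Kidney", celltype = "Podocyte",
  stringsAsFactors = FALSE
)

# Check that both tables have the same set of columns.
missing_cols <- setdiff(names(toy_cellmatch), names(user_markers))
extra_cols <- setdiff(names(user_markers), names(toy_cellmatch))
stopifnot(length(missing_cols) == 0, length(extra_cols) == 0)

# Append the new rows with columns reordered to match, then drop exact duplicates.
combined <- unique(rbind(toy_cellmatch, user_markers[, names(toy_cellmatch)]))
print(combined)
```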

[3] Use markers from different species

# Please refer to demo_marker() to build a marker data.frame (cellmatch_new) for another species, e.g., rat
# Then use the new marker
obj <- findmarkergene(object = obj, species = "Rat", if_use_custom_marker = TRUE, marker = cellmatch_new, tissue = "Kidney")
obj <- findcelltype(obj)

About

Please refer to scCATCH on GitHub for more information. For the available tissues and cancers, see the wiki page.

Cite

Shao et al., scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data, iScience, Volume 23, Issue 3, 27 March 2020. doi: 10.1016/j.isci.2020.100882. PMID: 32062421




scCATCH documentation built on April 23, 2023, 5:09 p.m.