knitr::opts_chunk$set(echo = TRUE, dpi=300)
## install from source
## library(devtools)
## devtools::install_github("YY-SONG0718/scOntoMatch")
library(scOntoMatch)
library(ontologyIndex)
We use the Tabula Muris and Tabula Sapiens Smart-seq2 lung datasets as an example. scOntoMatch works with any number of input datasets.
Two demo Seurat objects are included in inst/extdata. In each, we sampled two cells per cell type (according to the original annotation), so the example focuses on the cell type hierarchies of the two datasets.
metadata = '../inst/extdata/metadata.tsv'
anno_col = 'cell_ontology_class'
onto_id_col = 'cell_ontology_id'
obo_file = '../inst/extdata/cl-basic.obo'
propagate_relationships = c('is_a', 'part_of')
ont <- ontologyIndex::get_OBO(obo_file, propagate_relationships = propagate_relationships)
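To illustrate what `propagate_relationships` changes, here is a minimal base-R sketch on a toy ontology. The term names and edges are hypothetical, and `ancestors_via` is an illustrative helper, not an ontologyIndex function. It assumes, as we understand get_OBO's behaviour, that ancestor sets are built by following every listed relationship type. Under that assumption, adding 'part_of' can add ancestors that 'is_a' alone would not reach.

```r
# Toy relationships (hypothetical terms and edges, for illustration only):
# each element maps a child term to its parent terms under one relationship type
rel <- list(
  is_a    = list(type_II_pneumocyte = "epithelial_cell", epithelial_cell = "cell"),
  part_of = list(type_II_pneumocyte = "lung")
)

# Collect a term's ancestors (including itself) by following only the listed
# relationship types, mimicking the effect of propagate_relationships
ancestors_via <- function(term, relations) {
  out <- term
  frontier <- term
  while (length(frontier) > 0) {
    nxt <- unlist(lapply(relations, function(r) {
      unlist(rel[[r]][frontier], use.names = FALSE)
    }), use.names = FALSE)
    frontier <- setdiff(nxt, out)
    out <- union(out, frontier)
  }
  out
}

is_a_only <- ancestors_via("type_II_pneumocyte", "is_a")
both      <- ancestors_via("type_II_pneumocyte", c("is_a", "part_of"))
setdiff(both, is_a_only)  # ancestors reachable only through part_of
```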
Organize the dataset names and paths as the first and second columns of a metadata file. Store each Seurat object in RDS format and use getSeuratRds to read them in.
obj_list = getSeuratRds(metadata = metadata, sep = "\t")
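getSeuratRds handles the loading for us. For readers who want to see the expected file layout, the following is a minimal, hypothetical base-R loader (`read_rds_list` is not part of scOntoMatch). It assumes a headerless, tab-separated metadata file with dataset names in the first column and RDS paths in the second. It uses plain data frames in place of Seurat objects so that it runs without Seurat installed.

```r
# Hypothetical loader, NOT scOntoMatch::getSeuratRds; illustration only.
# Assumes a headerless metadata file: column 1 = dataset name, column 2 = RDS path.
read_rds_list <- function(metadata, sep = "\t") {
  meta <- utils::read.table(metadata, sep = sep, header = FALSE, quote = "",
                            comment.char = "", stringsAsFactors = FALSE)
  objs <- lapply(meta[[2]], readRDS)  # read every object listed in column 2
  names(objs) <- meta[[1]]            # name the list by column 1
  objs
}

# Toy demo: plain data frames stand in for Seurat objects
tmp <- tempfile("scOntoMatch_demo_")
dir.create(tmp)
paths <- file.path(tmp, c("TM_lung.rds", "TS_lung.rds"))
saveRDS(data.frame(cell_ontology_class = c("B cell", "T cell")), paths[1])
saveRDS(data.frame(cell_ontology_class = "nk cell"), paths[2])
meta_file <- file.path(tmp, "metadata.tsv")
utils::write.table(data.frame(c("TM_lung", "TS_lung"), paths), meta_file,
                   sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

toy_list <- read_rds_list(meta_file)
names(toy_list)
```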
levels(factor(obj_list$TM_lung@meta.data$cell_ontology_class))
levels(factor(obj_list$TS_lung@meta.data$cell_ontology_class))
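A quick set comparison of the two label lists shows why exact string matching across datasets is fragile. The sketch below uses made-up toy label vectors, not the actual TM_lung and TS_lung annotations, in place of the two cell_ontology_class columns.

```r
# Toy label vectors (hypothetical) standing in for the two annotation columns
tm_labels <- c("B cell", "T cell", "natural killer cell", "B cell")
ts_labels <- c("B cell", "nk cell", "capillary aerocyte")

shared  <- intersect(tm_labels, ts_labels)  # labels spelled identically in both
only_tm <- setdiff(tm_labels, ts_labels)    # labels found only in the first set
only_ts <- setdiff(ts_labels, tm_labels)    # labels found only in the second set
```

Here "nk cell" and "natural killer cell" describe the same population but do not match as strings, which is the kind of case that mapping labels to ontology terms is meant to resolve.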
It is common for cell types within a single dataset to have parent-child relationships. This is because some cells can be further classified into more fine-grained groups, while others are only recognized as the corresponding parental cell type.
This is not a problem when analyzing individual datasets: we do want to keep those rare, identifiable cell populations distinct. However, it can be a problem when mapping annotations across datasets, because it is unclear which populations a parent term covers in each dataset.
We provide ontoMultiMinimal for merging descendant terms into existing ancestor terms within each dataset, yielding a minimal ontology representation of the cell type tree.
Note that trimming the ontology tree is optional, and it is always possible to get back to the original annotation later during analysis.
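The idea of merging descendants into existing ancestors can be sketched in base R on a hypothetical toy is_a hierarchy. `get_anc` and `minimal_map` are illustrative helpers, not the scOntoMatch implementation. With multiple inheritance a term could have several most-general ancestors present in the dataset; this sketch simply takes the first.

```r
# Hypothetical toy is_a hierarchy (term -> parents); not real Cell Ontology IDs
parents <- list(
  cell = character(0), leukocyte = "cell", lymphocyte = "leukocyte",
  T_cell = "lymphocyte", CD4_T_cell = "T_cell", B_cell = "lymphocyte",
  fibroblast = "cell"
)

# All ancestors of a term, including the term itself
get_anc <- function(term) {
  out <- term
  frontier <- parents[[term]]
  while (length(frontier) > 0) {
    out <- union(out, frontier)
    frontier <- setdiff(unlist(parents[frontier], use.names = FALSE), out)
  }
  out
}

# Map each label to the most general label of the SAME dataset that is one of
# its ancestors (or itself), i.e. merge descendants into existing ancestors
minimal_map <- function(labels) {
  present <- unique(labels)
  top_of <- vapply(present, function(t) {
    cand <- intersect(get_anc(t), present)
    # keep candidates that have no other present ancestor; take the first
    is_top <- vapply(cand, function(p) {
      length(intersect(setdiff(get_anc(p), p), present)) == 0
    }, logical(1))
    cand[is_top][1]
  }, character(1))
  unname(top_of[labels])
}

minimal_map(c("T_cell", "CD4_T_cell", "fibroblast", "CD4_T_cell"))
```

Because T_cell is already annotated in this toy dataset, CD4_T_cell is merged into it, while fibroblast has no annotated ancestor and stays as is.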
obj_list_minimal = scOntoMatch::ontoMultiMinimal(obj_list = obj_list, ont = ont, anno_col = anno_col, onto_id_col = onto_id_col)
Some cell types in TS_lung cannot be matched to an ontology term; consider re-annotating them manually. Always check the literature before manual curation, and make sure you actually want the ontology annotation!
obj_list$TS_lung@meta.data[[anno_col]] = as.character(obj_list$TS_lung@meta.data[[anno_col]])
## nk cell can certainly be matched
obj_list$TS_lung@meta.data[which(obj_list$TS_lung@meta.data[[anno_col]] == 'nk cell'), anno_col] = 'natural killer cell'
## there are type 1 and type 2 alveolar fibroblasts, which both belong to fibroblast of lung
obj_list$TS_lung@meta.data[which(obj_list$TS_lung@meta.data[[anno_col]] == 'alveolar fibroblast'), anno_col] = 'fibroblast of lung'
## capillary aerocyte is a recently discovered lung-specific cell type, so we keep it as is
## Gillich, A., Zhang, F., Farmer, C.G. et al. Capillary cell-type specialization in the alveolus. Nature 586, 785–789 (2020). https://doi.org/10.1038/s41586-020-2822-7
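When many labels need curation, a named lookup vector keeps all edits in one place. `relabel` below is a hypothetical helper, not part of scOntoMatch. It leaves labels that are absent from the lookup (such as capillary aerocyte) unchanged.

```r
# Hypothetical helper: replace labels found in `lookup` (old name -> new name),
# leaving all other labels untouched
relabel <- function(labels, lookup) {
  labels <- as.character(labels)
  hit <- labels %in% names(lookup)
  labels[hit] <- unname(lookup[labels[hit]])
  labels
}

lookup <- c("nk cell" = "natural killer cell",
            "alveolar fibroblast" = "fibroblast of lung")
relabel(c("nk cell", "capillary aerocyte", "alveolar fibroblast"), lookup)
```

Applied to the demo object, this would read `obj_list$TS_lung@meta.data[[anno_col]] = relabel(obj_list$TS_lung@meta.data[[anno_col]], lookup)`.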
Now we can trim again.
obj_list_minimal = scOntoMatch::ontoMultiMinimal(obj_list = obj_list, ont = ont, anno_col = anno_col, onto_id_col = onto_id_col)
scOntoMatch provides functions to plot the cell type tree. Before trimming, both datasets contain parent-child relationships.
tm_ids = names(getOntologyId(obj_list$TM_lung@meta.data[['cell_ontology_class']], ont = ont))
plotOntoTree(ont = ont, onts = tm_ids, ont_query = tm_ids,
             plot_ancestors = TRUE, roots = 'CL:0000548', fontsize = 25)
ts_ids = names(getOntologyId(obj_list$TS_lung@meta.data[['cell_ontology_class']], ont = ont))
plotOntoTree(ont = ont, onts = ts_ids, ont_query = ts_ids,
             plot_ancestors = TRUE, roots = 'CL:0000548', fontsize = 25)
After trimming, we get a minimal representation of the cell type hierarchy for each dataset.
tm_base_ids = names(getOntologyId(obj_list_minimal$TM_lung@meta.data[['cell_ontology_base']], ont = ont))
plotOntoTree(ont = ont, onts = tm_base_ids, ont_query = tm_base_ids,
             plot_ancestors = TRUE, roots = 'CL:0000548', fontsize = 25)
ts_base_ids = names(getOntologyId(obj_list_minimal$TS_lung@meta.data[['cell_ontology_base']], ont = ont))
plotOntoTree(ont = ont, onts = ts_base_ids, ont_query = ts_base_ids,
             plot_ancestors = TRUE, roots = 'CL:0000548', fontsize = 25)
Now each cell type in the two datasets is a leaf node of its dataset's cell type tree, so the datasets are ready to be mapped.
The core functionality of scOntoMatch is to find the level of the cell type hierarchy at which cell types match one-to-one across datasets. The key idea is to consider the cell type hierarchies of all datasets together, find the last common ancestor cell types, and merge descendants into those ancestors. We provide ontoMultiMatch for this purpose.
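The two building blocks named above can be sketched in base R on a hypothetical toy is_a hierarchy. `lowest_common_ancestors` finds the shared ancestors of two terms that are not themselves ancestors of another shared ancestor. `merge_to_shared` maps every label to the most general label annotated in any dataset that is one of its ancestors. This is a simplified illustration, not ontoMultiMatch's implementation, and how the package combines these steps is not reproduced here.

```r
# Hypothetical toy is_a hierarchy (term -> parents); not real Cell Ontology IDs
parents <- list(cell = character(0), leukocyte = "cell", lymphocyte = "leukocyte",
                T_cell = "lymphocyte", CD4_T_cell = "T_cell", CD8_T_cell = "T_cell",
                B_cell = "lymphocyte")

# All ancestors of a term, including the term itself
get_anc <- function(term) {
  out <- term
  frontier <- parents[[term]]
  while (length(frontier) > 0) {
    out <- union(out, frontier)
    frontier <- setdiff(unlist(parents[frontier], use.names = FALSE), out)
  }
  out
}

# Lowest common ancestors: shared ancestors that are not an ancestor of
# another shared ancestor
lowest_common_ancestors <- function(a, b) {
  shared <- intersect(get_anc(a), get_anc(b))
  is_lowest <- vapply(shared, function(s) {
    !any(vapply(setdiff(shared, s), function(o) s %in% get_anc(o), logical(1)))
  }, logical(1))
  shared[is_lowest]
}

# Simplified cross-dataset merge: pool the labels of all datasets, then map each
# label to the most general pooled label that is one of its ancestors (or itself)
merge_to_shared <- function(term_sets) {
  pool <- unique(unlist(term_sets, use.names = FALSE))
  to_general <- function(t) {
    cand <- intersect(get_anc(t), pool)
    is_top <- vapply(cand, function(p) {
      length(intersect(setdiff(get_anc(p), p), pool)) == 0
    }, logical(1))
    cand[is_top][1]
  }
  lapply(term_sets, function(ts) vapply(ts, to_general, character(1), USE.NAMES = FALSE))
}

lowest_common_ancestors("CD4_T_cell", "CD8_T_cell")
merged <- merge_to_shared(list(A = c("T_cell", "B_cell"),
                               B = c("CD4_T_cell", "CD8_T_cell", "B_cell")))
```

In this toy case, dataset B's CD4 and CD8 T cells are merged into T_cell because dataset A annotates only the coarser term, so both datasets end up with the one-to-one set {T_cell, B_cell}.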
## perform ontoMatch on the trimmed (minimal) annotations
obj_list_matched = scOntoMatch::ontoMultiMatch(obj_list = obj_list_minimal, anno_col = 'cell_ontology_base', onto_id_col = onto_id_col, ont = ont)
Finally, we plot a combined cell type tree, highlighting the existing cell types of each dataset.
plts = plotMatchedOntoTree(ont = ont, obj_list = obj_list_matched, anno_col = 'cell_ontology_mapped', onto_id_col = onto_id_col, roots = 'CL:0000548', fontsize=25)
plts[[1]]
plts[[2]]
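Because the mapped labels are written to a new column (cell_ontology_mapped), the original annotation can be compared with the mapped one by cross-tabulation. This sketch assumes that both columns remain in each object's meta.data after matching, and it uses a toy data frame with made-up rows in place of meta.data.

```r
# Toy stand-in for one object's meta.data after matching (hypothetical rows)
meta <- data.frame(
  cell_ontology_class  = c("CD4-positive, alpha-beta T cell", "T cell", "B cell", "B cell"),
  cell_ontology_mapped = c("T cell", "T cell", "B cell", "B cell")
)

# Rows: original labels; columns: mapped labels; cells: number of cells
tab <- table(original = meta$cell_ontology_class, mapped = meta$cell_ontology_mapped)
tab
```

Under the same assumption, the real equivalent would be `table(obj_list_matched$TM_lung@meta.data$cell_ontology_class, obj_list_matched$TM_lung@meta.data$cell_ontology_mapped)`.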
The helper functions getOntologyId and getOntologyName convert between cell type names and ontology IDs.
getOntologyName(onto_id = c("CL:0000082"), ont = ont)
getOntologyId(obj_list$TM_lung@meta.data[[anno_col]], ont = ont)
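To make the name/ID conversion concrete, here is a minimal base-R sketch. It works on a toy named vector shaped like the `name` element of an ontologyIndex object, where values are term names and names are term IDs. The TOY:* IDs are made up, and `id_to_name` and `name_to_id` are hypothetical helpers, not the scOntoMatch functions. Based on the `names(getOntologyId(...))` pattern in the plotting code, we assume getOntologyId returns a vector named by ontology ID, and `name_to_id` mimics that shape.

```r
# Toy stand-in for ontologyIndex's `name` element (IDs are hypothetical)
toy_name <- c("TOY:0001" = "cell", "TOY:0002" = "T cell", "TOY:0003" = "B cell")

# ID -> name: plain indexing by ID
id_to_name <- function(ids, name_vec) name_vec[ids]

# name -> ID: keep the entries whose name appears among the labels; the result
# is named by ID, so names() returns the IDs
name_to_id <- function(labels, name_vec) name_vec[name_vec %in% labels]

id_to_name("TOY:0002", toy_name)
names(name_to_id(c("B cell", "T cell", "B cell"), toy_name))
```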
sessionInfo()