scOntoMatch_vignette

knitr::opts_chunk$set(echo = TRUE, dpi=300)

Installation

## install from source
## library(devtools)
## devtools::install_github("YY-SONG0718/scOntoMatch")
library(scOntoMatch)
library(ontologyIndex)

Load data

We use the Tabula Muris and Tabula Sapiens Smartseq-2 lung dataset as example. scOntoMatch works on any number of input datasets. Two demo seurat object are attached in inst/extdata, where we sampled two cells per cell type (original annotation) and focus on the cell type hierarchy in the two datasets.

metadata = '../inst/extdata/metadata.tsv'

anno_col = 'cell_ontology_class'
onto_id_col = 'cell_ontology_id'

obo_file = '../inst/extdata/cl-basic.obo'
propagate_relationships = c('is_a', 'part_of')
ont <- ontologyIndex::get_OBO(obo_file, propagate_relationships = propagate_relationships)

Organize the data name and path as first and second column in a metadata file. Store the seurat object in RDS format and use getSeuratRds to read them in.

obj_list = getSeuratRds(metadata = metadata, sep = "\t")
levels(factor((obj_list$TM_lung@meta.data$cell_ontology_class)))
levels(factor((obj_list$TS_lung@meta.data$cell_ontology_class)))

Match ontology annotation

Trim the ontology tree per dataset

It is common that within each dataset, there will be parent-children relationship between cell types. This is because some cells are able to be further classified into more fine-grained groups, while some other cells are only recognized as the respective parental cell type.

This is not a problem for analyzing individual datasets - we do want to keep those rare, identifiable cell populations distinct. However it could be a problem when we want to map annotation cross-dataset, since it is obscure what population the parent term contains in different datasets.

We provide ontoMultiMinimal for Merging descendant terms to existing ancestor terms in one dataset, to get a minimum ontology representation of the cell type tree.

Note it is optional to trim the ontology tree, and it is always possible to get back to the original annotation later during analysis.

obj_list_minimal = scOntoMatch::ontoMultiMinimal(obj_list = obj_list, ont = ont, anno_col = anno_col, onto_id_col = onto_id_col)

We can see that some cell types in TS_lung cannot match to an ontology term. Consider manual re-annotate. We advise that do always check literature before manual curation and make sure you want the ontology annotation!

obj_list$TS_lung@meta.data[[anno_col]] = as.character(obj_list$TS_lung@meta.data[[anno_col]])

## nk cell can certainly be matched
obj_list$TS_lung@meta.data[which(obj_list$TS_lung@meta.data[[anno_col]] == 'nk cell'), anno_col] = 'natural killer cell'

## there are type 1 and type 2 alveolar fibroblast which both belongs to fibroblast of lung

obj_list$TS_lung@meta.data[which(obj_list$TS_lung@meta.data[[anno_col]] == 'alveolar fibroblast'), anno_col] = 'fibroblast of lung'

## capillary aerocyte is a recently discovered new lung-specific cell type that is good to keep it
## Gillich, A., Zhang, F., Farmer, C.G. et al. Capillary cell-type specialization in the alveolus. Nature 586, 785–789 (2020). https://doi.org/10.1038/s41586-020-2822-7

Now we can trim again

obj_list_minimal = scOntoMatch::ontoMultiMinimal(obj_list = obj_list, ont = ont, anno_col = anno_col, onto_id_col = onto_id_col)

Ontology tree for individual dataset

Functions are provided to plot cell type tree. Before trimming, there are parental-children relationships within both datasets.

plotOntoTree(ont = ont, 
                          onts = names(getOntologyId(obj_list$TM_lung@meta.data[['cell_ontology_class']], ont = ont)), 
                          ont_query = names(getOntologyId(obj_list$TM_lung@meta.data[['cell_ontology_class']], ont = ont)),
                          plot_ancestors = TRUE,  roots = 'CL:0000548',
                          fontsize=25)
plotOntoTree(ont = ont, 
                          onts = names(getOntologyId(obj_list$TS_lung@meta.data[['cell_ontology_class']], ont = ont)), 
                          ont_query = names(getOntologyId(obj_list$TS_lung@meta.data[['cell_ontology_class']], ont = ont)),
                          plot_ancestors = TRUE,  roots = 'CL:0000548',
                          fontsize=25)

After trimming, we get a minimal representation of cell type hierarchy per dataset.

plotOntoTree(ont = ont, 
                          onts = names(getOntologyId(obj_list_minimal$TM_lung@meta.data[['cell_ontology_base']], ont = ont)), 
                          ont_query = names(getOntologyId(obj_list_minimal$TM_lung@meta.data[['cell_ontology_base']], ont = ont)),
                          plot_ancestors = TRUE,  roots = 'CL:0000548',
                          fontsize=25)
plotOntoTree(ont = ont, 
                          onts = names(getOntologyId(obj_list_minimal$TS_lung@meta.data[['cell_ontology_base']], ont = ont)), 
                          ont_query = names(getOntologyId(obj_list_minimal$TS_lung@meta.data[['cell_ontology_base']], ont = ont)),
                          plot_ancestors = TRUE,  roots = 'CL:0000548',
                          fontsize=25)

Now, each cell type in the two datasets is a leaf node in the cell type tree. They are ready to be mapped.

Match ontology annotation cross datasets

The core functionality of scOntoMatch is to find at which layer of cell type hierarchy we get one-to-one matching of cell types across datasets. Key idea is to look at the cell type hierarchies in these datasets together, find the last common ancestor cell types, and merge descendants to ancestors. We provide ontoMultiMatch for this purpose.

## perform ontoMatch on the original tree

obj_list_matched = scOntoMatch::ontoMultiMatch(obj_list = obj_list_minimal, anno_col = 'cell_ontology_base', onto_id_col = onto_id_col, ont = ont)

Finally, we plot a combined cell type tree and highlighting the exixting cell types of each dataset.

plts = plotMatchedOntoTree(ont = ont, obj_list = obj_list_matched,
                                 anno_col = 'cell_ontology_mapped', 
                                 onto_id_col = onto_id_col,
                                 roots = 'CL:0000548', fontsize=25)
plts[[1]]
plts[[2]]

Utility functions

getOntologyId and getOntologyName

getOntologyName(onto_id = c("CL:0000082"), ont = ont)
getOntologyId(obj_list$TM_lung@meta.data[[anno_col]], ont = ont)
sessionInfo()


Try the scOntoMatch package in your browser

Any scripts or data that you put into this service are public.

scOntoMatch documentation built on Nov. 2, 2023, 5:20 p.m.