This script applies the Common Cell type Nomenclature (CCN) to an existing hierarchically-structured taxonomy of cell types, in this case Human Middle Temporal Gyrus (MTG): May 2022. More information about the CCN is available at the Allen Brain Map and in the associated eLife publication. This script requires a taxonomy and some cell meta-data as input and outputs several files useful for publication and for ingest into in-process taxonomy services. Please post any thoughts about the CCN to the Community Forum! This scripts relates to the May 2022 version of the human Middle Temporal Gyrus (MTG) that is used for multiple projects and is available in the Transcriptomics Explorer.
There are two files required as input to this script:
dend.RDS
: this is an R dendrogram
object representing the taxonomy to be annotated. If you used R for cell typing in your manuscript, this is likely a variable that was created at some point during this process and from which your dendrogram images are made. While this assumes a hierarchical structure of the data, additional cell sets of any kind can be made later in the script. Code for converting from other formats to the R dendrogram
format is not provided, but please post to the Community Forum if you have questions about this. metadata.csv
: a table which includes a unique identifier for each cell (in this example it is stored in the sample_name
column) as well as the corresponding cell type from the dendrogram for each cell (in this example it is stored in the cluster_label
column). Additional metadata of any kind can be optionally included in this table. Example files from an upcoming SEA-AD taxonomy at the Allen Institute (~June 2022) are included with this library.
This utilizes a function called apply_CCN
which wraps most of the steps of applying the CCN into a single R function. This function is BETA but should be robust to different input formats, as it get generate the relevant CCN outputs from a dendrogram, from an existing nomenclature table, from a metadata table, or from a named vector of cell type assignments. Other inputs include inputs for most of the other CCN functions. Please post an issue if something doesn't work properly.
The remainder of this script describes how to run the CCN in R. At this point open RStudio (or R) and start running the code below. The first few blocks correspond to housekeeping things needed to get the workspace set up.
# NOTE: REPLACE THIS LINK BELOW WITH YOUR WORKING DIRECTORY outputFolder = paste0(getwd(),"/") setwd(outputFolder) # Needed only if copying and pasting in R knitr::opts_chunk$set(echo = TRUE) # Needed only for RStudio knitr::opts_knit$set(root.dir = outputFolder) # Needed only for RStudio
suppressPackageStartupMessages({ library(CCN) # In this case, other libraries are nested in `apply_CCN` })
As discussed above, we have provided a dendrogram of human MTG cell types (called "dend"), and associated cell metadata (called "metadata") as an example. We read those things in here.
# REPLACE THIS LINE OF CODE WITH CODE TO READ IN YOUR DENDROGRAM, IF NEEDED data(dend) # Example SEA-AD dendrogram dend <- as.dendrogram(dend) # REPLACE THIS LINE OF CODE WITH CODE TO READ IN YOUR DENDROGRAM, IF NEEDED data(metadata) # Example SEA-AD dendrogram # Plot the unannotated dendrogram as a sanity check plot(dend,main="You should see your desired cell type names on the base of this plot")
Many of the variables have reasonable defaults, and for this tutorial we leave them at their default values. More details are available in the "Applying CCN to an existing taxonomy: extended" tutorial.
# Taxonomy ID of format CCN<YYYYMMDD><T> # Note: function will autogenerate a taxonomy ID with current data at T=0, which works if you only generate one taxonomy per day taxonomy_id <- "CCN202204130" # Person uploading the data and taxonomy taxonomy_author <- "Jeremy Miller" # Note that this example taxonomy was created by Nik Jorstad and refined by Kyle Travaglini. # DOI of relavent taxonomy citation, if any taxonomy_citation <- "" # Prefix for cell set label. The automated function works best if you only have one. first_label <- setNames("SEAAD_MTG", 1) # Brain structure. Ontology tag is auto-generated. structure <- "middle temporal gyrus" # Some metadata variables metadata_columns <- c("subclass_label", "class_label") metadata_order <- c("subclass_order", "class_order") cluster_column <- "cluster_label" # Named vector of cell to cell type assignments (can be auto-generated, but better to include) cell_assignment <- setNames(metadata$cluster_label,metadata$sample_name)
This line of code will run all the scripts, output the relevant zip file to nomenclature.zip
in your working directory, and return the same variables for further manipulation in R.
ccn_output <- apply_CCN(dend = dend, cell_assignment = cell_assignment, metadata = metadata, first_label = first_label, taxonomy_id = taxonomy_id, taxonomy_author = taxonomy_author, taxonomy_citation = taxonomy_citation, structure = structure, metadata_columns = metadata_columns, metadata_order = metadata_order, cluster_column = cluster_column) ## Plot the dendrogram in its current state plot_dend(ccn_output$final_dendrogram,node_size = 3)
The final output of the CCN script is a .zip file that contains these files (all described and outputted above):
The above dendrogram looks good, but what if we want to do some additional annotation to the cells sets before finalizing everything? We'll start by extracting the output nomenclature table and then add a few annotations to this. Although not shown here, additional annotations can be done by outputting ccn_output$cell_set_information to a file (or extracting "nomenclature_table.csv" from nomenclature.zip"), manually editing the table in a text editor, and then reading it back into R.
## Add cell set aligned alias terms (in this case, subclass and class calls ARE aligned ## alias terms, so we can add them here). Note that this information could have been ## included directly in with the one-click function, but for demonstration purposes ## They are separated here. cell_set_information <- ccn_output$cell_set_information annotation_columns <- c("cell_set_aligned_alias","cell_set_aligned_alias") append <- TRUE # We want to append the info to cell_set_information <- annotate_nomenclature_from_metadata(cell_set_information, metadata, metadata_columns, metadata_order, annotation_columns, cluster_column, append) # Add node labels corresponding to all IT cells, Non-neuronal, and Non-neural cells cell_set_information[cell_set_information$cell_set_accession=="CS202204130_201","cell_set_preferred_alias"] <- "IT" cell_set_information[cell_set_information$cell_set_accession=="CS202204130_238","cell_set_preferred_alias"] <- "Non-neuronal" cell_set_information[cell_set_information$cell_set_accession=="CS202204130_238","cell_set_aligned_alias"] <- "Non-neuronal" cell_set_information[cell_set_information$cell_set_accession=="CS202204130_248","cell_set_preferred_alias"] <- "Non-neural" cell_set_information[cell_set_information$cell_set_accession=="CS202204130_248","cell_set_aligned_alias"] <- "Non-neural" # Manual annotations of this dendrogram can be done by matching up the node label in # the dendrogram below with the "original_label" column in the nomenclature table plot_dend(ccn_output$initial_dendrogram,node_size = 3)
The same function with the added input highlighted below will regenerate all files and outputs. Note the differences in inputs below.
ccn_output2<- apply_CCN(dend = ccn_output$final_dendrogram, nomenclature = cell_set_information, # NEW LINE ADDED cell_assignment = cell_assignment, first_label = first_label, taxonomy_id = taxonomy_id, taxonomy_author = taxonomy_author, taxonomy_citation = taxonomy_citation, structure = structure) # Note the removed inputs since metadata was previously added ## Plot the dendrogram in its current state plot_dend(ccn_output2$final_dendrogram,node_size = 3)
The resulting .zip file should be shared in manuscripts and other sources as the standard CCN file, for linkage to other taxonomies.
Session info.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.