The TCGA tumor types cover a collection of anatomical compartments. Organizing tumor types into groups of related compartments may be fruitful. We will use the oncotree OBO representation from an NCI thesaurus OBO distribution in the Bioc 3.9 version of ontoProc.
This table was constructed by hand on Oct 10 2019 using materials in ontoProc package.
suppressPackageStartupMessages({ library(DT) library(ontoProc) library(magrittr) library(dplyr) library(BiocOncoTK) otree = getOncotreeOnto() }) data("map_tcga_ncit") datatable(map_tcga_ncit)
We will drop the CNTL class, and use only the first NCIT mapping when two seem to match.
controlindex = which(map_tcga_ncit[,1]=="CNTL") tcgacodes = map_tcga_ncit[-controlindex,1] ncitsites = map_tcga_ncit[-controlindex,3] ssi = strsplit(ncitsites, "\\|") sites = sapply(ssi, "[", 1) simpmap = data.frame(code=tcgacodes, oncotr_site=otree$name[sites], ncit=sites, stringsAsFactors=FALSE) simpmap[sample(seq_len(nrow(simpmap)),5),]
We now have a 1-1 mapping from TCGA code to NCIT site. These sites can be grouped according to organ system, using the knowledge that NCIT:C3263 is the 'neoplasm by site' (which really should be 'system') category.
poss_sys = otree$children["NCIT:C3263"][[1]] # all possible systems allanc = otree$ancestors[simpmap$ncit] specific = sapply(allanc, function(x) intersect(x, poss_sys)[1]) # ignore multiplicities sys = unlist(otree$name[specific]) datatable(systab <- cbind(simpmap, sys=sys))
Neither thymoma nor mesothelioma have NCIT organ system mappings per se.
We now have 12 categories for 33 tumor types. A code pattern for finding the TCGA codes for a given system is:
systab %>% filter(grepl("Repro", sys))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.