title: "CELLector: Genomics Guided Selection of Cancer in vitro models"
author: "Hanna Najgebauer and Francesco Iorio"
subtitle: 'R package interactive vignette'
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
CELLector R package interactive vignette
CELLector manuscript example study case
CELLector is a computational tool assisting experimental scientists in the selection of the most clinically relevant cancer cell lines to be included in a new in-vitro study (or to be considered in a retrospective study), in a patient-genomic guided fashion.
CELLector combines methods from graph theory and market basket analysis; it leverages tumour genomics data to explore, rank, and select optimal cell line models in a user-friendly way, through the CELLector Rshiny App. This enables appropriate and informed choices about model inclusion/exclusion in retrospective analyses and future studies. It also makes it possible to bridge cancer patient genomics with publicly available databases from cell line based functional/pharmacogenomic screens, such as CRISPR-Cas9 dependency datasets and large-scale in-vitro drug screens.
Furthermore, CELLector includes interface functions to synchronise built-in cell line annotations and genomics data to their latest versions from the Cell Model Passports resource. Through this interface, bioinformaticians can quickly generate binary genomic event matrices (BEMs) accounting for hundreds of cancer cell lines, which can be used in systematic statistical inferences, associating patient-defined cell line subgroups with drug-response/gene-essentiality, for example through GDSC tools.
Additionally, CELLector allows the selection of models within user-defined contexts, for example, by focusing on genomic alterations occurring in biological pathways of interest or considering only predetermined sub-cohorts of cancer patients.
Finally, CELLector identifies combinations of molecular alterations underlying disease subtypes currently lacking representative cell lines, providing guidance for the future development of new cancer models.
License: GNU GPL v3
Najgebauer, H., Yang, M., Francies, H., Pacini, C., Stronach, E. A., Garnett, M. J., Saez-Rodriguez, J., & Iorio, F. CELLector: Genomics Guided Selection of Cancer in vitro Models. https://www.biorxiv.org/content/10.1101/275032v3
CELLector can be used in three different modalities:
(i) as an R package (within R, code available at: https://github.com/francescojm/CELLector),
(ii) as an online R shiny App (available at: https://ot-cellector.shinyapps.io/CELLector_App/),
(iii) running the R shiny App locally (within Rstudio, code available at: https://github.com/francescojm/CELLector_App).
This page contains instructions to quickly try the package. The user manual and package documentation are available at https://github.com/francescojm/CELLector/blob/master/CELLector.pdf.
A tutorial on how to use the online Rshiny app (containing also instructions on how to run it locally) is available here
The R package is available on GitHub at https://github.com/francescojm/CELLector. We recommend using it within RStudio (https://www.rstudio.com/). To install it, execute the following commands:
library(devtools)
install_github("Francescojm/CELLector")
library(CELLector)
This will install the following additional libraries
arules, dplyr, stringr, data.tree, sunburstR, igraph, collapsibleTree, methods
all publicly available on CRAN or Bioconductor. The package comes with built-in data objects containing genomic data for large cohorts of primary tumours and cancer cell lines (from Iorio et al, Cell 2016). Optionally, users can build their own genomic binary event matrices (BEMs) from their own data, or regenerate the built-in data objects from the Iorio et al, Cell 2016 data using customised filters for cancer driver genes and variants. Finally, cell line BEMs can be reassembled using annotations and data from the Cell Model Passports, through a dedicated module.
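As a rough illustration of what a binary event matrix looks like, the following base R sketch builds one from a toy long-format variant table. The data frame `variants`, its column names, and the sample identifiers are hypothetical and do not reflect the package's input format; the orientation (genes in rows, samples in columns) is chosen only for this example.

```r
# Toy variant catalogue: one row per (sample, mutated gene) observation.
# Sample identifiers and the table layout are made up for illustration.
variants <- data.frame(
  sample = c("S1", "S1", "S2", "S3", "S3", "S3"),
  gene   = c("TP53", "KRAS", "TP53", "APC", "TP53", "TP53"),
  stringsAsFactors = FALSE
)

# Cross-tabulate genes against samples, then binarise the counts:
# 1 = at least one variant of that gene in that sample, 0 = none.
counts <- table(variants$gene, variants$sample)
bem <- (unclass(counts) > 0) * 1L

print(bem)
```

Repeated variants in the same gene and sample (TP53 in S3 here) collapse to a single 1, which is what makes the matrix binary rather than a count table.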
In this simple case study scenario, we want to select the 5 most clinically relevant in vitro models that best represent the genomic diversity of TP53 mutated colorectal tumours. The models we wish to select should be microsatellite stable and harbour at least one alteration in the following signalling pathways: PI3K-AKT-MTOR signalling, RAS-RAF-MEK-ERK/JNK signalling and WNT signalling. Finally, we want the model selection to be guided based on somatic mutations and copy number alterations that are observed in at least 3% of the studied TP53 mutant tumour cohort.
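The 3% prevalence requirement can be pictured as a simple frequency filter on a binary event matrix. The sketch below uses a small hand-made matrix (events in rows, patients in columns) and plain base R; the event names are hypothetical, and the code only illustrates the idea of a minimum support threshold rather than how the package builds its searching space.

```r
# Toy binary event matrix for 100 patients: each row is an event,
# each column a patient, 1 = event present. Event names are invented.
toy_bem <- rbind(
  evtA = rep(c(1, 0), times = c(40, 60)),  # present in 40% of patients
  evtB = rep(c(1, 0), times = c(10, 90)),  # 10%
  evtC = rep(c(1, 0), times = c(2, 98)),   # 2%
  evtD = rep(c(1, 0), times = c(5, 95))    # 5%
)

# Support of an event = fraction of patients in which it is observed.
support <- rowMeans(toy_bem)

# Keep only events reaching the minimum support threshold.
min_support <- 0.03
kept <- names(support)[support >= min_support]

print(kept)
```

Here `evtC` is dropped because it occurs in only 2% of the toy cohort.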
After loading the package, we need to load a few data objects:
library(CELLector)

## Somatic mutations and copy number alterations found in primary tumours and cell lines
data(CELLector.PrimTum.BEMs)
data(CELLector.CellLine.BEMs)

## Sets of Cancer Functional Events (CFEs: somatic mutations and copy number alterations)
## involving genes in predefined key cancer pathways
data(CELLector.Pathway_CFEs)

## Objects used for decoding cna CFEs identifiers
data(CELLector.CFEs.CNAid_mapping)
data(CELLector.CFEs.CNAid_decode)

## High-confidence cancer driver genes, passed below when building the searching space
data(CELLector.HCCancerDrivers)
Subsequently we specify the Colon/Rectal Adenocarcinoma (COREAD) TCGA label to select the cancer type to analyse. Other available cancer types in this version of the package are: BLCA, BRCA, COREAD, GBM, HNSC, KIRC, LAML, LGG, LUAD, LUSC, OV, PRAD, SKCM, STAD, THCA, UCEC. Finally, we remove possible sample identifier duplications from the primary tumour dataset.
### Change the following two lines to work with a different cancer type
tumours_BEM<-CELLector.PrimTum.BEMs$COREAD
CELLlineData<-CELLector.CellLine.BEMs$COREAD

### unicize the sample identifiers for the tumour data
tumours_BEM<-CELLector.unicizeSamples(tumours_BEM)
At this point, we are ready to build the CELLector searching space, which will contain the most recurrent patient subtypes with matched signatures of cancer functional events (CFEs, defined in Iorio et al, Cell 2016). The argument values passed to the function below specify that we want to look only at TP53 mutant cancer patients, with alterations in genes belonging to three different cancer pathways.
### building a CELLector searching space focusing on three pathways
### and TP53 mutant patients only
CSS<-CELLector.Build_Search_Space(ctumours = t(tumours_BEM),
                                  verbose = FALSE,
                                  minGlobSupp = 0.03,
                                  cancerType = 'COREAD',
                                  pathwayFocused = c("RAS-RAF-MEK-ERK / JNK signaling",
                                                     "PI3K-AKT-MTOR signaling",
                                                     "WNT signaling"),
                                  pathway_CFEs = CELLector.Pathway_CFEs,
                                  cnaIdMap = CELLector.CFEs.CNAid_mapping,
                                  cnaIdDecode = CELLector.CFEs.CNAid_decode,
                                  cdg = CELLector.HCCancerDrivers,
                                  subCohortDefinition='TP53')
The searching space is stored in a list, whose elements can be inspected as follows:
### visualising the CELLector searching space as a binary tree
CSS$TreeRoot

### visualising the first attributes of the tree nodes
CSS$navTable[,1:11]

### visualising the sub-cohort of patients whose genome satisfies the rule of the 4th node
### (str_split is called via its namespace, so stringr does not need to be attached)
stringr::str_split(CSS$navTable$positivePoints[4],',')
The searching space can also be explored interactively as a collapsible tree or a sunburst.
### visualising the CELLector searching space as interactive collapsible binary tree
CELLector.visualiseSearchingSpace(searchSpace = CSS,CLdata = CELLlineData)
### visualising the CELLector searching space as interactive sunburst
CELLector.visualiseSearchingSpace_sunBurst(searchSpace = CSS)
Finally, the 10 most representative cell lines can be selected with the following commands:
### take all the signatures from the searching space
Signatures <- CELLector.createAllSignatures(CSS$navTable)

### mapping the cell lines on the CELLector searching space
ModelMat<-CELLector.buildModelMatrix(Signatures$ES,CELLlineData,CSS$navTable)

### selecting 10 cell lines
selectedCellLines<-CELLector.makeSelection(modelMat = ModelMat,
                                           n=10,
                                           searchSpace = CSS$navTable)

knitr::kable(selectedCellLines,align = 'l')
With CELLector it is possible to quantify the quality of each cell line in terms of its ability to represent an entire cohort of disease-matched patients. This is quantified as a trade-off between two factors. The first factor is the length of the CELLector signatures (in terms of the number of individual alterations composing them) that are present in the cell line under consideration. This is proportional to the granularity of the representative ability of the cell line, i.e. the longer the signature, the more precisely defined the represented sub-cohort of patients. The second factor is the size of the patient subpopulation represented by the signatures that can be observed in the cell line under consideration, thus accounting for the prevalence of the sub-cohort modelled by that cell line. An input parameter allows these two factors to be weighted equally or differently.
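To make this trade-off concrete, here is a toy weighted score in base R. It assumes a simple convex combination in which a hypothetical weight `alpha` multiplies a normalised signature length and `1 - alpha` multiplies the represented patient fraction. This is an illustrative assumption and not necessarily the formula implemented in `CELLector.Score`; the cell line names and values are invented.

```r
# Invented example: three cell lines with the length of their longest matching
# signature and the fraction of patients represented by that signature.
toy <- data.frame(
  cell_line       = c("lineA", "lineB", "lineC"),
  sig_length      = c(4, 2, 1),
  cohort_fraction = c(0.05, 0.20, 0.45),
  stringsAsFactors = FALSE
)

# Assumed toy score: alpha weights granularity (normalised signature length),
# (1 - alpha) weights prevalence (represented patient fraction).
toy_score <- function(d, alpha) {
  norm_len <- d$sig_length / max(d$sig_length)
  alpha * norm_len + (1 - alpha) * d$cohort_fraction
}

# Rank cell lines from best to worst for a given alpha.
rank_lines <- function(d, alpha) {
  d$cell_line[order(toy_score(d, alpha), decreasing = TRUE)]
}

best_granular  <- rank_lines(toy, alpha = 0.9)[1]  # favours long signatures
best_prevalent <- rank_lines(toy, alpha = 0.1)[1]  # favours large sub-cohorts

print(c(best_granular, best_prevalent))
```

Under this toy weighting, a high `alpha` ranks the line with the longest signature first, while a low `alpha` ranks the line covering the largest patient fraction first.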
In this example, we want to score the quality of the cell lines in the colorectal cancer panel, based on the searching space assembled in the previous examples.
### Scoring colorectal cancer cell lines based on the searching space assembled in the previous examples
CSscores<-CELLector.Score(NavTab = CSS$navTable,CELLlineData = CELLlineData)

### Visualising the best 3 cell lines according to the considered searching space and criteria
knitr::kable(CSscores[1:3,],align = 'l')
In this other example, we want to recompute the cell line scores, but prioritising the size of the sub-cohort represented by each cell line over how precisely that sub-cohort is defined.
### Re-scoring colorectal cancer cell lines on the same searching space, this time with alpha = 0.10
### to prioritise the size of the represented sub-cohort
CSscores<-CELLector.Score(NavTab = CSS$navTable,CELLlineData = CELLlineData,alpha = 0.10)

### Visualising the best 3 cell lines according to the considered searching space and criteria
knitr::kable(CSscores[1:3,],align = 'l')
In the following example, we show how to run CELLector analyses and cell line selections from user-defined genomic data. To this end, we first generate genomic binary event matrices (BEMs) for colorectal cancer patients and cell lines, using somatic variant catalogues derived from the TCGA (as presented in Iorio et al, Cell 2016) for patients and from the latest release of the Cell Model Passports for cell lines, pre-selecting the sets of cancer driver genes and variants to be considered in these catalogues. Fully user-defined genomic variant catalogues can also be used. In this example we focus on human-derived, microsatellite stable, diploid colorectal carcinoma cell lines resected from male patients.
First we build the BEMs as explained, with the following commands
## loading a set of high-confidence cancer driver genes from Iorio et al, Cell 2016
data(CELLector.HCCancerDrivers)

## loading a set of variants observed in at least two patients in COSMIC
data(CELLector.RecfiltVariants)

## Assembling a genomic binary event matrix (BEM) for human-derived diploid colorectal carcinoma
## cell lines using genomic data from the Cell Model Passports, considering only variants
## observed in COSMIC in at least two patients, in high-confidence cancer driver genes.
## NOTE: as written, msi_status_select = 'MSI' and no sex filter is passed; adjust these
## arguments if microsatellite stable lines from male patients are required.
COREAD_cl_BEM <- CELLector.CELLline_buildBEM(
    Tissue='Large Intestine',
    Cancer_Type = 'Colorectal Carcinoma',
    msi_status_select = 'MSI',
    ploidy_th = c(2,2),
    GenesToConsider = CELLector.HCCancerDrivers,
    VariantsToConsider = CELLector.RecfiltVariants)

## showing first entries of the BEM
head(COREAD_cl_BEM)

## bar diagram with mutation frequencies in the BEM for the 20 most frequently mutated genes
barplot(sort(colSums(COREAD_cl_BEM[,3:ncol(COREAD_cl_BEM)]),decreasing=TRUE)[1:20],
        las=2,ylab='n. mutated cell lines')
## Assembling a BEM for Colorectal Adenocarcinoma (COAD/READ) primary tumours using data from the TCGA
## as presented in Iorio et al 2016, considering only variants observed in COSMIC in at least
## two patients, in high-confidence cancer driver genes
COREAD_tum_BEM<-
    CELLector.Tumours_buildBEM(Cancer_Type = 'COAD/READ',
                               GenesToConsider = CELLector.HCCancerDrivers,
                               VariantsToConsider = CELLector.RecfiltVariants)

## showing the first 25 entries of the BEM
COREAD_tum_BEM[1:5,1:5]

## showing a bar diagram with the mutation frequency of the 30 most frequently altered genes
barplot(100*sort(rowSums(COREAD_tum_BEM),
                 decreasing=TRUE)[1:30]/ncol(COREAD_tum_BEM),
        las=2,ylab='% patients')
Next, we assemble a CELLector searching space and select representative cell lines as in the previous examples.
### Assembling the searching space and visualising it as interactive sunburst
COREAD_tum_BEM<-CELLector.unicizeSamples(COREAD_tum_BEM)

CSS<-CELLector.Build_Search_Space(ctumours = t(COREAD_tum_BEM),
                                  verbose = FALSE,
                                  minGlobSupp = 0.02,
                                  cancerType = 'COREAD')

CELLector.visualiseSearchingSpace_sunBurst(CSS)
### take all the signatures from the searching space
Signatures <- CELLector.createAllSignatures(CSS$navTable)

### mapping the cell lines on the CELLector searching space
ModelMat<-CELLector.buildModelMatrix(Signatures$ES,COREAD_cl_BEM,CSS$navTable)

### selecting 5 representative cell lines
selectedCellLines<-CELLector.makeSelection(modelMat = ModelMat,
                                           n=5,
                                           searchSpace = CSS$navTable)

knitr::kable(selectedCellLines,align = 'l')