The GenomeTornadoPlot package allows users to visualize copy number variations (CNVs), as well as many other types of structural variation, that overlap one or two genes on the same chromosome. For all CNVs overlapping the target gene, a focality score is also calculated. The higher the focality score, the more likely it is that the gene is affected by focal events.
In general, we assume that genes with comparatively more focal events than broad ones receive higher scores. Here we define the standard focality score by:
where m is the total number of focal variation events and Lmax is the length of the longest focal variation event.
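As a rough illustration of the two quantities named above, the sketch below derives m and Lmax from a toy table of events. The helper `focal_summary()`, the coordinates, and the length cutoff used to call an event focal are all assumptions made for this example; they are not taken from the package, and the sketch does not compute the score itself.

```r
# Hypothetical helper: count focal events and find the longest one.
# `focal_cutoff` is an assumed length threshold, not a package default.
focal_summary <- function(events, focal_cutoff) {
  event_length <- events$End - events$Start + 1
  is_focal <- event_length <= focal_cutoff
  list(
    m = sum(is_focal),                    # number of focal events
    L_max = max(event_length[is_focal])   # length of the longest focal event
  )
}

# Toy events with made-up coordinates
toy_events <- data.frame(
  Start = c(100, 250, 400, 50),
  End   = c(180, 900, 450, 5000)
)

res <- focal_summary(toy_events, focal_cutoff = 1000)
print(res$m)
print(res$L_max)
```

With this toy input, three of the four events fall under the assumed cutoff, and the longest of those spans 651 positions.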
To reduce the influence of neighbouring genes, we implemented a second algorithm, which we call the "edge score". It is defined as:
score.edge = (2*Sgene - Sneighbour_1 - Sneighbour_2)/2
where neighbour 1 and neighbour 2 are the genes adjacent to the target gene. If the target gene lies at the edge of a chromosome, its only neighbour counts as both neighbour 1 and neighbour 2.
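The edge score formula can be sketched in a few lines of R. The function name `edge_score()` and the input values are hypothetical and only illustrate the arithmetic, including the case of a gene with a single neighbour; this is not the package's internal implementation.

```r
# Hypothetical helper implementing
# score.edge = (2*Sgene - Sneighbour_1 - Sneighbour_2)/2
edge_score <- function(s_gene, s_neighbours) {
  stopifnot(length(s_neighbours) %in% c(1, 2))
  # A gene at the chromosome edge has one neighbour, which is used twice
  s_neighbours <- rep_len(s_neighbours, 2)
  (2 * s_gene - s_neighbours[1] - s_neighbours[2]) / 2
}

two_neighbours <- edge_score(10, c(4, 6))  # (20 - 4 - 6) / 2
one_neighbour  <- edge_score(10, 4)        # (20 - 4 - 4) / 2
print(two_neighbours)
print(one_neighbour)
```

A gene whose standard score matches that of its neighbours gets an edge score of zero, so only genes that stand out locally score high.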
Users can choose whether to calculate the standard or the edge focality score. Note that the focality score of each gene is computed from the data you provide.
Prior to installing GenomeTornadoPlot, please install all dependencies as follows:
dependencies.packages <- c('ggplot2', 'data.table', 'devtools', 'grid', 'gridExtra', 'tiff', 'shiny', 'shinydashboard', 'entropy')
install.packages(dependencies.packages)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c('GenomicRanges', 'quantsmooth', 'IRanges', 'S4Vectors'))
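Then clone the repository from a shell:
git clone https://github.com/chenhong-dkfz/GenomeTornadoPlot
Finally, in an R session whose working directory is the cloned GenomeTornadoPlot folder, run:
devtools::install()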
This is a brief quick start to the functions in GenomeTornadoPlot, using minimal parameters. For more information, please check the user manual and the package help.
First, prepare BED-like data and import it into your R session. You can generate it from files in other formats, such as VCF or MAF. Just make sure that the column names include "Chromosome", "Start", "End", "CN", "Gene", "Cohort" and "PID".
In R, it should be a data frame that looks like this:
data("cnv_KRAS", package = "GenomeTornadoPlot")
knitr::kable(head(cnv_KRAS, 10))
| Chromosome|    Start|      End|    CN|Gene |Cohort |PID       |
|----------:|--------:|--------:|-----:|:----|:------|:---------|
|         12| 29700429| 12145150|     5|KRAS |AML    |pid001    |
|         12| 21073451|  1777272|     5|KRAS |BRCA   |pid002    |
|         12| 32285455| 18368484|     5|KRAS |CRC    |pid003    |
|         12| 24497489| 20635970|     5|KRAS |CRC    |pid004    |
|         12| 23188787| 31463459|     4|KRAS |AML    |pid005    |
|         12| 25224933|  7439941|     6|KRAS |CRC    |pid006    |
|         12| 24801696| 11310196|     5|KRAS |CRC    |pid007    |
|         12| 24459199| 27108934|     5|KRAS |GLIOMA |pid008    |
|         12| 30812917| 17582810|     4|KRAS |BRCA   |pid009    |
|         12| 21706333| 14115764|     5|KRAS |CRC    |pid010    |
The CN column records the copy number of each CNV event.
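Before passing your own table to the package, it can help to confirm that the required column names are present. The check below is a minimal sketch in base R; the helper `missing_cnv_columns()` and the toy rows are hypothetical and not part of GenomeTornadoPlot.

```r
# Column names required by the text above
required_cols <- c("Chromosome", "Start", "End", "CN", "Gene", "Cohort", "PID")

# Hypothetical helper: return the required columns that are absent
missing_cnv_columns <- function(cnv) {
  setdiff(required_cols, colnames(cnv))
}

# Toy BED-like data frame with made-up values
toy_cnv <- data.frame(
  Chromosome = c(12, 12),
  Start      = c(1000, 5000),
  End        = c(2000, 9000),
  CN         = c(5, 1),
  Gene       = c("KRAS", "KRAS"),
  Cohort     = c("AML", "CRC"),
  PID        = c("pid001", "pid002")
)

complete_result   <- missing_cnv_columns(toy_cnv)
incomplete_result <- missing_cnv_columns(toy_cnv[, names(toy_cnv) != "CN"])
print(complete_result)
print(incomplete_result)
```

An empty result means all seven columns are present; otherwise the returned names tell you which columns to add or rename.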
After preparing the data, we can apply the package functions to it.
### Step 1:
Run the MakeData() function:
library(GenomeTornadoPlot)
data <- MakeData(CNV, gene_name_1, gene_name_2, max.length, score.method, cohort_thredshold)
Here CNV is the BED-like data frame you just imported.
The other parameters are defined as follows:
Here data is an R object containing information about the CNVs of the selected genes. It serves as the input for step 2.
### Step 2:
Run the TornadoPlots() function:
cnv.plot <- TornadoPlots(data, legend, color, color.method, sort.method, SaveAsObject)
Here data is the object returned by the MakeData() function, and cnv.plot is a list containing the output plots.
If you supply only gene_name_1 in the first step, you will get a standard tornado plot and a "dup_del" plot for this gene after finishing step 2. If you also supply gene_name_2, you will get a "twin" plot and a "mixed" plot instead.
To help users generate genome tornado plots conveniently, the GenomeTornadoPlot package provides a Shiny app. Users can launch the app from the R console:
runExample()
You can use the following code to make a tornado plot. Dummy data ships with the package. The first example is for a single gene.
data("cnv_STK38L", package = "GenomeTornadoPlot")
data_genea <- MakeData(CNV = cnv_STK38L, gene_name_1 = "STK38L")
plot_genea <- TornadoPlots(data_genea, gene.name = "STK38L", sort.method = "cohort", SaveAsObject = TRUE)
If all you need is the focality score, use the following command:
data_genea@gene_score
If you want to go further, try printing a standard Genome Tornado Plot:
library(gridExtra)  # provides grid.arrange()
grid.arrange(plot_genea[[1]])
Colored lines represent CNV events. The plot shows where each event lies on the chromosome.
The pie chart shows the contribution of each cohort to the events.
The colors in this example indicate the cohort, but users can change the parameters to color events by copy number or by length instead.
The score below the graph is the "focality score" of the gene.
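The cohort contribution summarised by the pie chart can be approximated directly from the Cohort column. The following is only a sketch with base R on made-up cohort labels; the helper `cohort_share()` is hypothetical, and the package may compute its pie chart differently.

```r
# Hypothetical helper: fraction of events contributed by each cohort
cohort_share <- function(cohort) {
  prop.table(table(cohort))
}

# Toy cohort labels, one per CNV event
toy_cohorts <- c("AML", "BRCA", "CRC", "CRC")

shares <- cohort_share(toy_cohorts)
print(round(shares, 2))
```

Here CRC contributes half of the toy events, and AML and BRCA a quarter each.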
In some cases, a gene plays different roles in different cohorts. A deletion/duplication plot helps identify such cases.
grid.arrange(plot_genea[[2]])
Here, the gene of interest is duplicated in most cohorts, whereas deletions are more frequent in some others.
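The same comparison can be made numerically by tabulating duplications and deletions per cohort. The sketch below assumes a diploid baseline, so that CN above 2 counts as a duplication and CN below 2 as a deletion; this threshold, the helper `dup_del_counts()`, and the toy values are assumptions for illustration and may differ from how the package classifies events.

```r
# Hypothetical helper: cross-tabulate event type by cohort.
# Assumes a diploid baseline (CN == 2 is treated as neutral).
dup_del_counts <- function(cohort, cn) {
  type <- ifelse(cn > 2, "dup", ifelse(cn < 2, "del", "neutral"))
  table(cohort = cohort, type = type)
}

# Toy events with made-up cohorts and copy numbers
toy_cohort <- c("AML", "AML", "CRC", "CRC", "CRC")
toy_cn     <- c(4, 5, 1, 0, 3)

counts <- dup_del_counts(toy_cohort, toy_cn)
print(counts)
```

In this toy input all AML events are duplications, whereas CRC is dominated by deletions.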
We can also apply GenomeTornadoPlot to gene pairs.
data("MLLT3_CDKN2A", package = "GenomeTornadoPlot")
data_twin <- MakeData(CNV = cnv_MLLT3_CDKN2A, gene_name_1 = "MLLT3", gene_name_2 = "CDKN2A")
plot_twin <- TornadoPlots(data_twin, sort.method = "cohort", SaveAsObject = TRUE)
Plot the twin plot:
grid.arrange(plot_twin[[1]])
In addition, the mixed plot shows the proportion of CNVs which overlap gene 1 alone, gene 2 alone or both genes.
Plot the mixed plot:
grid.arrange(plot_twin[[2]])
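The proportions shown in the mixed plot can be illustrated with a simple interval-overlap test. The sketch below uses base R only; the helper `classify_overlap()`, the gene coordinates, and the events are made up for the example and do not reflect the real positions of MLLT3 or CDKN2A or the package's internal code.

```r
# Hypothetical helper: label each CNV by which of two genes it overlaps.
# Two closed intervals overlap if each starts before the other ends.
classify_overlap <- function(start, end, gene1, gene2) {
  hit1 <- start <= gene1[["end"]] & end >= gene1[["start"]]
  hit2 <- start <= gene2[["end"]] & end >= gene2[["start"]]
  ifelse(hit1 & hit2, "both",
         ifelse(hit1, "gene 1 only",
                ifelse(hit2, "gene 2 only", "neither")))
}

# Toy gene intervals and CNV events with made-up coordinates
gene1 <- c(start = 100, end = 200)
gene2 <- c(start = 500, end = 600)
toy_start <- c(50, 450, 90, 120)
toy_end   <- c(150, 650, 700, 550)

groups <- classify_overlap(toy_start, toy_end, gene1, gene2)
shares <- prop.table(table(groups))
print(shares)
```

Two of the four toy events span both genes, and one event covers each gene alone.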
GPL-3.0