GenomeTornadoPlot

The GenomeTornadoPlot package allows users to visualize copy number variations (CNVs), as well as many other types of structural variation, that overlap one or two genes on the same chromosome. For all CNVs overlapping the target gene, a focallity score is also calculated. The higher the focallity score, the more likely it is that the gene is affected by focal events.

Algorithms of focallity score

In general, we assume that genes with comparatively more focal events than broad ones have higher scores. We define the standard focallity score as:

$S = \sum_{i=1}^{m} \log(L_{max} - L_i)$

where $m$ is the total number of focal variation events, $L_{max}$ is the length of the longest focal variation event, and $L_i$ is the length of the $i$-th focal variation event.
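As a rough illustration of the formula above, the following R sketch computes the standard score from a vector of event lengths. The function name `focallity_standard` and the example lengths are hypothetical. The sketch also assumes that events whose length equals the maximum are skipped, since the logarithm of zero is not finite; the package itself may handle this case differently.

```r
# Minimal sketch of S = sum(log(L_max - L_i)); not the package's own code.
# `event_lengths` holds the lengths (in bp) of the focal events for one gene.
focallity_standard <- function(event_lengths) {
  l_max <- max(event_lengths)
  gaps <- l_max - event_lengths
  # Assumption: drop zero gaps (the longest event itself) to avoid log(0) = -Inf.
  sum(log(gaps[gaps > 0]))
}

# Hypothetical event lengths: 200 kb, 500 kb and 1 Mb.
example_lengths <- c(2e5, 5e5, 1e6)
score <- focallity_standard(example_lengths)
print(score)
```

With these lengths the score is log(8e5) + log(5e5). Shorter events lie further from the longest one and therefore contribute more to the sum.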

To remove the influence of neighbouring genes, we implemented a second algorithm, which we call the "edge score". It is defined as:

score.edge = (2*Sgene - Sneighbour_1 - Sneighbour_2)/2

where Sgene is the standard score of the target gene, and neighbour 1 and neighbour 2 are the genes adjacent to the target gene. If the target gene lies at the edge of a chromosome, its only neighbour counts as both neighbour 1 and neighbour 2.
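The edge score definition can be sketched in a few lines of R. The function and argument names here are hypothetical, and the input scores are made-up numbers; the default argument implements the rule that a single neighbour is counted twice.

```r
# Sketch of score.edge = (2*Sgene - Sneighbour_1 - Sneighbour_2) / 2.
# If only one neighbour exists (gene at a chromosome edge), it is used twice.
edge_score <- function(s_gene, s_neighbour_1, s_neighbour_2 = s_neighbour_1) {
  (2 * s_gene - s_neighbour_1 - s_neighbour_2) / 2
}

# Hypothetical standard scores for a gene and its neighbours.
inner <- edge_score(s_gene = 10, s_neighbour_1 = 4, s_neighbour_2 = 6)
at_edge <- edge_score(s_gene = 10, s_neighbour_1 = 4)
print(c(inner = inner, at_edge = at_edge))
```

A gene whose score is no higher than the average of its neighbours gets an edge score of zero or below, which is how the neighbours' contribution is discounted.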

Users can choose whether to calculate the standard or the edge focallity score. Note that the focallity score of each gene is calculated from the data you provide.

Download and installation

Before installing GenomeTornadoPlot, please install all dependencies as follows:

dependencies.packages = c('ggplot2', 'data.table', 'devtools','grid', 'gridExtra','tiff',"shiny","shinydashboard","entropy")

install.packages(dependencies.packages)

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c('GenomicRanges','quantsmooth','IRanges','S4Vectors'))

Installing

  1. In the Git repository click on "Clone or Download".
  2. Copy the HTTPS link.
  3. Open a terminal and type or paste:
git clone https://github.com/chenhong-dkfz/GenomeTornadoPlot
  4. Open the folder GenomeTornadoPlot and open the “GenomeTornadoPlot.Rproj” file in RStudio.
  5. In the RStudio console, type:
devtools::install()

Quick Start

This section gives a brief introduction to the functions in GenomeTornadoPlot using minimal parameters. For more information, please check the user manual and the package help.

Step 0:

First, prepare BED-like data and import it into your R session. You can generate it from files in other formats, such as VCF or MAF. Make sure that the column names include "Chromosome", "Start", "End", "CN", "Gene", "Cohort" and "PID".

In R, the data should be a data frame that looks like this:

data("cnv_KRAS",package = "GenomeTornadoPlot")
knitr::kable(head(cnv_KRAS, 10))

| Chromosome|    Start|      End|    CN|Gene |Cohort |PID       |
|----------:|--------:|--------:|-----:|:----|:------|:---------|
|         12| 29700429| 12145150|     5|KRAS |AML    |pid001    |
|         12| 21073451|  1777272|     5|KRAS |BRCA   |pid002    |
|         12| 32285455| 18368484|     5|KRAS |CRC    |pid003    |
|         12| 24497489| 20635970|     5|KRAS |CRC    |pid004    |
|         12| 23188787| 31463459|     4|KRAS |AML    |pid005    |
|         12| 25224933|  7439941|     6|KRAS |CRC    |pid006    |
|         12| 24801696| 11310196|     5|KRAS |CRC    |pid007    |
|         12| 24459199| 27108934|     5|KRAS |GLIOMA |pid008    |
|         12| 30812917| 17582810|     4|KRAS |BRCA   |pid009    |
|         12| 21706333| 14115764|     5|KRAS |CRC    |pid010    |

The CN column records the copy number of each CNV event.
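Before passing your own data on, it can help to confirm that the required columns are present. The sketch below is a hypothetical helper, not a function of GenomeTornadoPlot, and the toy data frame contains made-up values.

```r
# Hypothetical helper (not part of GenomeTornadoPlot): reports which of the
# required column names are absent from a data frame.
required_cols <- c("Chromosome", "Start", "End", "CN", "Gene", "Cohort", "PID")

missing_cnv_columns <- function(cnv) {
  setdiff(required_cols, colnames(cnv))
}

# Toy input with made-up values; the "PID" column is deliberately left out.
toy_cnv <- data.frame(
  Chromosome = c(12, 12),
  Start = c(25000000, 25200000),
  End = c(25400000, 25300000),
  CN = c(5, 1),
  Gene = "KRAS",
  Cohort = c("AML", "CRC")
)
absent <- missing_cnv_columns(toy_cnv)
print(absent)
```

An empty result means all required columns are present. The check only looks at column names, not at the types or the validity of the values.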

After preparing the data, we can apply the package functions to it.

Step 1: Run the MakeData() function:

library(GenomeTornadoPlot)
data <- MakeData(CNV,gene_name_1,gene_name_2,max.length,score.method,cohort_thredshold)

Here CNV is the BED-like data frame you just imported.

The other parameters are defined as follows:

  1. gene_name_1: the name of the first gene.
  2. gene_name_2: the name of the second gene (optional).
  3. max.length: the maximum length of events; the default is 1e7.
  4. score.method: the method used to calculate the focallity score; the default is edge.
  5. cohort_thredshold: the cohort size threshold. Cohorts smaller than the threshold are collected into the "others" group; the default is 0.

Here data is an R object containing the CNV information for the selected genes. It is the input for step 2.
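The effect of the cohort threshold can be pictured with a small base-R sketch. It only illustrates the grouping described in the parameter list above and assumes that the size of a cohort means its number of events. `collapse_small_cohorts` is a hypothetical helper, not the package's implementation.

```r
# Illustration only: relabel cohorts with fewer events than `threshold`
# as "others". Not the package's own code.
collapse_small_cohorts <- function(cohort, threshold = 0) {
  cohort <- as.character(cohort)
  counts <- table(cohort)
  small <- names(counts)[counts < threshold]
  cohort[cohort %in% small] <- "others"
  cohort
}

# Made-up cohort labels: BRCA occurs once, AML and CRC twice each.
cohorts <- c("AML", "AML", "CRC", "BRCA", "CRC")
grouped <- collapse_small_cohorts(cohorts, threshold = 2)
print(grouped)
```

With the default threshold of 0 no cohort is smaller than the threshold, so the labels are returned unchanged.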

Step 2: Run the TornadoPlots() function:

cnv.plot <- TornadoPlots(data, legend, color, color.method, sort.method, SaveAsObject)

  1. data: the R object generated by the MakeData() function.
  2. legend: can be set to "pie" (default) or "barplot" (optional).
  3. color: a vector of CNV colors (optional).
  4. color.method: how to color the CNVs. It can be "cohort" (default) or "ploidy" (optional).
  5. sort.method: how to sort the CNVs. It can be "length" (default), "cohort" or "ploidy" (optional).

Here cnv.plot is a list containing the output plots.

If you give only gene_name_1 in the first step, you will get a standard tornado plot and a "dup_del" plot for this gene after you finish step 2. If you also give gene_name_2, you will get a "twin" plot and a "mixed" plot instead.

GenomeTornadoPlot Easy2Use (shinyapp)

To help users generate genome tornado plots conveniently, the GenomeTornadoPlot package provides a Shiny app. Users can launch the app from the R console:

runExample()

Example

The following code makes a tornado plot using dummy data that ships with the package. The first example is for a single gene.

data("cnv_STK38L", package = "GenomeTornadoPlot")
data_genea <-  MakeData(CNV=cnv_STK38L,gene_name_1 = "STK38L")
plot_genea <- TornadoPlots(data_genea,gene.name="STK38L",sort.method="cohort",SaveAsObject=TRUE)

If you only need the focallity score, use the following command:

data_genea@gene_score

If you want to go further, try printing a standard Genome Tornado Plot:

# grid.arrange() is provided by the gridExtra package
library(gridExtra)
grid.arrange(plot_genea[[1]])

Colored lines represent CNV events, and the plot shows their locations on the chromosome. The pie chart shows how much each cohort contributes to the events. In this example the colors represent cohorts, but users can change the parameters to color the events by copy number or length instead.
The score below the graph is the “focallity score” of the gene.

In some cases, a gene plays different roles in different cohorts. A deletion/duplication plot helps to identify this.

grid.arrange(plot_genea[[2]])

Here, the gene of interest is duplicated in most cohorts, whereas deletions are more frequent in some others.
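The kind of per-cohort comparison that such a plot summarizes can be approximated with a simple count table. The sketch below uses made-up data and assumes a diploid baseline, so that a copy number above 2 counts as a duplication and below 2 as a deletion; the package may classify events differently.

```r
# Illustration with made-up data; not the package's own classification.
toy_events <- data.frame(
  Cohort = c("AML", "AML", "CRC", "CRC", "BRCA"),
  CN = c(4, 1, 5, 3, 0)
)

# Assumption: diploid baseline, so CN > 2 is a duplication and CN < 2 a deletion.
event_type <- ifelse(toy_events$CN > 2, "dup",
                     ifelse(toy_events$CN < 2, "del", "neutral"))
toy_events$Type <- factor(event_type, levels = c("del", "neutral", "dup"))

# Number of deletions and duplications per cohort.
type_counts <- table(toy_events$Cohort, toy_events$Type)
print(type_counts)
```

Cohorts in which one event type clearly dominates are the ones that stand out in a deletion/duplication plot.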

We can also apply GenomeTornadoPlot to gene pairs.

data("MLLT3_CDKN2A",package = "GenomeTornadoPlot")
data_twin <- MakeData(CNV=cnv_MLLT3_CDKN2A,gene_name_1 = "MLLT3",gene_name_2="CDKN2A")
plot_twin <- TornadoPlots(data_twin,sort.method="cohort",SaveAsObject=TRUE)

Plot the twin plot:

grid.arrange(plot_twin[[1]])

In addition, the mixed plot shows the proportion of CNVs that overlap gene 1 alone, gene 2 alone, or both genes.
Plot the mixed plot:

grid.arrange(plot_twin[[2]])
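The three categories shown in the mixed plot can be sketched with plain interval arithmetic. The helper `classify_overlap` and all coordinates below are hypothetical (they are not real gene positions), and this is not how the package computes the plot internally.

```r
# Illustration only: label each event by which of two genes it overlaps.
# `gene1` and `gene2` are c(start, end) vectors on the same chromosome.
classify_overlap <- function(start, end, gene1, gene2) {
  hits1 <- start <= gene1[2] & end >= gene1[1]
  hits2 <- start <= gene2[2] & end >= gene2[1]
  ifelse(hits1 & hits2, "both",
         ifelse(hits1, "gene1", ifelse(hits2, "gene2", "none")))
}

# Made-up gene and event coordinates.
gene1 <- c(100, 200)
gene2 <- c(400, 500)
ev_start <- c(50, 450, 150, 250)
ev_end <- c(150, 600, 450, 300)

classes <- classify_overlap(ev_start, ev_end, gene1, gene2)
print(prop.table(table(classes)))
```

Two closed intervals overlap when each one starts no later than the other one ends, which is the test used for `hits1` and `hits2`.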

Licence

GPL-3.0



chenhong-dkfz/tornado.test.1 documentation built on Dec. 28, 2021, 7:28 p.m.