library(CancerCellLines)
library(dplyr)

Introduction

This Vignette follows on from the Overview vignette and assumes that the user has already set up the SQLite database containing at least the CCLE data - this vignette won't work from the toy database!

Setup

Connect to the database and generate SQLiteConnection and dplyr connection objects for convenience.

dbpath <- '~/BigData/CellLineData/CancerCellLines.db'
#dbpath <- system.file('extdata/toy.db', package="CancerCellLines")
full_con <- setupSQLite(dbpath)
dplyr_con <- src_sqlite(full_con@dbname)

Example 1: Melanoma heatmap with MEK and BRAF inhibitors

We are interested in looking at some important melanoma genes and compounds that act through them We can use the dplyr interface to easily populate a cell line vector with all of the melanoma cell lines.

    #specify the genes
    ex1_genes <- c('BRAF', 'NRAS', 'CRAF', 'TP53')

    #get the melanoma cell lines
    ex1_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% dplyr::filter(Site_primary=='skin') %>%
       collect %>% as.data.frame
    ex1_cell_lines <- ex1_cell_lines$CCLE_name
    ex1_cell_lines[1:10]

    #get BRAF and MEK inhibitors
    ex1_drugs <- c('AZD6244','PLX4720','PD-0325901')

Next we can make data frames for the genes, drugs and cell lines that we're interested int:

    #make a tall frame
    ex1_tall_df <- makeTallDataFrame(full_con, ex1_genes, ex1_cell_lines, ex1_drugs)
    ex1_tall_df

    #convert this into a wide data frame
    ex1_wide_df <- ex1_tall_df %>% makeWideFromTallDataFrame
    ex1_wide_df

    #compare the drug activities
    pairs(~AZD6244_resp+PLX4720_resp+`PD-0325901_resp`, ex1_wide_df)

Whilst the wide data frame is useful for modelling, it's the tall data frame that is more useful for plotting since it's in a tidy format (long and thin). Let's make a heatmap using the built in plotHeatmap function:

    #make a heatmap!
    plotHeatmap(ex1_tall_df)

Cell lines are plotted as rows and features as columns. The response data is always plotted to the left, with the most sensitive cell lines at the bottom in green, and the least sensitive at the top in red. Affy and copy number data is plotted from blue (low) to red (high) whilst mutation data is plotted as light colours for wild type and dark colours for mutant.

We also have some degree of control over the order of the x and y axes. For example, if we want the cell lines to be ordered on the response to PLX4720, we can specify this:

    plotHeatmap(ex1_tall_df, order_feature='PLX4720_resp')

Example 2: EGFR inhibitors vs EGFR mutation status or expression

This time we are interested in how the expression and mutation status of EGFR interacts with the response to the EGFR inhibitor, erlotinib. Let's dive right in using the makeRespVsGeneticDataFrame function to make a data frame suitable for the plotRespVsGeneticHist and plotRespVsGeneticScatter functions:

    #get all cell lines
    ex2_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% 
       collect %>% as.data.frame
    ex2_cell_lines <- ex2_cell_lines$CCLE_name

    #make a data frame for the affy analysis
    df <- makeRespVsGeneticDataFrame(full_con, gene='EGFR',
                               cell_lines=ex2_cell_lines,
                               drug='Erlotinib',
                               data_types = 'affy',
                               drug_df = NULL) 

    #scatter plot of EGFR expression vs Erlotinib response
    plotRespVsGeneticHist(df, 'affy', FALSE)

    #histogram of Erlotinib response coloured by EGFR expression
    plotRespVsGeneticPoint(df, 'affy', FALSE)

Example 3: BRAF inhibitors vs BRAF mutation status

Now let's do a similar analysis with PLX4720 and BRAF mutation status:

    #make a data frame for the affy analysis
    df <- makeRespVsGeneticDataFrame(full_con, gene='BRAF',
                               cell_lines=ex2_cell_lines,
                               drug='PLX4720',
                               data_types = 'hybcap',
                               drug_df = NULL) 

    #scatter plot of EGFR expression vs Erlotinib response
    plotRespVsGeneticHist(df, 'hybcap', FALSE)

    #histogram of Erlotinib response coloured by EGFR expression
    plotRespVsGeneticPoint(df, 'hybcap', FALSE)

Example 4: Comparing SMARCA4 expression in SMARCA4 mutated cell lines to wildtype

The GeneticVsGenetic suite of functions and plots allows genetic features to be compared against eachother, rather than against a response variable. For example, looking at SMARCA4 in lung cancer:

    #get lung cell lines
    ex4_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% filter(Site_primary == 'lung') %>%
       collect %>% as.data.frame
    ex4_cell_lines <- ex4_cell_lines$CCLE_name

    #make the data frame
    gvg.df <- makeGeneticVsGeneticDataFrame(full_con, 
                                            cell_lines=ex4_cell_lines,
                                            gene1='SMARCA4',
                                            data_type1='hybcap',
                                            gene2='SMARCA4',
                                            data_type2='affy') 

    #view the data frame
    head(gvg.df)

    #do the plot
    plotGeneticVsGeneticPoint(gvg.df)

    #all in one go with axes swapped
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='hybcap') %>% plotGeneticVsGeneticPoint()

    #two continuous
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='cn') %>% plotGeneticVsGeneticPoint()

    #two discrete
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='hybcap',
                                            gene2='KRAS', data_type2='hybcap') %>% plotGeneticVsGeneticPoint()

    #also plot by cell line with one feature a y axis and another as fill colour
    #continous + discrete
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='hybcap') %>% plotGeneticVsGeneticHist()

    #continous + continous
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines[1:25], gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='cn') %>% plotGeneticVsGeneticHist(label_option = TRUE)

Interactive visualisations

There are also a series of shiny__ functions for interactive visualisations. The response data can be fed into the shinyRespVsGeneticApp function as follows:

    data(dietlein_data)

    full_con <- setupSQLite('~/BigData/CellLineData/CancerCellLines.db')
    shinyRespVsGeneticApp(con=full_con, drug_df=dietlein_data)

Alternatively, if a custom dataset isn't defined CCLE will be used:

    shinyRespVsGeneticApp(con=full_con)

If you are just interested in the GeneticVsGenetic analysis functions then you can launch the shinyGeneticVsGenetic shiny app:

    shinyGeneticVsGeneticApp(con=full_con)    

Example 5: Comparing response values

Text

    #get all cell lines
    ex5_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% 
       collect %>% as.data.frame
    ex5_cell_lines <- ex5_cell_lines$CCLE_name

    #make a data frame
    df <- makeRespVsRespDataFrame(full_con, 
                               cell_lines=ex5_cell_lines,
                               drugs=c('Erlotinib', 'AZD6244'),
                               tissue_info = 'ccle')
    head(df)

    #makes a wide data frame
    wide.df <- df %>% makeWideFromRespVsRespDataFrame()
    head(wide.df)

    #now do some plots
    plotRespVsRespWaterfall(filter(df, grepl('Erlotinib', assayed_id)))
    plotRespVsRespDensity(df)
    plotRespVsRespPairs(df)

Also a shiny app:

    shinyRespVsRespApp(con=full_con)

Future directions

To do:
- GeneticVsGenetic - RespVsResp

Session Info

   sessionInfo() 


chapmandu2/CancerCellLines documentation built on May 13, 2019, 3:27 p.m.