library(CancerCellLines)
library(dplyr)
This vignette follows on from the Overview vignette and assumes that you have already set up the SQLite database containing at least the CCLE data. It will not work with the toy database.
Connect to the database and generate SQLiteConnection and dplyr connection objects for convenience.
dbpath <- '~/BigData/CellLineData/CancerCellLines.db'
#dbpath <- system.file('extdata/toy.db', package="CancerCellLines")
full_con <- setupSQLite(dbpath)
dplyr_con <- src_sqlite(full_con@dbname)
We are interested in some important melanoma genes and the compounds that act through them. We can use the dplyr interface to populate a cell line vector with all of the melanoma cell lines.
#specify the genes
ex1_genes <- c('BRAF', 'NRAS', 'CRAF', 'TP53')

#get the melanoma cell lines
ex1_cell_lines <- dplyr_con %>%
  tbl('ccle_sampleinfo') %>%
  dplyr::filter(Site_primary=='skin') %>%
  collect %>%
  as.data.frame
ex1_cell_lines <- ex1_cell_lines$CCLE_name
ex1_cell_lines[1:10]

#get BRAF and MEK inhibitors
ex1_drugs <- c('AZD6244','PLX4720','PD-0325901')
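For readers more used to SQL, the filter-then-collect pattern above can be sketched against a toy in-memory database. This is only an illustration: the toy table holds just two columns and three rows, the names `toy_con`, `toy_sampleinfo` and `skin_lines` are made up for this sketch, and it assumes the DBI and RSQLite packages are installed.

```r
# Toy stand-in for the real database: an in-memory SQLite DB holding a
# hypothetical, cut-down 'ccle_sampleinfo' table (two columns only).
library(DBI)

toy_con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy_sampleinfo <- data.frame(
  CCLE_name    = c("A375_SKIN", "SKMEL28_SKIN", "A549_LUNG"),
  Site_primary = c("skin", "skin", "lung"),
  stringsAsFactors = FALSE
)
dbWriteTable(toy_con, "ccle_sampleinfo", toy_sampleinfo)

# SQL roughly equivalent to:
#   tbl('ccle_sampleinfo') %>% filter(Site_primary == 'skin') %>% collect
skin_lines <- dbGetQuery(
  toy_con,
  "SELECT CCLE_name FROM ccle_sampleinfo WHERE Site_primary = 'skin'"
)$CCLE_name
print(skin_lines)

dbDisconnect(toy_con)
```

The dplyr version has the advantage that the filtering is still done inside the database, with only the matching rows pulled into R by `collect`.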
Next we can make data frames for the genes, drugs and cell lines that we're interested in:
#make a tall frame
ex1_tall_df <- makeTallDataFrame(full_con, ex1_genes, ex1_cell_lines, ex1_drugs)
ex1_tall_df

#convert this into a wide data frame
ex1_wide_df <- ex1_tall_df %>% makeWideFromTallDataFrame
ex1_wide_df

#compare the drug activities
pairs(~AZD6244_resp+PLX4720_resp+`PD-0325901_resp`, ex1_wide_df)
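The difference between the tall and wide layouts can be illustrated on a toy data frame using base R only. This is a sketch rather than what makeWideFromTallDataFrame does internally: the column names (`CCLE_name`, `ID`, `value`), the cell line names and the values are all assumptions made up for the example.

```r
# Toy tall frame: one row per cell line / feature combination.
# Column names and values are hypothetical, chosen for illustration only.
toy_tall <- data.frame(
  CCLE_name = rep(c("LINE_A", "LINE_B"), each = 3),
  ID        = rep(c("BRAF_hybcap", "BRAF_affy", "PLX4720_resp"), times = 2),
  value     = c(1, 7.2, 0.4, 0, 6.8, 8.1),
  stringsAsFactors = FALSE
)

# Wide frame: one row per cell line, one column per feature
toy_wide <- reshape(toy_tall, idvar = "CCLE_name", timevar = "ID",
                    direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(toy_wide) <- sub("^value\\.", "", names(toy_wide))
rownames(toy_wide) <- NULL
print(toy_wide)
```

The wide form has one column per feature, which is why it can be passed straight to a formula interface such as `pairs`.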
Whilst the wide data frame is useful for modelling, it's the tall data frame that is more useful for plotting since it's in a tidy format (long and thin). Let's make a heatmap using the built-in plotHeatmap function:
#make a heatmap!
plotHeatmap(ex1_tall_df)
Cell lines are plotted as rows and features as columns. The response data are always plotted to the left, with the most sensitive cell lines at the bottom in green and the least sensitive at the top in red. Affy and copy number data are plotted from blue (low) to red (high), whilst mutation data are plotted as light colours for wild type and dark colours for mutant.
We also have some degree of control over the order of the x and y axes. For example, if we want the cell lines to be ordered on the response to PLX4720, we can specify this:
plotHeatmap(ex1_tall_df, order_feature='PLX4720_resp')
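The idea of ordering rows on a single feature can be sketched in base R by re-levelling the cell line factor. This is not necessarily how plotHeatmap implements `order_feature`; the toy data frame and its column names are assumptions made for the illustration.

```r
# Toy tall frame with hypothetical column names and values
toy_tall <- data.frame(
  CCLE_name = rep(c("LINE_A", "LINE_B", "LINE_C"), times = 2),
  ID        = rep(c("PLX4720_resp", "BRAF_affy"), each = 3),
  value     = c(5.1, 0.3, 2.2, 6.9, 8.4, 7.7),
  stringsAsFactors = FALSE
)

# Pick out the rows for the feature we want to order on
order_feature <- "PLX4720_resp"
ref <- toy_tall[toy_tall$ID == order_feature, ]

# Cell lines sorted by increasing value of that feature
line_order <- ref$CCLE_name[order(ref$value)]

# Re-level the factor so that plotting code draws rows in this order
toy_tall$CCLE_name <- factor(toy_tall$CCLE_name, levels = line_order)
print(levels(toy_tall$CCLE_name))
```

Because the order is stored in the factor levels, every feature column in the heatmap shares the same row order.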
This time we are interested in how the expression and mutation status of EGFR interact with the response to the EGFR inhibitor erlotinib. Let's dive right in, using the makeRespVsGeneticDataFrame function to make a data frame suitable for the plotRespVsGeneticHist and plotRespVsGeneticPoint functions:
#get all cell lines
ex2_cell_lines <- dplyr_con %>%
  tbl('ccle_sampleinfo') %>%
  collect %>%
  as.data.frame
ex2_cell_lines <- ex2_cell_lines$CCLE_name

#make a data frame for the affy analysis
df <- makeRespVsGeneticDataFrame(full_con, gene='EGFR', cell_lines=ex2_cell_lines,
                                 drug='Erlotinib', data_types = 'affy', drug_df = NULL)

#histogram of Erlotinib response coloured by EGFR expression
plotRespVsGeneticHist(df, 'affy', FALSE)

#scatter plot of EGFR expression vs Erlotinib response
plotRespVsGeneticPoint(df, 'affy', FALSE)
Now let's do a similar analysis with PLX4720 and BRAF mutation status:
#make a data frame for the hybcap (mutation) analysis
df <- makeRespVsGeneticDataFrame(full_con, gene='BRAF', cell_lines=ex2_cell_lines,
                                 drug='PLX4720', data_types = 'hybcap', drug_df = NULL)

#histogram of PLX4720 response coloured by BRAF mutation status
plotRespVsGeneticHist(df, 'hybcap', FALSE)

#plot of BRAF mutation status vs PLX4720 response
plotRespVsGeneticPoint(df, 'hybcap', FALSE)
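A numerical companion to these plots is to summarise the response within each mutation group. The sketch below uses a made-up toy data frame (the column names `mutated` and `resp` are assumptions, not the columns returned by makeRespVsGeneticDataFrame) and base R only.

```r
# Toy data: three mutant and three wild type cell lines with a
# hypothetical response value for each (names and values are invented)
toy <- data.frame(
  CCLE_name = paste0("LINE_", LETTERS[1:6]),
  mutated   = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  resp      = c(0.2, 0.5, 0.9, 4.0, 6.5, 8.0)
)

# Median response per mutation group; the result is named "FALSE"/"TRUE"
group_medians <- tapply(toy$resp, toy$mutated, median)
print(group_medians)
```

A large gap between the two medians is the pattern that shows up in the plots as a clear separation of the mutant and wild type cell lines.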
The GeneticVsGenetic suite of functions and plots allows genetic features to be compared against each other, rather than against a response variable. For example, looking at SMARCA4 in lung cancer:
#get lung cell lines
ex4_cell_lines <- dplyr_con %>%
  tbl('ccle_sampleinfo') %>%
  filter(Site_primary == 'lung') %>%
  collect %>%
  as.data.frame
ex4_cell_lines <- ex4_cell_lines$CCLE_name

#make the data frame
gvg.df <- makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines,
                                        gene1='SMARCA4', data_type1='hybcap',
                                        gene2='SMARCA4', data_type2='affy')

#view the data frame
head(gvg.df)

#do the plot
plotGeneticVsGeneticPoint(gvg.df)

#all in one go with axes swapped
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines,
                              gene1='SMARCA4', data_type1='affy',
                              gene2='SMARCA4', data_type2='hybcap') %>%
  plotGeneticVsGeneticPoint()

#two continuous
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines,
                              gene1='SMARCA4', data_type1='affy',
                              gene2='SMARCA4', data_type2='cn') %>%
  plotGeneticVsGeneticPoint()

#two discrete
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines,
                              gene1='SMARCA4', data_type1='hybcap',
                              gene2='KRAS', data_type2='hybcap') %>%
  plotGeneticVsGeneticPoint()

#also plot by cell line with one feature as the y axis and another as fill colour
#continuous + discrete
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines,
                              gene1='SMARCA4', data_type1='affy',
                              gene2='SMARCA4', data_type2='hybcap') %>%
  plotGeneticVsGeneticHist()

#continuous + continuous
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines[1:25],
                              gene1='SMARCA4', data_type1='affy',
                              gene2='SMARCA4', data_type2='cn') %>%
  plotGeneticVsGeneticHist(label_option = TRUE)
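For the two-discrete case, a cross-tabulation gives the counts behind the plot. The sketch below uses a made-up toy data frame in base R; the column names and the 0/1 coding of mutation status are assumptions for illustration, not the output format of makeGeneticVsGeneticDataFrame.

```r
# Toy data: mutation status (1 = mutant, 0 = wild type) of two genes
# in six hypothetical cell lines
toy_gvg <- data.frame(
  CCLE_name   = paste0("LINE_", 1:6),
  SMARCA4_mut = c(1, 0, 0, 1, 0, 0),
  KRAS_mut    = c(0, 1, 0, 0, 1, 1)
)

# 2 x 2 table of co-occurrence: rows are SMARCA4 status, columns KRAS status
co_table <- table(SMARCA4 = toy_gvg$SMARCA4_mut, KRAS = toy_gvg$KRAS_mut)
print(co_table)
```

An empty or nearly empty double-mutant cell in a table like this is what a pattern of mutual exclusivity looks like in count form.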
There is also a series of shiny functions for interactive visualisation. Response data can be passed to the shinyRespVsGeneticApp function as follows:
data(dietlein_data)
full_con <- setupSQLite('~/BigData/CellLineData/CancerCellLines.db')
shinyRespVsGeneticApp(con=full_con, drug_df=dietlein_data)
Alternatively, if a custom dataset isn't supplied, the CCLE response data will be used:
shinyRespVsGeneticApp(con=full_con)
If you are just interested in the GeneticVsGenetic analysis functions then you can launch the shinyGeneticVsGeneticApp shiny app:
shinyGeneticVsGeneticApp(con=full_con)
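The RespVsResp suite of functions compares the responses to two or more compounds against each other, here Erlotinib and AZD6244 across all cell lines: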
#get all cell lines
ex5_cell_lines <- dplyr_con %>%
  tbl('ccle_sampleinfo') %>%
  collect %>%
  as.data.frame
ex5_cell_lines <- ex5_cell_lines$CCLE_name

#make a data frame
df <- makeRespVsRespDataFrame(full_con, cell_lines=ex5_cell_lines,
                              drugs=c('Erlotinib', 'AZD6244'), tissue_info = 'ccle')
head(df)

#makes a wide data frame
wide.df <- df %>% makeWideFromRespVsRespDataFrame()
head(wide.df)

#now do some plots
plotRespVsRespWaterfall(filter(df, grepl('Erlotinib', assayed_id)))
plotRespVsRespDensity(df)
plotRespVsRespPairs(df)
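Once the responses are in wide form, a rank correlation gives a single-number summary of how similarly two compounds behave across cell lines. The sketch below runs on a made-up toy data frame; the column names `Erlotinib_resp` and `AZD6244_resp` are assumptions and may not match the columns produced by makeWideFromRespVsRespDataFrame.

```r
# Toy wide frame: one row per hypothetical cell line, one column per drug.
# One response is missing to show how incomplete pairs are handled.
toy_wide <- data.frame(
  CCLE_name      = paste0("LINE_", 1:5),
  Erlotinib_resp = c(1.0, 2.5, 8.0, NA, 4.0),
  AZD6244_resp   = c(0.5, 3.0, 7.5, 2.0, 1.0)
)

# Spearman rank correlation, using only cell lines with both responses
rho <- cor(toy_wide$Erlotinib_resp, toy_wide$AZD6244_resp,
           method = "spearman", use = "pairwise.complete.obs")
print(rho)
```

Spearman is used here because it depends only on the ranking of the cell lines, so it is less sensitive to the scale of the response metric than Pearson correlation.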
There is also a shiny app for the RespVsResp analysis:
shinyRespVsRespApp(con=full_con)
To do:
- GeneticVsGenetic
- RespVsResp
sessionInfo()