Here we document the steps required to run the CRUK-MI Collateral Vulnerability analysis for Melanoma using the CollateralVulnerability2016 R package available from GitHub.
library(CollateralVulnerability2016) library(dplyr)
work_dir <- '~/BigData/CollateralVulnerability2016/skcm/' download_dir <- '~/BigData/RTCGA_downloads/' mydb <- 'skcm_output.db' mydb_path <- paste0(work_dir, mydb) my_con <- setupSQLite ( mydb_path )
In this case we want all human genes using the most recent ENSEMBL annotation:
all_genes <- getAllHumanGenes(my_con) head(all_genes)
Gene data also written to SQLite database and can be viewed with dplyr:
src_sqlite(my_con@dbname) %>% tbl('human_genes')
Want to import the TCGA RNAseq and mutation data so that it is ready to use for further analyses. This is a wrapper around the RTCGA package so we can specify a different release date if we wish.
rnaseq_dat <- getTCGARNAseqData(my_con, cancerTypes='SKCM', releaseDate = '2015-11-01', sampletag=c('01A', '06A')) dim(rnaseq_dat) mutation_dat <- getTCGAMutationData(my_con, cancerTypes='SKCM', releaseDate = '2015-11-01', sampletag=c('01A', '06A')) dim(mutation_dat) DBI::dbListTables(my_con)
This is a wrapper around the BISEP function from the BISEP package. For testing can set gene_n to a lower number, setting a high number ensures that all genes are included in the analysis. Some filtering is done to exclude non-variant or non-expressed genes. The analysis is parallelised for speed but this has only been tested on Mac OSX
bisep_output <- doBISEPAnalysis(my_con, genes_n=100000)
At this step we want to do an initial filtering of the bisep results, and we then use these genes for the next steps. Here we are filtering on the pi_value (proportion of samples in the low expressors group) and p-value for the bimodal distribution test - see BISEP documentation for more info.
bisep_results <- src_sqlite(my_con@dbname) %>% dplyr::tbl('bisep_results') %>% dplyr::filter(bisep_pval < 0.1 & pi_value <0.2) %>% dplyr::collect() %>% inner_join(all_genes, by=c('gene_name'='gene_id')) %>% dplyr::select(gene_name, gene_name.y, everything()) %>% dplyr::arrange(gene_name.y) bisep_results
doRNAseqPlot(my_con, 'ENSG00000176024')
Work out how many paralogs there are for each human gene in our set.
human_paralog_res <- countHumanParalogs(my_con, bisep_results$gene_name)
Are there lethal variant alleles in D. melanogaster? ``` {r eval = FALSE} flymine_res <- doFlyMineAnalysis(my_con, bisep_results$gene_name)
## Step 9: Do the WormMine analysis Are there lethal variant alleles in C. elegans? ``` {r eval = FALSE} wormmine_res <- doWormMineAnalysis(my_con, bisep_results$gene_name)
What proportion of samples have either protein coding or truncation mutations in the patient cohort? ``` {r eval = FALSE} mut_res <- doMutationAnalysis(my_con, bisep_results$gene_name)
## Step 11: Combine the results Combine the output from all of the previous analyses ``` {r eval=FALSE} combo_res <- combineResults(my_con, bisep_results$gene_name)
Filter the output so that only genes with a small number of family members, and have evidence for lethality in either fly or worm are retained. ``` {r eval=FALSE} filtered_results <- src_sqlite(my_con@dbname) %>% dplyr::tbl('combined_results') %>% dplyr::filter(count_paralogs > 0 & count_paralogs <= 5 & (lethal_pct_fly >= 20 | lethal_pct_worm >= 20) ) %>% dplyr::collect()
## Step 13: Show results in shiny app An interactive version of Step 12 which allows us to view detailed information from the different analyses. ``` {r eval=FALSE} shinyVisApp(my_con)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.