Introduction

Here we document the steps required to run the CRUK-MI Collateral Vulnerability analysis for Melanoma using the CollateralVulnerability2016 R package available from GitHub.

library(CollateralVulnerability2016)
library(dplyr)

Step 1: Set up SQLite database and directories

work_dir <- '~/BigData/CollateralVulnerability2016/skcm/'
download_dir <- '~/BigData/RTCGA_downloads/'
mydb <- 'skcm_output.db'
mydb_path <- paste0(work_dir, mydb)
my_con <- setupSQLite ( mydb_path ) 

Step 2: Get human geneset to work with

In this case we want all human genes using the most recent ENSEMBL annotation:

all_genes <- getAllHumanGenes(my_con)
head(all_genes)

Gene data also written to SQLite database and can be viewed with dplyr:

src_sqlite(my_con@dbname) %>% tbl('human_genes')

Step 3: Import TCGA RNAseq and mutation data

Want to import the TCGA RNAseq and mutation data so that it is ready to use for further analyses. This is a wrapper around the RTCGA package so we can specify a different release date if we wish.

rnaseq_dat <- getTCGARNAseqData(my_con, cancerTypes='SKCM', releaseDate = '2015-11-01', sampletag=c('01A', '06A'))
dim(rnaseq_dat)

mutation_dat <- getTCGAMutationData(my_con, cancerTypes='SKCM', releaseDate = '2015-11-01', sampletag=c('01A', '06A'))
dim(mutation_dat)

DBI::dbListTables(my_con)

Step 4: Run the low expressors analysis on RNAseq data

This is a wrapper around the BISEP function from the BISEP package. For testing can set gene_n to a lower number, setting a high number ensures that all genes are included in the analysis. Some filtering is done to exclude non-variant or non-expressed genes. The analysis is parallelised for speed but this has only been tested on Mac OSX

bisep_output <- doBISEPAnalysis(my_con, genes_n=100000)

Step 5: View the results of the low expressors analysis

At this step we want to do an initial filtering of the bisep results, and we then use these genes for the next steps. Here we are filtering on the pi_value (proportion of samples in the low expressors group) and p-value for the bimodal distribution test - see BISEP documentation for more info.

bisep_results <- src_sqlite(my_con@dbname) %>%
    dplyr::tbl('bisep_results') %>%
    dplyr::filter(bisep_pval < 0.1 & pi_value <0.2) %>% 
    dplyr::collect() %>% 
    inner_join(all_genes, by=c('gene_name'='gene_id')) %>%
    dplyr::select(gene_name, gene_name.y, everything()) %>%
    dplyr::arrange(gene_name.y)
bisep_results

Step 6: Generate a plot for a given gene

doRNAseqPlot(my_con, 'ENSG00000176024')

Step 7: Do the human paralogue analysis

Work out how many paralogs there are for each human gene in our set.

human_paralog_res <- countHumanParalogs(my_con, bisep_results$gene_name)

Step 8: Do the FlyMine analysis

Are there lethal variant alleles in D. melanogaster? ``` {r eval = FALSE} flymine_res <- doFlyMineAnalysis(my_con, bisep_results$gene_name)

## Step 9: Do the WormMine analysis
Are there lethal variant alleles in C. elegans?
``` {r eval = FALSE}
wormmine_res <- doWormMineAnalysis(my_con, bisep_results$gene_name)

Step 10: Do the mutation analysis

What proportion of samples have either protein coding or truncation mutations in the patient cohort? ``` {r eval = FALSE} mut_res <- doMutationAnalysis(my_con, bisep_results$gene_name)

## Step 11: Combine the results
Combine the output from all of the previous analyses
``` {r eval=FALSE}
combo_res <- combineResults(my_con, bisep_results$gene_name)

Step 12: Filter the results

Filter the output so that only genes with a small number of family members, and have evidence for lethality in either fly or worm are retained. ``` {r eval=FALSE} filtered_results <- src_sqlite(my_con@dbname) %>% dplyr::tbl('combined_results') %>% dplyr::filter(count_paralogs > 0 & count_paralogs <= 5 & (lethal_pct_fly >= 20 | lethal_pct_worm >= 20) ) %>% dplyr::collect()

## Step 13: Show results in shiny app
An interactive version of Step 12 which allows us to view detailed information from the different analyses.
``` {r eval=FALSE}
    shinyVisApp(my_con)


chapmandu2/CollateralVulnerability2016 documentation built on May 13, 2019, 3:27 p.m.