library(CollateralVulnerability2016)
library(dplyr)

Introduction

Step 1: Set up SQLite database and directories

work_dir <- '~/BigData/CollateralVulnerability2016/luad/'
download_dir <- '~/BigData/RTCGA_downloads/'
mydb <- 'luad_output.db'
mydb_path <- paste0(work_dir, mydb)
my_con <- setupSQLite ( mydb_path ) 

Step 2: Get human geneset to work with

In this case we want all human genes using the most recent ENSEMBL annotation:

all_genes <- getAllHumanGenes(my_con)
head(all_genes)

Gene data also written to SQLite database and can be viewed with dplyr:

src_sqlite(my_con@dbname) %>% tbl('human_genes')

Step 3: Import TCGA RNAseq and mutation data

Want to import the TCGA RNAseq and mutation data so that it is ready to use for further analyses

rnaseq_dat <- getTCGARNAseqData(my_con, cancerTypes='LUAD', releaseDate = '2015-11-01', sampletag=c('01A', '06A'))
dim(rnaseq_dat)

mutation_dat <- getTCGAMutationData(my_con, cancerTypes='LUAD', releaseDate = '2015-11-01', sampletag=c('01A', '06A'))
dim(mutation_dat)

DBI::dbListTables(my_con)

Step 4: Run the BISEP low expressors analysis on RNAseq data

This is a wrapper around the BISEP function from the BISEP package. For testing can set gene_n to a lower number, setting a high number ensures that all genes are included in the analysis. Some filtering is done to exclude non-variant or non-expressed genes: either by variance or by number of genes with an NA. Can optionally log2 transform the data or even use a non-standard input table. The analysis is parallelised for speed but this has only been tested on Mac OSX and Linux. Set a seed to get consistent results between runs. See ?doBISEPAnalysis for more details.

bisep_output <- doBISEPAnalysis(my_con, genes_n=100000, table_name = "tcga_rnaseq_data", 
                                log2_transform=TRUE, gene_var_th=0.2, gene_naprop_th=0.3, seed_val=38297)

Step 5: View the results of the low expressors analysis

At this step we can optionally do an initial filtering of the bisep results use the plotBISEPOutput function to visualise this. We then use these genes for the next steps. Here we are visualising based on the pi_value (proportion of samples in the low expressors group) and p-value for the bimodal distribution test - see BISEP documentation for more info. However, we take all genes forward for the later steps so that we have data for all genes for all analyses.

plotBISEPOutput(my_con, pi_value_th=0.2, bisep_pval_th = 0.1)

bisep_results <- src_sqlite(my_con@dbname) %>%
    dplyr::tbl('bisep_results') %>%
   # dplyr::filter(bisep_pval < 0.1 & pi_value <0.2) %>% 
    dplyr::collect()  %>%
    dplyr::arrange(gene_name)
bisep_results

Step 6: Generate a plot for a given gene

doRNAseqPlot(my_con, 'ENSG00000127616')

Step 7: Do the human paralogue analysis

human_paralog_res <- countHumanParalogs(my_con, bisep_results$gene_id)

Step 8: Do the FlyMine analysis

``` {r eval = FALSE} flymine_res <- doFlyMineAnalysis(my_con, bisep_results$gene_id)

## Step 9: Do the WormMine analysis
``` {r eval = FALSE}
wormmine_res <- doWormMineAnalysis(my_con, bisep_results$gene_id)

Step 10: Do the mutation analysis

``` {r eval = FALSE} mut_res <- doMutationAnalysis(my_con, bisep_results$gene_id)

## Step 11: Combine the results
``` {r eval=FALSE}
combo_res <- combineResults(my_con, bisep_results$gene_id)

Step 12: Filter the results

``` {r eval=FALSE} filtered_results <- src_sqlite(my_con@dbname) %>% dplyr::tbl('combined_results') %>% dplyr::filter(bisep_pval <= 0.1, pi_value <= 0.2, count_paralogs > 0, count_paralogs <= 5, (lethal_pct_fly >= 20 | lethal_pct_worm >= 20) ) %>% dplyr::collect()

## Step 13: Show results in shiny app
``` {r eval=FALSE}
    shinyVisApp(my_con)


chapmandu2/CollateralVulnerability2016 documentation built on May 13, 2019, 3:27 p.m.