Essentiality with continuous expression: Breast CCND1 expression example

 

DOWNLOAD R CODE FILE FOR THIS TUTORIAL

 

We previously worked through a basic 2 class differential essentiality analysis using cell line HER2+ status as an example. Here we see how to predict genes whose essentiality is associated (positively or negatively) with a continuous variable, such as RNA-Seq expression of a gene. We'll use the expression of well-known Breast oncogene CCND1 as a case study.

First DOWNLOAD SIMEM R CODE zip bundle containing the R code files loaded below.

  Also note that, while the code below uses relative paths, you should edit these to point to where you've stored the code, annotation or data files.

### To install required packages, uncomment and run this
# source("http://www.bioconductor.org/biocLite.R")
# biocLite(c("Biobase", "preprocessCore", "genefilter"))
# install.packages(c("blme", "doParallel", "ggplot2", "locfit", "MASS", "plyr", "reshape"))

suppressPackageStartupMessages(library("Biobase"))
suppressPackageStartupMessages(library("blme"))
suppressPackageStartupMessages(library("doParallel"))
suppressPackageStartupMessages(library("genefilter"))
suppressPackageStartupMessages(library("ggplot2"))
suppressPackageStartupMessages(library("locfit"))
suppressPackageStartupMessages(library("MASS"))
suppressPackageStartupMessages(library("plyr"))
suppressPackageStartupMessages(library("preprocessCore"))
suppressPackageStartupMessages(library("reshape"))


source("../../R/data_format_lib.R")
source("../../R/model_lib.R")
source("../../R/simem_lib.R")

 

The screen data formatted as an ExpressionSet, including measurement weights, can be downloaded HERE.

load("../../data/shrna/breast_screens_with_weights.eset")
#breast_screens

 

The reagent annotation table we're about to load, including the weight column which will be used to exclude some reagents, can be downloaded HERE.

hp = read.delim("../../data/annotations/hairpin_annotations.txt", header=T, as.is=T, check.names=F)
hpWeights = hp[,c("trcn_id", "gene_id", "weight")]

 

The breast RNA-Seq expression file, from which we'll extract CCND1 log2-FPKM expression values, can be downloaded HERE.

options(width=100)

rnaseq = read.delim("../../data/rnaseq/breast_rnaseq_qn.txt", header=T, as.is=T, check.names=F)
# Get the row index for CCND1 expression
ccnd1Index = which(rnaseq$symbol == "CCND1")
# Remove all identifier columns, and drop the columns dimension, resulting in a cell-line-named vector of log2-FPKM values
ccnd1Expr = unlist(rnaseq[ccnd1Index,-grep("gene_id|symbol|ensembl_id", colnames(rnaseq))])
ccnd1Expr

 

Once we have the CCND1 log2-FPKM expression values, we want to construct a data frame with 2 columns: cell_line as the 1st column, and a second column to contain the expression values, which we'll creatively name ccnd1. We will then specify the covariate = "ccnd1" and annotationsPerCellLine = ccnd1Values parameters in the simem() function.

ccnd1Values = cbind.data.frame(cell_line=names(ccnd1Expr),
                               ccnd1=ccnd1Expr,
                               stringsAsFactors=FALSE)
ccnd1Values[1:5,]

 

For the purposes of this example, to reduce computation time, we'll perform the analysis for CCND1, which is expected, based on the breast cancer literature, to be more essential in cell lines with higher CCND1 expression. We'll also examine CDK4 and CDK6 essentiality, since these are known to directly form protein-protein complexes with CCND1 and are known to be required for CCND1's cell cycle functions in different contexts.

genesOfInterest = c(595,   #CCND1
                    1019,  #CDK4
                    1021)  #CDK6

 

If you want to perform a genome-wide analysis, simply omit the geneIds = genesOfInterest parameter. As noted, this analysis typically takes ~18 hours but can be greatly reduced using the parallelNodes parameter on systems with multiple processor cores.

 

If we specify an analysis using both precision and signal-noise measurement weights (inverseVarianceWeights = TRUE and signalProbWeights = TRUE parameters, respectively), we must ensure that we've added these weights to the ExpressionSet beforehand (DETAILED HERE).

 

We're now ready to predict genes whose essentiality is significantly associated with CCND1 log2-FPKM values.

results = simem(screens = breast_screens,
                geneIds = genesOfInterest,
                covariate = "ccnd1",
                reagentWeights = hpWeights,
                annotationsPerCellLine = ccnd1Values,
                inverseVarianceWeights = TRUE,
                signalProbWeights = TRUE,
                analyzeReagents = TRUE,
                parallelNodes = 1
                )

 

Here are the gene-level summaries (more detailed gene-level results can be obtained using results$gene_detailed). As expected, CCND1 and CDK4 show strongly significant increases in essentiality in cell lines with higher CCND1 expression. This is consistent with the knockdown of either member of the CCND1-CDK4 complex disrupting the function of the complex. CDK6 is not found to be significantly associated with CCND1 expression, suggesting that in Breast Cancer contexts, CDK4 rather than CDK6 is the primary complex partner for CCND1.

 

The difference parameter can be interpreted as the magnitude of the average change in dropout slope associated with each unit increase in log2-FPKM CCND1 expression. In other words, on average the dropout is faster (more negative), indicating increased essentiality, in cell lines with higher CCND1 expression.

options(width=100)

results$gene

 

Since analyzeReagents = TRUE parameter was specified, per-reagent context-specific essentiality predictions are also available (more detailed gene-level results can be obtained using results$reagent_detailed)

options(width=100)
reagent = results$reagent
reagent[order(reagent$symbol, reagent$trcn_id), ]

 

The above analysis tests whether the essentiality of each specified gene is associated with CCND1 expression. In the next section we will see how to test whether the essentiality of a gene is significantly associated with that gene's own mRNA expression level.



neellab/simem documentation built on May 23, 2019, 1:31 p.m.