2 class context-specific essentiality: Breast HER2+ example

 

DOWNLOAD R CODE FILE FOR THIS TUTORIAL

 

First DOWNLOAD SIMEM R CODE zip bundle containing the R code files loaded below.

 

Also note that, while the code below uses relative paths, you should edit these to point to where you've stored the code, annotation or data files.

### To install required packages, uncomment and run this
# source("http://www.bioconductor.org/biocLite.R")
# biocLite(c("Biobase", "preprocessCore", "genefilter"))
# install.packages(c("blme", "doParallel", "ggplot2", "locfit", "MASS", "plyr", "reshape"))

suppressPackageStartupMessages(library("Biobase"))
suppressPackageStartupMessages(library("blme"))
suppressPackageStartupMessages(library("doParallel"))
suppressPackageStartupMessages(library("genefilter"))
suppressPackageStartupMessages(library("ggplot2"))
suppressPackageStartupMessages(library("locfit"))
suppressPackageStartupMessages(library("MASS"))
suppressPackageStartupMessages(library("plyr"))
suppressPackageStartupMessages(library("preprocessCore"))
suppressPackageStartupMessages(library("reshape"))


source("../../R/data_format_lib.R")
source("../../R/model_lib.R")
source("../../R/simem_lib.R")

 

The screen data formatted as an ExpressionSet, including measurement weights, can be downloaded HERE.

load("../../data/shrna/breast_screens_with_weights.eset")
breast_screens

 

The reagent annotation table we're about to load, including the weight column which will be used to exclude some reagents, can be downloaded HERE.

hp = read.delim("../../data/annotations/hairpin_annotations.txt", header=T, as.is=T, check.names=F)
hpWeights = hp[,c("trcn_id", "gene_id", "weight")]

 

The breast cell line subtypes table detailing the classes we want to compare for differential essentiality can be downloaded HERE.

subtypes = read.delim("../../data/annotations/cell_line_subtypes.txt", header=T, as.is=T, check.names=F)
status = subtypes[,c("cell_line", "subtype_neve")]

 

Once the subtypes table is loaded, we want to reduce the 4 subtype Neve classification into 2 classes: "her2" and "other".

status$erbb2 = ifelse(status$subtype_neve == "her2", "her2", "other")
status[1:10,]

 

Based on the above data frame, two parameters must be specified in the simem() function: covariate = "erbb2" and annotationsPerCellLine = status.

 

By default, the analysis uses the first variable by alphabetical order as the baseline. In this case, we want to determine whether a gene is a context-dependent essential in her2+ cell lines compared to all other subtypes as baseline, so we'll specify the following parameter in the simem() function: covariateFactorOrder = c("other", "her2").

 

For the purposes of this example, to reduce computation time, we'll perform the analysis for known positives highlighted in the manuscript

signalingPathway = c(207,   #AKT1
                    11140, #CDC37
                    2064,  #ERBB2
                    55914, #ERBB2IP
                    2065,  #ERBB3
                    2475,  #MTOR
                    5290,  #PIK3CA
                    6009,  #RHEB
                    25803, #SPDEF
                    7022) #TFAP2C

 

If we want to perform a genome-wide analysis, simply omit the geneIds = signalingPathway parameter. This analysis typically takes ~18 hours, but the computation type can be dramatically reduced by parallelizing the modeling process on multiple processor cores (when available). To use 3 processor cores, for example, specify the parallelNodes = 3 parameter. This option is available on systems with multiple processor cores, whether running Windows, Linux, or Mac OS X. This is done using the doParallel R package.

 

If we specify an analysis using both precision and signal-noise measurement weights (inverseVarianceWeights = TRUE and signalProbWeights = TRUE parameters, respectively), we must ensure that we've added these weights to the ExpressionSet beforehand (DETAILED HERE).

 

Combining all the above, we're ready to perform the differential essentiality analysis.

results = simem(screens = breast_screens,
                geneIds = signalingPathway,
                covariate = "erbb2",
                reagentWeights = hpWeights,
                annotationsPerCellLine = status,
                inverseVarianceWeights = TRUE,
                signalProbWeights = TRUE,
                analyzeReagents = TRUE,
                covariateFactorOrder = c("other", "her2"), 
                parallelNodes = 1
                )

 

For illustration purposes, we'll rerun the above code using multiple (3) parallel processor cores, taking advantage of the capabilities provided by the doParallel package.

results = simem(screens = breast_screens,
                geneIds = signalingPathway,
                covariate = "erbb2",
                reagentWeights = hpWeights,
                annotationsPerCellLine = status,
                inverseVarianceWeights = TRUE,
                signalProbWeights = TRUE,
                analyzeReagents = TRUE,
                covariateFactorOrder = c("other", "her2"), 
                parallelNodes = 3
                )

 

Most users will be interested in the summarized per-gene context-specific essentiality predictions

options(width=100)

results$gene

 

If you want to examine more parameters for each gene, including model fit statistics, std. errors and such, examine the "gene_detailed" data frame.

str(results$gene_detailed)

 

If the analyzeReagents = TRUE parameter was specified, per-reagent context-specific essentiality predictions are also available in both summarized

options(width=100)
reagent = results$reagent
reagent[order(reagent$symbol, reagent$trcn_id), ]

 

and detailed versions.

str(results$reagent_detailed)


neellab/simem documentation built on May 23, 2019, 1:31 p.m.