DOWNLOAD R CODE FILE FOR THIS TUTORIAL
First DOWNLOAD SIMEM R CODE zip bundle containing the R code files loaded below.
Also note that, while the code below uses relative paths, you should edit these to point to where you've stored the code, annotation or data files.
### To install required packages, uncomment and run this # source("http://www.bioconductor.org/biocLite.R") # biocLite(c("Biobase", "preprocessCore", "genefilter")) # install.packages(c("blme", "doParallel", "ggplot2", "locfit", "MASS", "plyr", "reshape")) suppressPackageStartupMessages(library("Biobase")) suppressPackageStartupMessages(library("blme")) suppressPackageStartupMessages(library("doParallel")) suppressPackageStartupMessages(library("genefilter")) suppressPackageStartupMessages(library("ggplot2")) suppressPackageStartupMessages(library("locfit")) suppressPackageStartupMessages(library("MASS")) suppressPackageStartupMessages(library("plyr")) suppressPackageStartupMessages(library("preprocessCore")) suppressPackageStartupMessages(library("reshape")) source("../../R/data_format_lib.R") source("../../R/model_lib.R") source("../../R/simem_lib.R")
The screen data formatted as an ExpressionSet
, including measurement weights, can be downloaded HERE.
load("../../data/shrna/breast_screens_with_weights.eset") breast_screens
The reagent annotation table we're about to load, including the weight
column which will be used to exclude some reagents, can be downloaded HERE.
hp = read.delim("../../data/annotations/hairpin_annotations.txt", header=T, as.is=T, check.names=F) hpWeights = hp[,c("trcn_id", "gene_id", "weight")]
The breast cell line subtypes table detailing the classes we want to compare for differential essentiality can be downloaded HERE.
subtypes = read.delim("../../data/annotations/cell_line_subtypes.txt", header=T, as.is=T, check.names=F) status = subtypes[,c("cell_line", "subtype_neve")]
Once the subtypes table is loaded, we want to reduce the 4 subtype Neve classification into 2 classes: "her2" and "other".
status$erbb2 = ifelse(status$subtype_neve == "her2", "her2", "other") status[1:10,]
Based on the above data frame, two parameters must be specified in the simem()
function: covariate = "erbb2"
and annotationsPerCellLine = status
.
By default, the analysis uses the first variable by alphabetical order as the baseline. In this case, we want to determine whether a gene is a context-dependent essential in her2+ cell lines compared to all other subtypes as baseline, so we'll specify the following parameter in the simem()
function: covariateFactorOrder = c("other", "her2")
.
For the purposes of this example, to reduce computation time, we'll perform the analysis for known positives highlighted in the manuscript
signalingPathway = c(207, #AKT1 11140, #CDC37 2064, #ERBB2 55914, #ERBB2IP 2065, #ERBB3 2475, #MTOR 5290, #PIK3CA 6009, #RHEB 25803, #SPDEF 7022) #TFAP2C
If we want to perform a genome-wide analysis, simply omit the geneIds = signalingPathway
parameter. This analysis typically takes ~18 hours, but the computation type can be dramatically reduced by parallelizing the modeling process on multiple processor cores (when available). To use 3 processor cores, for example, specify the parallelNodes = 3
parameter. This option is available on systems with multiple processor cores, whether running Windows, Linux, or Mac OS X. This is done using the doParallel
R package.
If we specify an analysis using both precision and signal-noise measurement weights (inverseVarianceWeights = TRUE
and signalProbWeights = TRUE
parameters, respectively), we must ensure that we've added these weights to the ExpressionSet
beforehand (DETAILED HERE).
Combining all the above, we're ready to perform the differential essentiality analysis.
results = simem(screens = breast_screens, geneIds = signalingPathway, covariate = "erbb2", reagentWeights = hpWeights, annotationsPerCellLine = status, inverseVarianceWeights = TRUE, signalProbWeights = TRUE, analyzeReagents = TRUE, covariateFactorOrder = c("other", "her2"), parallelNodes = 1 )
For illustration purposes, we'll rerun the above code using multiple (3) parallel processor cores, taking advantage of the capabilities provided by the doParallel
package.
results = simem(screens = breast_screens, geneIds = signalingPathway, covariate = "erbb2", reagentWeights = hpWeights, annotationsPerCellLine = status, inverseVarianceWeights = TRUE, signalProbWeights = TRUE, analyzeReagents = TRUE, covariateFactorOrder = c("other", "her2"), parallelNodes = 3 )
Most users will be interested in the summarized per-gene context-specific essentiality predictions
options(width=100) results$gene
If you want to examine more parameters for each gene, including model fit statistics, std. errors and such, examine the "gene_detailed" data frame.
str(results$gene_detailed)
If the analyzeReagents = TRUE
parameter was specified, per-reagent context-specific essentiality predictions are also available in both summarized
options(width=100) reagent = results$reagent reagent[order(reagent$symbol, reagent$trcn_id), ]
and detailed versions.
str(results$reagent_detailed)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.