knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette demonstrates how to analyze CellProfiler-acquired data from 96-well plates treated with different conditions.
To ensure that the latest package version is installed, we fetch it from GitHub.
# install devtools if not yet available
if (!"devtools" %in% installed.packages()) {
    install.packages("devtools")
}
devtools::install_github("mknoll/cmoRe")
require(cmoRe)
This analysis pipeline was built to analyze images obtained with an InCell System, and processed using CellProfiler.
A typical experiment might be structured as follows:
- A 96-well plate in which cells (from one experimental run/batch) are seeded.
- Treatment with n different conditions, 3 replicates (= wells) per condition.
- Three repetitions of the experiment.
- Staining of nuclei with DAPI; cytoplasmic staining with e.g. Desmin for NRCMs.
For each run, three files created by CellProfiler and a treatment file are needed.
The CellProfiler files are named as follows:
- Cells.txt: total cell measurements
- Cytoplasm.txt: cytoplasm measurements (without nuclei)
- Primarieswithoutborder.txt: nuclear measurements (using nuclei as seeds)
When the files are merged (by image number and object number), identical CellProfiler feature names are disambiguated with the suffixes .nucl, .cyto, and .cell.
read.csv(system.file("extdata", "run1_1/Cells.txt", package = "cmoRe"), sep="\t")[1:3,1:5]
read.csv(system.file("extdata", "run1_1/Cytoplasm.txt", package = "cmoRe"), sep="\t")[1:3,1:5]
read.csv(system.file("extdata", "run1_1/Primarieswithoutborder.txt", package = "cmoRe"), sep="\t")[1:3,1:5]
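As an illustration only (not the package's internal merge code), joining two of these tables with base R might look as follows; the ImageNumber/ObjectNumber key columns are the usual CellProfiler defaults and are assumed here:

cells <- read.csv(system.file("extdata", "run1_1/Cells.txt", package = "cmoRe"), sep="\t")
nucl <- read.csv(system.file("extdata", "run1_1/Primarieswithoutborder.txt", package = "cmoRe"), sep="\t")
# merge by the assumed CellProfiler key columns; shared feature names receive suffixes
merged <- merge(cells, nucl, by=c("ImageNumber", "ObjectNumber"),
                suffixes=c(".cell", ".nucl"))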
Additionally, a treatment file is required: Treatment.csv, containing at least two columns, "well" and "Treatment". Further columns might be supplied, e.g. a "Konzentration" column.
read.csv(system.file("extdata", "run1_1/Treatment.csv", package = "cmoRe"))[1:5,]
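If such a file has to be created from scratch, a minimal version (with hypothetical well and treatment labels) can be written with base R:

# minimal treatment plan; further columns (e.g. "Konzentration") are optional
trt <- data.frame(well = c("A01", "A02", "A03"),
                  Treatment = c("Ko_1", "TreatA_1", "TreatA_2"))
write.csv(trt, file.path(tempdir(), "Treatment.csv"), row.names = FALSE)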
All files should be stored in a single folder for each experimental run.
Multiple 96-well plates might be part of an experiment. Each experimental run is encoded as a list element, which in turn contains a list of all folders belonging to the respective run.
A design with two independent runs of two plates each might look as follows:
data <- list()
data[[length(data)+1]] <- list(system.file("extdata", "run1_1/", package = "cmoRe"),
                               system.file("extdata", "run1_2/", package = "cmoRe"))
data[[length(data)+1]] <- list(system.file("extdata", "run2_1/", package = "cmoRe"),
                               system.file("extdata", "run2_2/", package = "cmoRe"))
Each experiment is associated with an identifier.
uid <- "testexp"
Now, we can create the imageExp instance, in which the data will be stored.
exp <- new("imageExp", uid, data)
The next step is loading the data. If only a subset of variables should be loaded, a vars parameter might be supplied. In this case, we only use two of the CellProfiler output files and have to specify this via the files argument (the default reads all three files). The third file, Cytoplasm.txt, is not used here:
exp <- getData(exp, files=c("Treatment.csv", "Cells.txt", "Primarieswithoutborder.txt"))
# only read a single variable
#exp <- getData(exp,
#               files=c("Treatment.csv", "Cells.txt", "Primarieswithoutborder.txt"),
#               vars="AreaShape_Area")
# read all three files
#exp <- getData(exp)
First, we check if all expected data (as specified in the Treatment file) is correctly loaded.
exp <- checkFilter(exp, filter=NULL)
To get a first overview of the data, all variables and the number of cells per well can be plotted side by side with the treatment plan. Results are stored as a PDF file; an output folder and filename can be specified.
# number of cells
exp <- qcPlots(exp, folder="/tmp/")
exp <- qcPlots(exp)
# cell size
exp <- qcPlots(exp, vars=c("AreaShape_Area.cell"), folder="/tmp/")
Nuclear DNA intensity and cellular/nuclear size ratio can be used to determine cell cycle state for each cell and to filter dead cells.
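For intuition only, the following sketch illustrates the underlying idea: the distribution of integrated nuclear DNA intensity typically shows 2N (G0/G1) and 4N (G2/M) peaks, between which cutoffs can be placed. The intensity column name is hypothetical (it depends on the CellProfiler pipeline), and the actual cutoff logic of the package (see calc() below) is more involved:

# conceptual sketch, not the package's algorithm; the column name is hypothetical
#dna <- exp@data$Intensity_IntegratedIntensity_DAPI.nucl
#plot(density(log2(dna), na.rm=TRUE), main="Integrated nuclear DNA intensity (log2)")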
Cutoffs respecting certain constraints are calculated with the calc() function. The parameter fun="cc" calculates cell cycle cutoffs; fun="nc" calculates cellular/nuclear ratio cutoffs. Not specifying the fun parameter calculates both, as well as a fibroblast filter (see below).
This step might take some time, depending on the number of bootstraps used to obtain cutoffs and the amount of data.
A faster alternative (which does not require loading all data simultaneously) is described below.
exp <- calc(exp, fun="nc")
# check assignments
table(exp@data$ncArea)
# calculate cell cycle cutoffs
#exp <- calc(exp, fun="cc") # data not included
calc() can also identify the cutoffs used for fibroblast identification.
# calculate fibroblast cutoffs
#exp <- calc(exp, fun="fb") # data not included
As larger experiments might require a lot of RAM, cutoffs can also be calculated in a first step by reading only the data needed for the calculations, without having to keep all data in memory.
It is sufficient to create an imageExp object with the data structure as described above (lists with entries per experiment / plate).
exp0 <- new("imageExp", "fast_cutoff_calc", data)
# calculate cutoffs without loading all data
cutoffs <- calcCutoffs(exp0, fun="nc")
# now we load the data
exp0 <- getData(exp0)
# and use the precalculated cutoffs
exp0 <- addCutoffs(exp0, cutoffs)
table(exp0@data$ncArea)
# get cell cycle fractions
#cc <- sumAgg(exp0) # data not included
We can plot cell cycle fractions per plate or aggregated per run to evaluate quality. The required cell cycle data is not included in the supplied example data.
## if we use calc() (slower) to calculate cutoffs:
#exp <- calc(exp, fun="cc")
#analyzeCC(exp)              # per plate
#analyzeCC(exp, aggregate=T) # aggregated per run
## if we use calcCutoffs() and addCutoffs():
cc <- sumAgg(exp0)
Next, data is filtered by the previously calculated cutoffs to remove dead cells.
exp <- filter(exp)
# keep only wells with at least 1000 cells
exp <- checkFilter(exp, filter=1000)
Differences and ratios between cell/nuclear variables can easily be added using the addVars function.
exp <- addVars(exp, c("AreaShape_Area"))
colnames(exp@data)[which(grepl("AreaShape", colnames(exp@data)))]
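To illustrate what such a derived variable represents (assumed semantics; the exact definitions are those of addVars()), the nuclear/cellular area ratio corresponds to a simple element-wise division of the suffixed columns:

# assumed definition for illustration only; addVars() generates this automatically
ratio_nc <- exp@data$AreaShape_Area.nucl / exp@data$AreaShape_Area.cell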
We can try to split features based on their distribution.
vars <- "RATIO_NC_AreaShape_Area"
exp <- addCutoffVars(exp, vars=vars, fun=log)
head(exp@data)
For the following analyses, all measurements are aggregated per well (default: median); the aggregation function can be specified via the fun argument.
exp@dataAgg <- medianAgg(exp)
## calculate fractions of generated binary variables
exp@dataAgg <- cbind(exp@dataAgg, fractAgg(exp))
As the exp object might become quite large, we may want to store only the derived data. Therefore, we remove the raw data and can then save the resulting exp object. This is not performed here, as we need the raw data for the single cell analysis (see below).
#exp <- removeRawData(exp)
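Saving and restoring the slimmed-down object could then be done with base R serialization, e.g.:

#saveRDS(exp, file.path(tempdir(), "testexp_cmoRe.rds"))
#exp <- readRDS(file.path(tempdir(), "testexp_cmoRe.rds"))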
We can normalize the data by z-transforming it per run. Similarly treated wells are required in all runs.
exp <- zTrans(exp)
# non-transformed data
exp@dataAgg[1:3,]
# z-transformed data
exp@dataZ[1:3,]
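For intuition, a per-run z-transformation of a single variable could be written by hand as follows (illustrative only; zTrans() is the package's actual implementation, and the variable name is assumed to be present in the aggregated data):

# per-run (VERSUCH) z-transformation of one assumed variable
z_manual <- ave(exp@dataAgg$AreaShape_Area.cell, exp@dataAgg$VERSUCH,
                FUN=function(x) (x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE))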
Differentially regulated variables can now be identified using mixed models (dose as fixed effect, nested run/plate/treatment random effects):
dat <- exp@dataZ
tmp <- do.call(rbind, strsplit(as.character(dat$TREATMENT), "_"))
dat$GRP <- tmp[,1]
## optionally log-transform
## optionally center the dose
dat$DOSE <- tmp[,2]
dat$DOSE[which(tmp[,2] == 1)] <- 3
dat$DOSE[which(tmp[,2] == 2)] <- 2
dat$DOSE[which(tmp[,2] == 3)] <- 1
dat$DOSE[which(tmp[,1] == "Ko")] <- 0
table(dat$DOSE, dat$VERSUCH, dat$GRP)

### lme4
frm <- as.formula(VAL~DOSE+(1|VERSUCH/PLATTE/TREATMENT))
frm0 <- as.formula(VAL~1+(1|VERSUCH/PLATTE/TREATMENT))
res <- analyze(dat, ref="Ko_1", frm=frm, frm0=frm0)

### nlme alternative
#frm <- as.formula(VAL~DOSE)
#frm0 <- as.formula(VAL~1)
#rand <- as.formula(~1|VERSUCH/PLATTE/TREATMENT)
#res <- analyzeNLME(dat, ref="Ko_1", frm=frm, frm0=frm0, rand=rand, level=0)
Keep only models which are significantly better than the respective null model, e.g. after Benjamini-Hochberg p-value adjustment:
lapply(res, function(x) {
    ## add p-value, using a z-distribution
    x$p.value <- 2*(1-pnorm(abs(x$t.value)))
    ## remove all vars which have an R2cv < 0.9
    x <- x[which(x$R2cv > 0.9),]
    ## select individual models
    tmp <- x[which(!duplicated(x$VAR)),]
    tmp$padj <- p.adjust(tmp$aP, "BH")
    x$padj <- tmp$padj[match(x$VAR, tmp$VAR)]
    ## we apply only weak filters here
    x <- x[which(x$padj < 0.2),]
    x <- x[which(x$p.value < 0.2),]
    x
})
As a diagnostic, intercept-only mixed models can be fit per variable; the difference between model predictions and observed values can then be visualized alongside the raw data:

require(lme4)
frm <- as.formula(VAL~1+(1|VERSUCH/PLATTE/TREATMENT))
dat <- exp@dataZ
dat <- dat[,which(!apply(dat, 2, function(x) all(is.na(x))))]
# remove metadata columns
cn <- colnames(dat[,-which(colnames(dat) %in% c("TREATMENT", "VERSUCH", "PLATTE", "WELL"))])
coll <- list()
for (i in 1:length(cn)) {
    dat$VAL <- dat[,which(colnames(dat) == cn[i])]
    fit <- lmer(frm, data=dat)
    # difference between model prediction and observed value
    coll[[length(coll)+1]] <- predict(fit)-dat$VAL
    names(coll)[length(coll)] <- cn[i]
}
coll <- do.call(cbind, coll)
require(pheatmap)
image(t(dat[,cn]))
image(t(coll[,cn]))
General approach for cluster identification, usually performed on prefiltered data (see above). For demonstration purposes, all data (z-transformed) is used.
require(ggplot2)

### select data
#dat <- exp@dataZ[,selVars]
dat <- exp@dataZ
# remove metadata columns
dat <- dat[,-which(colnames(dat) %in% c("TREATMENT", "VERSUCH", "PLATTE", "WELL"))]
# remove all-NA columns
dat <- dat[,-which(apply(dat, 2, function(x) all(is.na(x))))]

### calculate cluster variability
cluster <- findClusters(dat, nClust = c(1,50))
df <- data.frame(apply(cluster[,1:2], 2, unlist))
cutoff <- 2
ggplot(df, aes(y=dispSum, x=i)) +
    geom_point() +
    geom_line() +
    labs(x="#clusters", y="variability") +
    geom_hline(yintercept=cutoff)

### calculate synthetic variables
cm <- cluster
cm <- cm[order(unlist(cm[,"dispSum"])),]
map <- cm[which(cm[,"dispSum"] > cutoff),,drop=F][1,,drop=F][1,"pm.clust"][[1]]
data <- calculateSyntheticVars(dat, map)

### radar plots
synthMetr <- calculateSyntheticVars(dat, map)
synthMetrDat <- cbind(synthMetr$data, exp@dataZ)
col <- list(c(rgb(0.1,0.1,0.1,0.5), rgb(0.1,0.5,0.5,0.5)),
            c(rgb(0.1,0.1,0.1,0.5), rgb(0.9,0.4,0.4,0.5)),
            c(rgb(0.1,0.1,0.1,0.5), rgb(0.9,0.9,0.5,0.5)),
            c(rgb(0.1,0.1,0.1,0.5), rgb(0,100/255,0,0.5)))
par(mfrow=c(1,3), mar=c(0,0,2,0))
drawRadarplots(synthMetrDat, vars=names(synthMetr$anno), labels=F,
               ctrlLevel="Ko_1", pTest="t.test", agg=median, col=col)