knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

This vignette demonstrates how to analyze cell-profiler acquired data of 96-well plates, treated with different conditions.

Update

To ensure that the latest package version is installed, we will fetch it from github.

# TODO: Test for devtools
if (!"devtools" %in% installed.packages()) {
    install.packages("devtools")
}
devtools::install_github("mknoll/cmoRe")
require(cmoRe)

Data requirements

This analysis pipeline was built to analyze images obtained with an InCell System, and processed using CellProfiler.

A typical experiment might be structured as follows:

96 Well plate in which cells (from one experimental run/batch) are seeded. Treatment with n different conditions, 3 replicates (=wells) per condition. Three repetitions of the experiment. Staining of nuclei with DAPI, cytoplasmatic staining with e.g. Desmin for NRCMs.

For each run, three files created by CellProfiler and a treatment file are needed.

The CellProfiler files are named as follows: Cells.txt: Total cell measurements Cytoplasm.txt: Cytoplasm measurements (without nuclei) Primarieswithoutborder.txt: Nuclear measurements (using nuclei as seeds).

When merging files (image number, object number), suffixes .nucl, .cyto, .cell are used for equal CellProfiler feature names.

read.csv(system.file("extdata", "run1_1/Cells.txt", package = "cmoRe"), sep="\t")[1:3,1:5]

read.csv(system.file("extdata", "run1_1/Cytoplasm.txt", package = "cmoRe"), sep="\t")[1:3,1:5]

read.csv(system.file("extdata", "run1_1/Primarieswithoutborder.txt", package = "cmoRe"), sep="\t")[1:3,1:5]

Additionally, a treatment file is required: Treatment.csv containing at least two columns, "well" and "Treatment". Further columns might be supplied, e.g. a "Konzentration" Column.

read.csv(system.file("extdata", "run1_1/Treatment.csv", package = "cmoRe"))[1:5,]

All files should be stored in a single folder for each experimental run.

Data structure

Multiple 96 well plates might be part of an experiment. Each experimental run is encoded as list element, again containing a list of all folders wich are part of the respective run.

A design with 2 independent runs with 2 plates each might look as follows:

data <- list()
data[[length(data)+1]] <- list(system.file("extdata", "run1_1/", package = "cmoRe"),
                               system.file("extdata", "run1_2/", package = "cmoRe"))
data[[length(data)+1]] <- list(system.file("extdata", "run2_1/", package = "cmoRe"),
                               system.file("extdata", "run2_2/", package = "cmoRe"))

Experiment ID

Each experiment is associated with an identifier.

uid <- "testexp"

Create experiment

Now, we can create the imageExp instance, in which the data will be stored.

exp <- new("imageExp", uid, data)

The next step is loading of the data. If only a subset of variables should be loaded, a vars parameter might be supplied. In this case, we only use two of the cell profiler output files and have to specify this, default is three files:

In third file is

exp <- getData(exp, 
           files=c("Treatment.csv", "Cells.txt", "Primarieswithoutborder.txt"))

#only read a single variable
#exp <- getData(exp, 
#          files=c("Treatment.csv", "Cells.txt", "Primarieswithoutborder.txt"),
#       vars="AreaShape_Area")

#read all three files
#exp <- getData(exp)

QC

First, we check if all expected data (as specified in the Treatment file) is correctly loaded.

exp <- checkFilter(exp, filter=NULL)

To get a first overview of the data, all variables and the number of cells per well can be plotted side by side to a treatment plan. Results are stored as pdf file, an output folder and filename can be specified.

# number of cells
exp <- qcPlots(exp, folder="/tmp/") 
exp <- qcPlots(exp) 

# Cell-Size
exp <- qcPlots(exp, vars=c("AreaShape_Area.cell"), folder="/tmp/")

Cell-cycle and Cell/Nuclear Ratio

Nuclear DNA intensity and cellular/nuclear size ratio can be used to determine cell cycle state for each cell and to filter dead cells.

Cutoff respecting certain contraints are calculated with the calc() function. The parameter fun="cc" will calculate cutoffs for cell cycle, cellular/nuclear ratio with fun="nc". Not specifying the fun parameter will calculate both and a fibroblast filter (see below).

This step might take some time, depending on the number of bootstraps used to obtain cutoffs and the amount of data.

A faster alternative (which does not require to load all data simulateneously) is described below.

exp <- calc(exp, fun="nc")

# check assignments 
table(exp@data$ncArea)

# calculate cell cycle cutoffs
#exp <- calc(exp, fun="cc")  #data not included 

Fibroblast identification

calc() also provides the ability to idnetify cutoffs used for fibroblast identification

# calculate fibroblast cutoffs
# exp <- calc(exp, fun="fb") #data not included

Alternative method for cutoff calculation

As larger experiments might require quite a lot of RAM - cutoffs can also be calculated in a first step by reading only the data needed for the calculations without having to keep all data in memory.

It is sufficient to create an imageExp object with the data structure as described above (lists with entries per experiment / plate).

exp0 <- new("imageExp", "fast_cutoff_calc", data)

#calcualte cutoffs without loading all data
cutoffs <- calcCutoffs(exp0, fun="nc")

#now we load the data
exp0 <- getData(exp0)

#and use the precalculated cutoffs
exp0 <- addCutoffs(exp0, cutoffs)
table(exp0@data$ncArea)

#get cell cycle fractions 
#cc <- sumAgg(exp0) #data not included 

Cell cycle

We can plot cell cycles per plate or aggregated per run to evalaute quaality. Data is not included in the supplied data.

## if we use calc() (slower) to calcuate cutoffs:
#exp <- calc(exp, fun="cc")
analyzeCC(exp)

##per plate
#analyzeCC(exp, aggregate=T)

## if we use calcCutoffs() and addCutoffs() 
cc <- sumAgg(exp0)

Filter data

Next, data is filtered by the previously calculated cutoffs to remove dead cells.

exp <- filter(exp)

### and we can decide to keep only wells with at least 1000 cells
exp <- checkFilter(exp, filter=1000)

Additional variables (diff, ratio)

Differences and ratios between Cell/Nuclear variables can be easily added using the addVars function.

exp <- addVars(exp, c("AreaShape_Area"))
colnames(exp@data)[which(grepl("AreaShape", colnames(exp@data)))]

Compute cutoffs for additional features

We can try to split features based on their distribution.

vars <- "RATIO_NC_AreaShape_Area"
exp<- addCutoffVars(exp, vars=vars, fun=log)
head(exp@data)

Aggregation per well

For the following analyses, all measurements are aggregated per well (default: median), might be specified by the fun argument.

exp@dataAgg <- medianAgg(exp)

## calculate fractions of generated binary variables 
exp@dataAgg <- cbind(exp@dataAgg, fractAgg(exp))

Remove raw data

As exp object might become quite large, we only want to store derived data. Therefore, we remove the raw data and can then save the resulting exp object. Not performed here, as we need the raw data for single cell analysis (see below).

#exp <- removeRawData(exp)

Z-transform

We can normalize the data by z-transforming data per run. Similar treated wells are required in all runs.

exp <- zTrans(exp)

Extract data

#non-transformed data
exp@dataAgg[1:3,]
#z-transformed data
exp@dataZ[1:3,]

Analyze I

Differentially regulated variables

dat <- exp@dataZ
tmp <- do.call(rbind, strsplit(as.character(dat$TREATMENT), "_"))

dat$GRP <- tmp[,1]
## ggf log transformieren
## ggf zentrieren der dosis 
dat$DOSE <- tmp[,2] 
dat$DOSE[which(tmp[,2] == 1)] <- 3
dat$DOSE[which(tmp[,2] == 2)] <- 2
dat$DOSE[which(tmp[,2] == 3)] <- 1
dat$DOSE[which(tmp[,1] == "Ko")] <- 0

table(dat$DOSE, dat$VERSUCH, dat$GRP)

###lme4
frm <- as.formula(VAL~DOSE+(1|VERSUCH/PLATTE/TREATMENT)) 
frm0 <- as.formula(VAL~1+(1|VERSUCH/PLATTE/TREATMENT))  
res <- analyze(dat, ref="Ko_1", frm=frm, frm0=frm0)

###nlme
#frm <- as.formula(VAL~DOSE) 
#frm0 <- as.formula(VAL~1)
#rand <- as.formula(~1|VERSUCH/PLATTE/TREATMENT)
#res <- analyzeNLME(dat, ref="Ko_1", frm=frm, frm0=frm0, rand=rand, level=0)

Filter results

Keep only models which are significantly better compared to the respective null model, e.g. after Benjamini-Hochberg p-value adjustment

lapply(res, function(x) { 
           ## add p.value, usign a z-distribution 
           x$p.value <- 2*(1-pnorm(abs(x$t.value)))
           ## remove all vars which have an R2cv < 0.9
           x <- x[which(x$R2cv > 0.9),]
           ## selektiere einzelne modelle
           tmp <- x[which(!duplicated(x$VAR)),]
           tmp$padj <- p.adjust(tmp$aP, "BH")
           x$padj <- tmp$padj[match(x$VAR, tmp$VAR)]
           ## we apply only weak filters here
           x <- x[which(x$padj < 0.2),]
           x <- x[which(x$p.value < 0.2),]
           x
    })

Remove unwanted variation

require(lme4)
frm <- as.formula(VAL~1+(1|VERSUCH/PLATTE/TREATMENT)) 
dat <- exp@dataZ
dat <- dat[,which(!apply(dat, 2, function(x) all(is.na(x))))]

#remove metadata
cn <- colnames(dat[,-which(colnames(dat) %in% c("TREATMENT", "VERSUCH", "PLATTE","WELL"))])

coll <- list()
for (i in 1:length(cn)) {
    dat$VAL <-dat[,which(colnames(dat) == cn)[i]]
    fit <- lmer(frm, data=dat[,])
    coll[[length(coll)+1]] <- predict(fit)-dat$VAL
    names(coll)[length(coll)] <- cn[i]
}
coll <- do.call(cbind, coll)

require(pheatmap)
image(t(dat[,cn]))
image(t(coll[,cn]))

Identify clusters

General approach for cluster identification, usually using prefiltered data (see above). For demonstration purposes, all data is used. For demonstration, z transformed data is used.

require(ggplot2)
################
### select data ##
##################
#dat <- exp@dataZ[,selVars]
dat <- exp@dataZ
#remove metadata
dat <- dat[,-which(colnames(dat) %in% c("TREATMENT", "VERSUCH", "PLATTE","WELL"))]
###############
#remove only NA cols
dat <- dat[,-which(apply(dat, 2, function(x) all(is.na(x))))]

######################
###### calculate cluster variability
######################
cluster <- findClusters(dat, nClust = c(1,50))

df <- data.frame(apply(cluster[,1:2], 2, unlist))
cutoff <- 2
ggplot(df, aes(y=dispSum, x=i))+ geom_point() + geom_line() + labs(x="#clusters", y="variability") + geom_hline(yintercept=cutoff)

#####################
## Calculate synthetic variables
######################
cm <- cluster
cm <- cm[order(unlist(cm[,"dispSum"])),]
map <- cm[which(cm[,"dispSum"] > cutoff),,drop=F][1,,drop=F][1,"pm.clust"][[1]]
data <- calculateSyntheticVars(dat, map)


###################
## Radarplots #####
##################
synthMetr <- calculateSyntheticVars(dat, map)
synthMetrDat <- cbind(synthMetr$data, exp@dataZ)

col <- list(c(rgb(0.1,0.1,0.1,0.5), rgb(0.1,0.5,0.5,0.5)), 
        c(rgb(0.1,0.1,0.1,0.5), rgb(0.9,0.4,0.4,0.5)), 
        c(rgb(0.1,0.1,0.1,0.5), rgb(0.9,0.9,0.5,0.5)),
        c(rgb(0.1,0.1,0.1,0.5), rgb(0,100/255,0,0.5)))
par(mfrow=c(1,3), mar=c(0,0,2,0))
drawRadarplots(synthMetrDat, vars=names(synthMetr$anno), labels = F, ctrlLevel="Ko_1", pTest="t.test", agg=median,col=col) 


mknoll/cmoRe documentation built on Nov. 18, 2022, 4:01 p.m.