In darachm/rheocyto: Some FlowCytometry Analysis Tools

"I hear and I forget. I see and I remember. I copy this file and modify it and I have a reproducible record of my analyses."

Confucious

This vignette sets out an example of how to grab some FCS files (envisioned as being generated by the Accuri) into R, gate on it a bit, then make a data.frame suitable for plotting with ggplot.

You should be familiar with your flow cytometer

Sample sheet requirements? + a tab delimited sample sheet in each directory with the following rows:

This vingette recommends a summary of results and quality control steps.

source("src/requireInstall.R")
requireInstall("reshape2")
requireInstall("ggplot2")
requireInstall("flowCore",isBioconductor=T)
requireInstall("flowViz",isBioconductor=T)
requireInstall("Hmisc",isBioconductor=T)
requireInstall("gridExtra")
requireInstall("flowStats",isBioconductor=T)
requireInstall("flowWorkspace",isBioconductor=T)
requireInstall("ggcyto", isBioconductor=T)

Step 2: Read in .fcs files

Read in all data for analysis. Data should be in individual directories that contain .fcs files and a corresponding sample sheet with a generic format. FCS file names need to match the SampleSheet.txt.

Want to make your SampleSheet.txt? Use a GUI spreadsheet thingy, like excel or googledocs or localc or edit() in R. The format is:

| FileName | Ploidy | Dye | ... | |:---|:---|:---|:---| |A01.fcs|1.5ploid|acridineOrange|...| |A02|2.5ploid|acridineOrange|...| |A03.fcs|1.5ploid|sytoselect|...|

Where the only mandatory variable name is FileName, then all the others are whatever you're using.

An abitrary number of directories can be used, put the paths in the dataDirectoryz vector. Here we use relative becuase it should be reproducible between users. You should use ABSOLUTE for your real work. Don't have this folder? Download it (look up).

dataDirectoryz <- c("exampleFlowDataset1")

rm(oneBigFlowSet)
for (dataDirectory in dataDirectoryz) {

  print(paste("Reading",dataDirectory))

  tmpSampleSheet <- read.delim(
    paste0(dataDirectory,"/SampleSheet.txt"),row.names=1)

  tmpFlowSet <- read.flowSet(path=dataDirectory,pattern=".fcs",
    alter.names=TRUE, #why?
    phenoData=new("AnnotatedDataFrame",data=tmpSampleSheet)) 

  sampleNames(tmpFlowSet) <- paste0(dataDirectory,"/",
    sampleNames(tmpFlowSet))
  pData(tmpFlowSet)$name <- sampleNames(tmpFlowSet)

  if (exists("oneBigFlowSet")) {
    oneBigFlowSet <- rbind2(oneBigFlowSet,tmpFlowSet)
  } else {
    oneBigFlowSet <- tmpFlowSet
  }
}
pData(oneBigFlowSet)
print("Counted this many events:")
print(fsApply(oneBigFlowSet,nrow))
print("other statz?")
print(fsApply(oneBigFlowSet,each_col,median))
save(file=paste0("flowSet",digest::digest(oneBigFlowSet),".RData"))

Now you have oneBigFlowSet. Great. For gating, we'll turn that into a data.frame:

source("src/fs2df.R")
datar <- fs2df(oneBigFlowSet)

Step 3: gate

So I can't actually figure out how to make rmarkdown play with interactive chunks. So you should see an error right about....

gateFilename <- paste0("gatesFor",digest::digest(oneBigFlowSet),".RData")
load(gateFilename)

Okay, so now go to your terminal and run this:

source("src/gator.R")
gatez <- list()
gatez[["singlet"]] <- gator2d(datar,ehx="FSC.A",why="FSC.H",
  gateName="singlet") 
gatez[["cells"]] <- gator2d(datar,ehx="FSC.A",why="SSC.A", 
  gateName="cells") 
gatez[["gfpPos"]] <- gator2d(datar,ehx="FSC.A",why="FL1.A",
  gateName="gfpPositive") 
gatez[["gfpNeg"]] <- gator2d(datar,ehx="FSC.A",why="FL1.A", 
  gateName="gfpDepressing") 
save(file=gateFilename,list=c("gatez"))

If you screw it up, delete the RData file of the gates, and re-run.

how they look?

#source("src/gateCheck.R")
#gateCheck()
#make a function so we can re-call it after starting to apply these
for (gateName in names(gatez)) {
  print(gateName)
  ingate  <- data.frame(
    fs2df(Subset(oneBigFlowSet,gatez[[gateName]])),thisGate=T)
  outgate <- data.frame(
    fs2df(Subset(oneBigFlowSet,!gatez[[gateName]])),thisGate=F)
  shuffle <- rbind(ingate,outgate)
  shuffle <- shuffle[sample(1:nrow(shuffle)),]
  names(shuffle)[which(names(shuffle)=="thisGate")] <- gateName
  print(
    ggplot(shuffle)+
      facet_wrap(~name)+
      aes_string(x=colnames(gatez[[gateName]]@boundaries)[1],
                 y=colnames(gatez[[gateName]]@boundaries)[2],
                 col=gateName)+
      geom_point(size=0.2,alpha=0.5)+
      scale_y_log10()+
      scale_x_log10()
  )
}

Subset the data by applying sequential gates

"When you have doublets, do not fear to abandon them."

Confucious

rm(filteredDatar)
for (gateName in c("singlet","cells")) {
  print(gateName)
  if (exists("filteredDatar")) {
    filteredDatar  <- Subset(filteredDatar,gatez[[gateName]])
  } else {
    filteredDatar  <- Subset(oneBigFlowSet,gatez[[gateName]])
  }
}
#gateCheck() call would go here

Buidling a better datastructure, for later use

source("src/splitFSbyGate.R")
gatedFilteredDatar <- filteredDatar
for (gateName in setdiff(names(gatez),c("singlet","cells"))) {
  gatedFilteredDatar <- splitFSbyGate(gatedFilteredDatar,
    gatez[[gateName]])
}
gfdatar <- fs2df(gatedFilteredDatar)

Step 4: Plotz

Into usable form:

head(gfdatar)
mfdatar <- melt(gfdatar,
  id.vars=c("Strain","ul500nMDye","Ploidy","Media","name","gfpPos","gfpNeg"),
  measure.vars=c("FSC.A","FSC.H","FL1.A","FL1.H","FL2.A","FL2.H"))
head(mfdatar)

summary statistics

Many of the gatedFilteredDatar things will be NA's, that's ok. It's because there's not many members in those particular gate/flowframes.

fsApply(filteredDatar,function(x){
  data.frame(t(each_col(x,summary)))})
fsApply(gatedFilteredDatar,function(x){
  data.frame(t(each_col(x,summary)))})

for each sample plot a histogram of the normalized data, raw FSC and raw GFP per row

ggplot(mfdatar)+
  facet_grid(name~variable,scales="free")+
  aes(x=log10(value))+
  geom_histogram(bins=50)

extract data from flowFrames to plot histograms of values and record summary statistics

aggregate(data=mfdatar,value~variable+name,FUN=mean)
aggregate(data=mfdatar,value~variable+name,FUN=median)

Specific plotz

ggplot(subset(mfdatar,variable%in%c("FL2.A","FL2.H")))+
  aes(x=log10(value))+facet_grid(name~variable)+
  geom_histogram()+
  geom_vline(xintercept=4,col="cyan")

ggplot(subset(gfdatar))+
  aes(x=log10(FL2.A/FSC.A))+
  facet_grid(name~.)+
  geom_histogram()+
  geom_vline(xintercept=0,col="cyan")

ggplot(subset(gfdatar))+
  aes(x=log10(FL1.A/FL2.A))+
  facet_grid(name~.)+
  geom_histogram()+
  geom_vline(xintercept=0,col="cyan")

ggplot(subset(gfdatar))+
  aes(x=log10(FSC.H/FL2.A))+
  facet_grid(name~.)+
  geom_histogram()+
  geom_vline(xintercept=0,col="cyan")

ggplot(subset(mfdatar,variable%in%c("FSC.A","FL1.A","FL2.A")))+
  aes(x=name,y=(value))+facet_grid(~variable,scales="free")+
  geom_violin()+scale_y_log10()+
  theme(axis.text.x=element_text(angle=90))

ggplot(subset(mfdatar,variable%in%c("FSC.A","FL1.A","FL2.A")))+
  aes(x=name,y=(value))+facet_grid(~variable,scales="free")+
  geom_boxplot()+scale_y_log10()+
  theme(axis.text.x=element_text(angle=90))

Comparison of FL1.A to 0nM

aggregate(data=gfdatar,FL1.A~ul500nMDye,FUN=median)
medianFL1Anodye <- median(subset(gfdatar,ul500nMDye==0)$FL1.A)
ggplot(gfdatar)+
  aes(x=ul500nMDye,y=FL1.A/medianFL1Anodye,group=name)+
  geom_boxplot()+scale_y_log10()+
  theme(axis.text.x=element_text(angle=90))

Comparison of FSC.A to 0nM

aggregate(data=gfdatar,FSC.A~ul500nMDye,FUN=median)
medianFSCAnodye <- median(subset(gfdatar,ul500nMDye==0)$FSC.A)
ggplot(gfdatar)+
  aes(x=ul500nMDye,y=FSC.A/medianFSCAnodye,group=name)+
  geom_boxplot()+scale_y_log10()+
  theme(axis.text.x=element_text(angle=90))

Population composition

t(apply(
  data.frame(
    aggregate(data=gfdatar,cbind(gfpPos,gfpNeg)~ul500nMDye,
      FUN=function(x){sum(x==T)}),
    aggregate(data=gfdatar,name~ul500nMDye,FUN=length)
   ),1,function(x){return(c(x[1],x[2:3]/x[5]))}))

Step 5: Quality control

“Great Scientist calculates in terms of the Control of the Quality of the Apparatus, not in terms of earning a living. Agriculture is inspired by the fear of hunger; study, by an interest in salary. Great Scientist is concerned about the Apparatus; not about poverty. Yet.”

Confucious

Scroll up to see how the gates did. Below is an example of ggplot().

ggplot(gfdatar)+
  facet_wrap(~ul500nMDye)+
  aes(x=FSC.A,y=FL1.A,col=gfpNeg)+
  geom_point(alpha=0.5,size=0.2)+
  scale_y_log10()+scale_x_log10()

Data transformation for visualization

“Data can make the Transformation great; it isn’t Transformation which makes the Data great.”

Confucious

In order to look at QC plots the data is transformed using the logicle transform, which is a log transform for high values that transitions to a linear transformation near zero values. This is simply for visualization purposes.

the parameters w,t, and m define the transformation parameters Below, dataset 1 tranformation applied to every channel except width and time.

lgcl <- logicleTransform(w = 0.5, t= 10000, m=4.5)
dataLGCLTransform <- transform(gatedFilteredDatar,'FSC.A' = lgcl(`FSC.A`), 'SSC.A' =lgcl(`SSC.A`), 'FL1.A' = lgcl(`FL1.A`), 'FL2.A' = lgcl(`FL2.A`), 'FL3.A' = lgcl(`FL3.A`), 'FL4.A' = lgcl(`FL4.A`),'FSC.H' = lgcl(`FSC.H`),'SSC.H' = lgcl(`SSC.H`),'FL1.H' = lgcl(`FL1.H`),'FL2.H' = lgcl(`FL2.H`),'FL3.H' = lgcl(`FL3.H`),'FL4.H' = lgcl(`FL4.H`))

Effect of time

"Study the past if you would define the future."

Confucious

ggplot(gfdatar)+
  aes(x=Time,y=FL1.A,col=factor(ul500nMDye))+
  geom_point(size=0.2,alpha=0.5)+
  stat_smooth()+scale_y_log10()

Two more plots

ggplot(gfdatar[sample(1:nrow(gfdatar)),])+
  aes(x=FSC.A,y=FL1.A,col=factor(ul500nMDye))+
  geom_point(size=0.2,alpha=0.5)+
  scale_y_log10()+scale_x_log10()

ggplot(gfdatar[sample(1:nrow(gfdatar)),])+
  aes(x=FSC.A,y=SSC.A,col=factor(ul500nMDye))+
  geom_point(size=0.2,alpha=0.5)+
  scale_y_log10()+scale_x_log10()