"I hear and I forget. I see and I remember. I copy this file and modify it and I have a reproducible record of my analyses."
- Confucious
This vignette sets out an example of how to grab some FCS files (envisioned as being generated by the Accuri) into R, gate on it a bit, then make a data.frame suitable for plotting with ggplot.
You should be familiar with your flow cytometer
Sample sheet requirements?
+ a tab delimited sample sheet in each directory with the following rows:
This vingette recommends a summary of results and quality control steps.
source("src/requireInstall.R") requireInstall("reshape2") requireInstall("ggplot2") requireInstall("flowCore",isBioconductor=T) requireInstall("flowViz",isBioconductor=T) requireInstall("Hmisc",isBioconductor=T) requireInstall("gridExtra") requireInstall("flowStats",isBioconductor=T) requireInstall("flowWorkspace",isBioconductor=T) requireInstall("ggcyto", isBioconductor=T)
Read in all data for analysis. Data should be in individual directories that contain .fcs files and a corresponding sample sheet with a generic format.
FCS file names need to match the SampleSheet.txt
.
Want to make your SampleSheet.txt
? Use a GUI spreadsheet thingy,
like excel or googledocs or localc or edit()
in R.
The format is:
| FileName | Ploidy | Dye | ... | |:---|:---|:---|:---| |A01.fcs|1.5ploid|acridineOrange|...| |A02|2.5ploid|acridineOrange|...| |A03.fcs|1.5ploid|sytoselect|...|
Where the only mandatory variable name is FileName
, then all
the others are whatever you're using.
An abitrary number of directories can be used, put the paths in the
dataDirectoryz
vector. Here we use relative becuase it should be
reproducible between users. You should use ABSOLUTE for your real
work. Don't have this folder? Download it (look up).
dataDirectoryz <- c("exampleFlowDataset1") rm(oneBigFlowSet) for (dataDirectory in dataDirectoryz) { print(paste("Reading",dataDirectory)) tmpSampleSheet <- read.delim( paste0(dataDirectory,"/SampleSheet.txt"),row.names=1) tmpFlowSet <- read.flowSet(path=dataDirectory,pattern=".fcs", alter.names=TRUE, #why? phenoData=new("AnnotatedDataFrame",data=tmpSampleSheet)) sampleNames(tmpFlowSet) <- paste0(dataDirectory,"/", sampleNames(tmpFlowSet)) pData(tmpFlowSet)$name <- sampleNames(tmpFlowSet) if (exists("oneBigFlowSet")) { oneBigFlowSet <- rbind2(oneBigFlowSet,tmpFlowSet) } else { oneBigFlowSet <- tmpFlowSet } } pData(oneBigFlowSet) print("Counted this many events:") print(fsApply(oneBigFlowSet,nrow)) print("other statz?") print(fsApply(oneBigFlowSet,each_col,median)) save(file=paste0("flowSet",digest::digest(oneBigFlowSet),".RData"))
Now you have oneBigFlowSet
. Great. For gating, we'll turn that into
a data.frame:
source("src/fs2df.R") datar <- fs2df(oneBigFlowSet)
So I can't actually figure out how to make rmarkdown play with interactive chunks. So you should see an error right about....
gateFilename <- paste0("gatesFor",digest::digest(oneBigFlowSet),".RData") load(gateFilename)
Okay, so now go to your terminal and run this:
source("src/gator.R") gatez <- list() gatez[["singlet"]] <- gator2d(datar,ehx="FSC.A",why="FSC.H", gateName="singlet") gatez[["cells"]] <- gator2d(datar,ehx="FSC.A",why="SSC.A", gateName="cells") gatez[["gfpPos"]] <- gator2d(datar,ehx="FSC.A",why="FL1.A", gateName="gfpPositive") gatez[["gfpNeg"]] <- gator2d(datar,ehx="FSC.A",why="FL1.A", gateName="gfpDepressing") save(file=gateFilename,list=c("gatez"))
If you screw it up, delete the RData file of the gates, and re-run.
how they look?
#source("src/gateCheck.R") #gateCheck() #make a function so we can re-call it after starting to apply these for (gateName in names(gatez)) { print(gateName) ingate <- data.frame( fs2df(Subset(oneBigFlowSet,gatez[[gateName]])),thisGate=T) outgate <- data.frame( fs2df(Subset(oneBigFlowSet,!gatez[[gateName]])),thisGate=F) shuffle <- rbind(ingate,outgate) shuffle <- shuffle[sample(1:nrow(shuffle)),] names(shuffle)[which(names(shuffle)=="thisGate")] <- gateName print( ggplot(shuffle)+ facet_wrap(~name)+ aes_string(x=colnames(gatez[[gateName]]@boundaries)[1], y=colnames(gatez[[gateName]]@boundaries)[2], col=gateName)+ geom_point(size=0.2,alpha=0.5)+ scale_y_log10()+ scale_x_log10() ) }
Subset the data by applying sequential gates
"When you have doublets, do not fear to abandon them."
- Confucious
rm(filteredDatar) for (gateName in c("singlet","cells")) { print(gateName) if (exists("filteredDatar")) { filteredDatar <- Subset(filteredDatar,gatez[[gateName]]) } else { filteredDatar <- Subset(oneBigFlowSet,gatez[[gateName]]) } } #gateCheck() call would go here
Buidling a better datastructure, for later use
source("src/splitFSbyGate.R") gatedFilteredDatar <- filteredDatar for (gateName in setdiff(names(gatez),c("singlet","cells"))) { gatedFilteredDatar <- splitFSbyGate(gatedFilteredDatar, gatez[[gateName]]) } gfdatar <- fs2df(gatedFilteredDatar)
Into usable form:
head(gfdatar) mfdatar <- melt(gfdatar, id.vars=c("Strain","ul500nMDye","Ploidy","Media","name","gfpPos","gfpNeg"), measure.vars=c("FSC.A","FSC.H","FL1.A","FL1.H","FL2.A","FL2.H")) head(mfdatar)
Many of the gatedFilteredDatar
things will be NA
's, that's ok.
It's because there's not many members in those particular
gate/flowframes.
fsApply(filteredDatar,function(x){ data.frame(t(each_col(x,summary)))}) fsApply(gatedFilteredDatar,function(x){ data.frame(t(each_col(x,summary)))})
ggplot(mfdatar)+ facet_grid(name~variable,scales="free")+ aes(x=log10(value))+ geom_histogram(bins=50)
aggregate(data=mfdatar,value~variable+name,FUN=mean) aggregate(data=mfdatar,value~variable+name,FUN=median)
Specific plotz
ggplot(subset(mfdatar,variable%in%c("FL2.A","FL2.H")))+ aes(x=log10(value))+facet_grid(name~variable)+ geom_histogram()+ geom_vline(xintercept=4,col="cyan") ggplot(subset(gfdatar))+ aes(x=log10(FL2.A/FSC.A))+ facet_grid(name~.)+ geom_histogram()+ geom_vline(xintercept=0,col="cyan") ggplot(subset(gfdatar))+ aes(x=log10(FL1.A/FL2.A))+ facet_grid(name~.)+ geom_histogram()+ geom_vline(xintercept=0,col="cyan") ggplot(subset(gfdatar))+ aes(x=log10(FSC.H/FL2.A))+ facet_grid(name~.)+ geom_histogram()+ geom_vline(xintercept=0,col="cyan") ggplot(subset(mfdatar,variable%in%c("FSC.A","FL1.A","FL2.A")))+ aes(x=name,y=(value))+facet_grid(~variable,scales="free")+ geom_violin()+scale_y_log10()+ theme(axis.text.x=element_text(angle=90)) ggplot(subset(mfdatar,variable%in%c("FSC.A","FL1.A","FL2.A")))+ aes(x=name,y=(value))+facet_grid(~variable,scales="free")+ geom_boxplot()+scale_y_log10()+ theme(axis.text.x=element_text(angle=90))
Comparison of FL1.A to 0nM
aggregate(data=gfdatar,FL1.A~ul500nMDye,FUN=median) medianFL1Anodye <- median(subset(gfdatar,ul500nMDye==0)$FL1.A) ggplot(gfdatar)+ aes(x=ul500nMDye,y=FL1.A/medianFL1Anodye,group=name)+ geom_boxplot()+scale_y_log10()+ theme(axis.text.x=element_text(angle=90))
Comparison of FSC.A to 0nM
aggregate(data=gfdatar,FSC.A~ul500nMDye,FUN=median) medianFSCAnodye <- median(subset(gfdatar,ul500nMDye==0)$FSC.A) ggplot(gfdatar)+ aes(x=ul500nMDye,y=FSC.A/medianFSCAnodye,group=name)+ geom_boxplot()+scale_y_log10()+ theme(axis.text.x=element_text(angle=90))
Population composition
t(apply( data.frame( aggregate(data=gfdatar,cbind(gfpPos,gfpNeg)~ul500nMDye, FUN=function(x){sum(x==T)}), aggregate(data=gfdatar,name~ul500nMDye,FUN=length) ),1,function(x){return(c(x[1],x[2:3]/x[5]))}))
Step 5: Quality control
“Great Scientist calculates in terms of the Control of the Quality of the Apparatus, not in terms of earning a living. Agriculture is inspired by the fear of hunger; study, by an interest in salary. Great Scientist is concerned about the Apparatus; not about poverty. Yet.”
- Confucious
Scroll up to see how the gates did. Below is an example of ggplot()
.
ggplot(gfdatar)+ facet_wrap(~ul500nMDye)+ aes(x=FSC.A,y=FL1.A,col=gfpNeg)+ geom_point(alpha=0.5,size=0.2)+ scale_y_log10()+scale_x_log10()
Data transformation for visualization
“Data can make the Transformation great; it isn’t Transformation which makes the Data great.”
- Confucious
In order to look at QC plots the data is transformed using the logicle transform, which is a log transform for high values that transitions to a linear transformation near zero values. This is simply for visualization purposes.
the parameters w,t, and m define the transformation parameters Below, dataset 1 tranformation applied to every channel except width and time.
lgcl <- logicleTransform(w = 0.5, t= 10000, m=4.5) dataLGCLTransform <- transform(gatedFilteredDatar,'FSC.A' = lgcl(`FSC.A`), 'SSC.A' =lgcl(`SSC.A`), 'FL1.A' = lgcl(`FL1.A`), 'FL2.A' = lgcl(`FL2.A`), 'FL3.A' = lgcl(`FL3.A`), 'FL4.A' = lgcl(`FL4.A`),'FSC.H' = lgcl(`FSC.H`),'SSC.H' = lgcl(`SSC.H`),'FL1.H' = lgcl(`FL1.H`),'FL2.H' = lgcl(`FL2.H`),'FL3.H' = lgcl(`FL3.H`),'FL4.H' = lgcl(`FL4.H`))
Effect of time
"Study the past if you would define the future."
- Confucious
ggplot(gfdatar)+ aes(x=Time,y=FL1.A,col=factor(ul500nMDye))+ geom_point(size=0.2,alpha=0.5)+ stat_smooth()+scale_y_log10()
Two more plots
ggplot(gfdatar[sample(1:nrow(gfdatar)),])+ aes(x=FSC.A,y=FL1.A,col=factor(ul500nMDye))+ geom_point(size=0.2,alpha=0.5)+ scale_y_log10()+scale_x_log10() ggplot(gfdatar[sample(1:nrow(gfdatar)),])+ aes(x=FSC.A,y=SSC.A,col=factor(ul500nMDye))+ geom_point(size=0.2,alpha=0.5)+ scale_y_log10()+scale_x_log10()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.