Alex J H Fedorec 02/06/2021
We have attempted to make our software usable with minimal prior knowledge of the R programming language or programming in general. You will need to be familiar with the idea of running commands from a console or writing basic scripts. For R beginners, this is a great starting point, and there are some good resources here. We suggest using the RStudio application, which provides an environment for writing and running R code.
FlopR relies on several other R packages. Most of them are available through the “Comprehensive R Archive Network (CRAN)” which just means that they can be automatically installed. There are, however, a couple that need to be manually installed if you are using the flow cytometry processing functions. To do this use the following commands:
install.packages("devtools", repos = "https://cloud.r-project.org/")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("flowCore", "flowClust", "flowStats"))
Once those packages have been installed we are ready to install FlopR.
devtools::install_github("ucl-cssb/flopr")
This grabs the latest version of FlopR from GitHub and builds it on your computer. If it installed successfully we should now be able to load the FlopR package in R.
library(flopr)
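If loading the package fails, it can help to see which dependencies are missing. The following is a minimal sketch using base R's requireNamespace(); the helper name missing_packages is hypothetical and not part of FlopR, and the package list is just the set of packages named in the installation steps above.

```r
# Hypothetical helper (not part of FlopR): return the names of packages
# that cannot be loaded in the current R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages mentioned in the installation steps above.
needed <- c("devtools", "BiocManager", "flowCore", "flowClust", "flowStats", "flopr")
not_installed <- missing_packages(needed)

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
} else {
  message("All listed packages are available.")
}
```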
The examples in this document can be run using the data in the “examples” folder. Download the whole folder and make sure that you have set your current working directory (you can use the setwd() command in R) to the location where you saved it on your computer.
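As a quick sanity check before running the examples, you can confirm that the working directory actually contains the downloaded folder. This is only a sketch: the helper name has_examples is hypothetical, and the folder name "examples" is assumed from the file paths used in the commands later in this document.

```r
# Hypothetical helper: is there a folder called `folder` inside `root`?
has_examples <- function(root = getwd(), folder = "examples") {
  dir.exists(file.path(root, folder))
}

# setwd("path/to/where/you/saved/the/download")  # adjust to your own computer
if (!has_examples()) {
  message("No 'examples' folder found in ", getwd(),
          "; check the path given to setwd().")
}
```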
This package makes extensive use of .csv (comma-separated values) files for data and meta-data. Unfortunately, not all .csv files are created equal.
The experimental protocols for producing absorbance and fluorescence calibration data can be found here and here.
The data produced by plate readers from different manufacturers comes in different formats. The first step is to “parse” the data from the plate reader into a standard format. We have written example parsers for use with data from Tecan plate readers and the Biotek Neo 2. If you need help writing a parser for your plate reader, please contact us and we’ll see what we can do.
For a Tecan plate reader, the data is saved as an Excel .xls file. As of FlopR version 0.4.02, the spark_parse function accepts Excel files (.xls and .xlsx). If you are using an earlier version, the data first needs to be saved as a .csv file (open in Excel and “Save As” .csv) that can be read by R. We also need a .csv file telling us what is in each well of our microtitre plate. An example can be found in the “examples/plate_reader/tecan_spark” folder, but the first few rows look like this:
Once we have the calibration data and layout .csv files we can parse the data.
flopr::spark_parse(data_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane.csv",
layout_csv = "examples/plate_reader/tecan_spark/calibration_plate_layout.csv",
timeseries = FALSE)
The data_csv argument takes the path to the calibration data. layout_csv is the path to the plate layout .csv file. Finally, the Tecan plate readers save timeseries data differently from single timepoint data, so we have a Boolean flag, timeseries, that lets the parser know that this is not a timeseries.
The spark_parse() function saves the parsed calibration data in a new .csv file, in the same location as the calibration data, with "_parsed" appended to the filename. The first few rows look like this:
The parser has extracted the information we need from the calibration data and merged it with the plate layout information so that we now have columns containing each of the measurements for each well.
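Conceptually, attaching the layout information to the measurements is a join on the well identifier. The sketch below shows that idea with base R's merge(); it is not FlopR's actual parser, and the data frames, column names and values are all made up for illustration.

```r
# Hypothetical plate reader measurements, one row per well.
readings <- data.frame(
  well = c("A1", "A2", "B1"),
  OD700 = c(0.05, 0.42, 0.40),
  GFP = c(110, 5200, 150),
  stringsAsFactors = FALSE
)

# Hypothetical plate layout, with the well identifier in the last column.
layout <- data.frame(
  strain = c("none", "MG1655", "MG1655"),
  plasmid = c("none", "pGFP", "pNeg"),
  well = c("A1", "A2", "B1"),
  stringsAsFactors = FALSE
)

# One row per well, carrying both the meta-data and the measurements.
parsed <- merge(layout, readings, by = "well")
print(parsed)
```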
Now we can actually calculate our calibration coefficients. To do this we just need to use one function and give it our parsed data.
flopr::generate_cfs(calibration_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed.csv")
For details about how this process works, you can read our paper here. At the end, there should be two .pdf images showing the calibration curves for absorbance and fluorescence, along with a new .csv file, appended with "_cfs", containing the parameters for use in the future, the first few rows of which look like this:
The “cf” column contains the calibration coefficients that will be used to calibrate our data in later experiments.
Before we get too excited we need to check the images to make sure that the calibration curves look sensible. The software attempts to remove data points that it deems invalid, but this process isn’t perfect and you may occasionally need to remove data points from the "*_parsed.csv" file by hand. Using the example data, you can see (here) that some of the fluorescein wells are considered valid absorbance measurements. In this case, it isn’t the end of the world since we would never use the parameters produced from those two curves.
n.b. Currently the software is set up to work with “microspheres” for calibrating cell number and “fluorescein” for calibrating GFP fluorescence. We hope to extend it in the near future to work with other calibrants.
As mentioned above, because plate readers from different manufacturers save the data in different formats, the first step is to parse our raw data. The parser that we provide takes Tecan plate reader data in the form of a .csv file. We also need a .csv file telling us what is in each well of your microtitre plate. This can include any information that you wish; we include as much meta-data as possible as it makes our data analysis later much smoother. (Before version 0.4.01: The only requirement is that the last column must be named “well” and include an identifier (usually the well id, e.g. B2) that can be matched to the same identifier in the plate reader data.) Here’s an example where we include information about the strains, plasmids, media, inducers, etc.:
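As a minimal sketch of such a layout file, the code below builds one for a 96-well plate and writes it to a temporary .csv file. The column names (strain, plasmid, media, inducer_mM) and their values are hypothetical; the only convention taken from the text above is that the last column is named well and holds identifiers such as B2.

```r
# Well identifiers for a 96-well plate, A1 to H12, row by row.
plate <- expand.grid(col = 1:12, row = LETTERS[1:8], stringsAsFactors = FALSE)

# Hypothetical meta-data columns; "well" is deliberately the last column.
layout <- data.frame(
  strain = "MG1655",
  plasmid = "pGFP",
  media = "LB",
  inducer_mM = 0,
  well = paste0(plate$row, plate$col),
  stringsAsFactors = FALSE
)

# Mark one column of the plate as media blanks (an arbitrary choice here).
is_blank <- layout$well %in% paste0(LETTERS[1:8], 12)
layout$strain[is_blank] <- "none"
layout$plasmid[is_blank] <- "none"

# Write the layout where a parser could pick it up.
layout_path <- tempfile(fileext = ".csv")
write.csv(layout, layout_path, row.names = FALSE)
```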
Now we can use our parsing function.
flopr::spark_parse(data_csv = "examples/plate_reader/tecan_spark/200228_example_data.csv",
layout_csv = "examples/plate_reader/tecan_spark/200228_example_layout.csv",
timeseries = TRUE)
The data is extracted and the meta-data from the layout is attached. Note that this time the data is a timeseries, so we set the timeseries flag to TRUE. A new .csv file is produced containing the parsed data with "_parsed" appended to the filename.
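Because each step appends a suffix to the filename, the path to pass to the next step can be derived from the previous one. The helper below is a hypothetical convenience written with base R's sub(), not a FlopR function; the resulting paths match the ones used in the commands in this document.

```r
# Hypothetical helper: insert a suffix before the ".csv" extension.
append_suffix <- function(path, suffix) {
  sub("\\.csv$", paste0(suffix, ".csv"), path)
}

# Calibration data: raw file, then parsed file, then conversion factors.
calibration_raw <- "examples/plate_reader/tecan_spark/191219_calibration_membrane.csv"
calibration_parsed <- append_suffix(calibration_raw, "_parsed")
calibration_cfs <- append_suffix(calibration_parsed, "_cfs")

# Experimental data: raw file, then parsed file.
data_raw <- "examples/plate_reader/tecan_spark/200228_example_data.csv"
data_parsed <- append_suffix(data_raw, "_parsed")

# file.exists() reports whether a given step has already produced its output.
file.exists(c(calibration_parsed, calibration_cfs, data_parsed))
```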
Now we can start processing our data. There is one function that does all the work for us: process_plate(). There are a few arguments that we need to give the function which will control what happens.
flopr::process_plate(data_csv = "examples/plate_reader/tecan_spark/200228_example_data_parsed.csv",
blank_well = c("C12", "D12"),
neg_well = c("C6", "D6", "E6"),
od_name = "OD700",
flu_names = c("GFP", "mCherry"),
af_model = "spline",
to_MEFL = TRUE,
flu_gains = 135,
conversion_factors_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed_cfs.csv")
Let’s walk through what each of the arguments does:
- data_csv is the path to our parsed data.
- blank_well are the well identifiers of wells containing media blanks. These are used for normalising absorbance. If you only have one blank well you can specify it using, for example, blank_well = "C12".
- neg_well are the well identifiers of wells containing negative controls. These are used for normalising fluorescence. As above, if you only have one negative control well you can specify it using, for example, neg_well = "C6".
- od_name is the name of the column containing our absorbance values in the parsed data .csv file. Currently we can only use one absorbance column, so if you record absorbance at multiple wavelengths (like I do), you will have to pick one.
- flu_names are the names of the columns containing our fluorescence values. You can include as many or as few of your fluorescence columns as you like. We will attempt to normalise whichever columns are named here.
- af_model allows you to choose the type of model that we are going to use for fluorescence normalisation. We’ll discuss the available choices below.
- to_MEFL is a Boolean flag that lets you tell the function if you want to convert the fluorescence data into calibrated units. You can only do this if you have carried out the calibration as detailed above.
- flu_gains is where you specify the gain at which your fluorescence data was recorded for each fluorescence channel. Here we only have calibration parameters for “GFP” so we only specify one gain value.
- conversion_factors_csv is the path to the calibration parameters that you generated using the protocol detailed above.
When we run this function the absorbance is normalised first, then the fluorescence, and finally (if desired) the absorbance and fluorescence values are calibrated. The processed data is saved in a new .csv file with "_processed" appended to the filename and with additional columns for each of the processed values.
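To give a feel for what normalising absorbance against the blank wells means, here is a rough sketch in base R. It assumes that the mean of the blank wells at each timepoint is subtracted from every well at that timepoint; this is an illustration of the idea rather than FlopR's exact procedure, and the data, the time column and the normalised_OD column name are hypothetical.

```r
# Hypothetical parsed data: two sample wells and two blank wells, two timepoints.
parsed <- data.frame(
  time = rep(c(0, 600), each = 4),
  well = rep(c("B2", "B3", "C12", "D12"), times = 2),
  OD700 = c(0.10, 0.12, 0.04, 0.06, 0.30, 0.34, 0.05, 0.05),
  stringsAsFactors = FALSE
)
blank_well <- c("C12", "D12")

# Mean absorbance of the blank wells at each timepoint.
blanks <- parsed[parsed$well %in% blank_well, ]
blank_mean <- aggregate(OD700 ~ time, data = blanks, FUN = mean)
names(blank_mean)[2] <- "blank_OD"

# Subtract the matching blank mean from every well (assumed approach).
normalised <- merge(parsed, blank_mean, by = "time")
normalised$normalised_OD <- normalised$OD700 - normalised$blank_OD
print(normalised)
```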
We also save some .pdf images comparing the raw and normalised data, and images showing the fluorescence normalisation curves. Using these plots, there are a few checks that we should make before celebrating.
Autofluorescence is the fluorescence produced by anything other than the fluorophores that we are interested in measuring. A small amount usually comes from growth media and can be minimised by choosing certain media. A large contribution comes from molecules produced and secreted by our cells. Some of these molecules show particularly strong emission at similar wavelengths to GFP. We observe that the level of autofluorescence is not simply proportional to the number of cells or optical density of our culture. As cells enter stationary phase, autofluorescence increases, perhaps due to increased production of the autofluorescent molecules and changes in cell size.
In order to remove this autofluorescence from our sample data we fit a curve to our negative control data. We provide four different models that the user can choose to fit their data (and if desired more models can be added). There are two smoothing models: “loess” and “spline”. The primary difference to the user between these two models is that the “spline” is able to extrapolate beyond the negative control data provided. This means that if the range of absorbance values at which you have measurements of your negative control is smaller than the range of your samples, we can still make an attempt at normalisation. However, this extrapolation is very crude (linear from the last data point) and can produce poor normalisation in the extrapolated range. Fortunately, negative controls tend to grow better than fluorescent samples, so extrapolation is often not an issue. We also provide a second-order polynomial and an exponential model, specified by “polynomial” and “exponential” respectively. These are inherently able to make predictions beyond the range of normalised data and therefore may be good starting points.
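To make the difference between the models concrete, the sketch below fits three of them to synthetic negative control data using base R (loess(), smooth.spline(), and lm() with a second-order polynomial) and predicts autofluorescence at one absorbance inside and one outside the range of the control. The data and variable names are invented for illustration, and this is not necessarily how FlopR fits its models internally.

```r
set.seed(42)

# Synthetic negative control: fluorescence rises non-linearly with absorbance.
neg <- data.frame(od = seq(0.05, 1.0, length.out = 40))
neg$flu <- 100 + 400 * neg$od^2 + rnorm(40, sd = 5)

# Three candidate autofluorescence models.
loess_fit <- loess(flu ~ od, data = neg)
spline_fit <- smooth.spline(neg$od, neg$flu)
poly_fit <- lm(flu ~ poly(od, 2), data = neg)

# One sample inside the negative control's absorbance range, one beyond it.
sample_od <- c(0.5, 1.2)
af_loess <- predict(loess_fit, newdata = data.frame(od = sample_od))
af_spline <- predict(spline_fit, x = sample_od)$y
af_poly <- predict(poly_fit, newdata = data.frame(od = sample_od))

# The predicted autofluorescence would be subtracted from the sample fluorescence.
print(data.frame(od = sample_od, loess = af_loess, spline = af_spline, polynomial = af_poly))
```

With its default settings, predict() on a loess fit gives NA beyond the fitted range, whereas the smoothing spline continues linearly and the polynomial follows its fitted curve. This is the practical reason to check how far your sample absorbances exceed those of the negative control.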
Prolonged periods in stationary phase can cause autofluorescence to increase while absorbance remains stable and in some cases absorbance can start decreasing while autofluorescence does not. In these circumstances, none of the models perform particularly well. The performance of each of them should be checked and if none of them perform satisfactorily it may be necessary to trim the data to remove confounding timepoints.
We have two functions for processing flow cytometry data:
- process_fcs takes a single .fcs file, removes debris and doublets and saves the trimmed data in a new .fcs file.
- process_fcs_dir takes a folder of .fcs files and performs the same trimming on each. It can also perform fluorescence normalisation if you have a negative control and fluorescence calibration if you have measured a calibrant.
To process a single .fcs file we can run the following command:
flopr::process_fcs(fcs_file = "examples/flow_cytometry/DATA/20191121/pWeak_None_0_1.fcs",
flu_channels = "BL1-H",
pre_cleaned = TRUE,
do_plot = TRUE)
We need to give the function four bits of information:
- fcs_file is the path to the .fcs file that we want to process.
- flu_channels are the names of the fluorescence channels that we recorded. When processing a single .fcs file like this we don’t actually do anything with the fluorescence data. However, if you include the channel names you can see the data in a plot that is saved. If you have more than one fluorescence channel, the argument needs to be a vector, which will look something like this:
flu_channels = c("BL1-H", "BL2-H", "YL2-H")
- pre_cleaned lets the function know if you have gated out debris on the flow cytometer. Most people do this when running their experiments by setting a threshold on forward-scatter and side-scatter. But some people like to record everything and process the data later.
- do_plot lets the function know if you want a plot to be saved of the trimming process.
In the end we have a new .fcs file with "_processed" appended to the filename, and if you asked the function to save a plot we will have a .pdf that looks something like this.
It’s much more likely that we have more than one sample that we want to process. This is where the other function comes in handy.
flopr::process_fcs_dir(dir_path = "examples/flow_cytometry/DATA/20191121",
pattern = "*Med*.fcs",
flu_channels = "BL1-H",
pre_cleaned = TRUE,
do_plot = TRUE,
neg_fcs = "pNeg_None_0_1.fcs",
calibrate = TRUE,
mef_peaks = list(list(channel = "BL1-H",
peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558))))
There are a few more arguments to this function and some of them look a bit complicated so let’s go through them.
- dir_path is the path to the folder with your .fcs files in.
- pattern allows us to just process a subset of the .fcs files in the folder. Here we are just going to process files with “Med” in the filename. Without going into too much detail, this uses a simplified version of “regular expressions” called “globbing patterns”. We can use “wildcard” symbols to represent any character (? is a placeholder for a single character and * is a placeholder for zero or more characters). In this example all we know is that “Med” appears somewhere in the filenames and they all end with “.fcs”. So we use the * character to show that there are some unknown characters before “Med” and between “Med” and “.fcs”. If you want to process all .fcs files in your folder, the pattern would be "*.fcs".
- flu_channels is as above.
- pre_cleaned is as above.
- do_plot is as above.
- neg_fcs is the filename of your negative control sample. It must be in the folder with the other .fcs files. This is used to normalise autofluorescence. If you don’t specify a filename here, the processing will be carried out without any normalisation steps.
- calibrate tells the function whether you want to calibrate the fluorescence measurements. To be able to calibrate you must have an .fcs file with data from calibration beads and it must have “beads” somewhere in the filename. For discussion about calibration beads, read our paper or check out TASBE and FlowCal.
- mef_peaks is where we tell the function what the true fluorophore values are for our beads. These will be available from the bead manufacturer. We need to specify peaks for each of the channels that we want to calibrate and the channel name needs to correspond to one given in flu_channels. Here we are just calibrating the “BL1-H” channel which corresponds to GFP on our flow cytometer. If you wanted to calibrate two channels it would look something like this:
mef_peaks = list(list(channel = "BL1-H",
peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558)),
list(channel = "YL2-H",
peaks = c(0, 218, 581, 1963, 6236, 15267, 68766, 181945)))
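To see how a globbing pattern such as the one passed to pattern selects files, the sketch below converts it to a regular expression with utils::glob2rx() and applies it to a few filenames. Whether FlopR performs the matching in exactly this way is an assumption; the two “pMed” filenames are invented for illustration, while the other two are taken from the commands above.

```r
# Two filenames from the commands above plus two invented ones.
files <- c("pWeak_None_0_1.fcs", "pNeg_None_0_1.fcs",
           "pMed_None_0_1.fcs", "pMed_notes.txt")

# Translate the globbing pattern into a regular expression.
med_regex <- utils::glob2rx("*Med*.fcs")

# Keep only filenames containing "Med" and ending in ".fcs".
med_files <- files[grepl(med_regex, files)]
print(med_files)

# "*.fcs" keeps every .fcs file regardless of the rest of the name.
all_fcs <- files[grepl(utils::glob2rx("*.fcs"), files)]
print(all_fcs)
```

With real data, the same regular expression can be given to list.files() through its pattern argument to list the matching files in a folder.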
The function carries out the trimming as described above. Then, if a negative control file is given, it performs fluorescence normalisation by subtracting the geometric mean of the negative control’s fluorescence in each channel from each of the other samples. Finally, if the files are going to be calibrated, a calibration curve is fitted to the bead peaks in each of the fluorescence channels in mef_peaks. We use a model developed for FlowCal for our calibration curve. We then calibrate both the raw and normalised data, since normalisation can produce fluorescence values less than or equal to 0, which get removed during calibration due to working with logged data (the output will contain both sets of calibrated data). The new data, any plots produced, and a data_summary.csv file (containing geometric statistics for each .fcs file) will be saved to a new folder with the same name as the original but with "_processed" appended.
There are a few checks that should be made to reassure yourself that everything has worked.
- Run the processing with do_plot = TRUE so that you can see which events have been removed during processing. Check that the debris, if there is any, and doublets have been correctly identified and removed.
- Check that the bead peaks have been correctly identified. Peak identification is controlled by beads_dens_bw, which has a default value of 0.025. To identify the beads we use something called a Gaussian kernel to smooth the fluorescence data and pick the highest points. Sometimes this smoothing isn’t quite right; we might not smooth enough and one peak is identified as two, or we smooth too much and lose peaks. In the former case we need to increase beads_dens_bw and in the latter we decrease it. If tweaking this value doesn’t work, we also provide a way to manually specify the peaks using the manual_peaks argument. For this data, a manually specified set of peaks would look like this:
manual_peaks = list(list(channel = "BL1-H",
peaks = c(1.9, 2.5, 2.9, 3.3, 3.7, 4.2, 4.6, 4.95)))
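The effect of the smoothing bandwidth can be illustrated with base R's density() function, used here as a stand-in for whatever FlopR does internally. The sketch below smooths synthetic log10 bead fluorescence with a Gaussian kernel and counts the local maxima at two bandwidths. The bead positions, the find_peaks helper and its height threshold are all hypothetical.

```r
# Deterministic stand-in for log10 bead fluorescence: three bead populations
# centred at hypothetical log10 values of 2, 3.1 and 4.3.
bead_centres <- c(2, 3.1, 4.3)
log_flu <- unlist(lapply(bead_centres, function(m) {
  qnorm(ppoints(200), mean = m, sd = 0.05)
}))

# Hypothetical helper: positions of local maxima in a Gaussian kernel density.
find_peaks <- function(x, bw, min_rel_height = 0.1) {
  d <- density(x, bw = bw, kernel = "gaussian", n = 1024)
  # A local maximum is where the slope changes from positive to negative.
  idx <- which(diff(sign(diff(d$y))) == -2) + 1
  # Ignore tiny bumps caused by numerical noise in near-empty regions.
  idx <- idx[d$y[idx] > min_rel_height * max(d$y)]
  d$x[idx]
}

peaks_default <- find_peaks(log_flu, bw = 0.025)   # same value as the default above
peaks_oversmoothed <- find_peaks(log_flu, bw = 1)  # far too much smoothing

print(peaks_default)
print(length(peaks_oversmoothed))
```

With the smaller bandwidth each bead population gives its own peak, while the very large bandwidth blurs neighbouring populations together so that peaks are lost, which is the situation where the bandwidth should be decreased.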