README.md

Using FlopR

Alex J H Fedorec 02/06/2021

Prerequisite Knowledge

We have attempted to make our software usable with minimal prior knowledge of the R programming language or programming in general. You will need to be familiar with the idea of running commands from a console or writing basic scripts. For R beginners, this is a great starting point, there are some good resources here and we suggest using the RStudio application. It provides an environment for writing and running R code.

Installation

FlopR relies on several other R packages. Most of them are available through the “Comprehensive R Archive Network (CRAN)” which just means that they can be automatically installed. There are, however, a couple that need to be manually installed if you are using the flow cytometry processing functions. To do this use the following commands:

install.packages("devtools", repos = "https://cloud.r-project.org/")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("flowCore", "flowClust", "flowStats"))

Once those packages have been installed we are ready to install FlopR.

devtools::install_github("ucl-cssb/flopr")

This grabs the latest version of FlopR from GitHub and builds it on your computer. If it installed successfully we should now be able to load the FlopR package in R.

library(flopr)

The examples in this document can be run using the data in the “example” folder. Download the whole folder and make sure that you have set your current working directory (you can use the setwd() command in R) to the location where you saved it on your computer

A Quick Note on .csv Files

This package makes extensive use of .csv (comma separated variables) files for data and meta-data. Unfortunately, not all .csv are created equal.

From Microsoft Excel, save as UTF-8 encoded .csv
file

Plate Reader Calibration

The experimental protocols for producing absorbance and fluorescence calibration data can be found here and here.

The data produced by plate readers from different manufacturers comes in different formats. The first step we need to do is to “parse” the data from the plate reader into a standard format. We have written an example parsers for use with data from Tecan plate readers and the Biotek Neo 2. If you need help writing a parser for your plate reader, please contact us and we’ll see what we can do.

For a Tecan plate reader, the data is saved as an Excel .xls file. As of flopR version 0.4.02, the spark_parse function accepts Excel files (.xls and .xlsx). If you are using an earlier version, the data first needs to be saved as a .csv file (open in Excel and “Save As” .csv) that can be read by R. We also need a .csv file telling us what is in each well of our microtitre plate. An example can be found in the “examples/plate_reader/tecan_spark” folder, but the first few rows looks like this:

| calibrant | fluorophore | media | concentration | replicate | well | |:------------|:------------|:------|--------------:|----------:|:-----| | fluorescein | GFP | PBS | 7.53000e+14 | 1 | A1 | | fluorescein | GFP | PBS | 3.76500e+14 | 1 | A2 | | fluorescein | GFP | PBS | 1.88250e+14 | 1 | A3 | | fluorescein | GFP | PBS | 9.41250e+13 | 1 | A4 | | fluorescein | GFP | PBS | 4.70625e+13 | 1 | A5 | | fluorescein | GFP | PBS | 2.35313e+13 | 1 | A6 |

Once we have the calibration data and layout .csv files we can parse the data.

flopr::spark_parse(data_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane.csv",
                   layout_csv = "examples/plate_reader/tecan_spark/calibration_plate_layout.csv",
                   timeseries = FALSE)

The data_csv argument takes the path to the calibration data. layout_csv is the path to the plate layout .csv file. Finally, the Tecan plate readers save timeseries data differently from single timepoint data, so we have a Boolean flag, timeseries, that lets the parser know that this is not a timeseries.

The spark_parse() function saves the parsed calibration data in a new .csv file, in the same location as the calibration data, with "_parsed" appended to the filename. The first few rows look like this:

| calibrant | fluorophore | media | concentration | replicate | well | OD600 | OD700 | GFP 40 | GFP 50 | GFP 60 | GFP 70 | GFP 80 | GFP 90 | GFP 100 | GFP 110 | GFP 120 | row | column | |:------------|:------------|:------|--------------:|----------:|:-----|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|--------:|--------:|--------:|:----|-------:| | fluorescein | GFP | PBS | 10.0000 | 1 | A1 | 0.0921 | 0.0831 | 1830 | 9811 | 36911 | NA | NA | NA | NA | NA | NA | A | 1 | | fluorescein | GFP | PBS | 5.0000 | 1 | A2 | 0.0942 | 0.0852 | 932 | 4993 | 19221 | 54510 | NA | NA | NA | NA | NA | A | 2 | | fluorescein | GFP | PBS | 2.5000 | 1 | A3 | 0.1015 | 0.0928 | 453 | 2434 | 9448 | 27864 | NA | NA | NA | NA | NA | A | 3 | | fluorescein | GFP | PBS | 1.2500 | 1 | A4 | 0.0985 | 0.0899 | 232 | 1253 | 4853 | 14513 | 36349 | NA | NA | NA | NA | A | 4 | | fluorescein | GFP | PBS | 0.6250 | 1 | A5 | 0.0957 | 0.0905 | 114 | 617 | 2404 | 7128 | 18282 | 41812 | NA | NA | NA | A | 5 | | fluorescein | GFP | PBS | 0.3125 | 1 | A6 | 0.1027 | 0.0965 | 54 | 294 | 1137 | 3388 | 8672 | 20095 | 42244 | NA | NA | A | 6 |

The parser has extracted the information we need from the calibration data and merged it with the plate layout information so that we now have columns containing each of the measurements for each well.

Now we can actually calculate our calibration coefficients. To do this we just need to use one function and give it our parsed data.

flopr::generate_cfs(calibration_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed.csv")

For details about how this process works, you can read our paper here. At the end, there should be two .pdf images showing the calibration curves for absorbance and fluorescence, along with a new .csv file, appended with "_cfs", containing the parameters for use in the future, the first few rows of which look like this:

| cf | beta | calibrant | fluorophore | measure | |-----------:|-----------:|:------------|:------------|:--------| | 184.7274 | 0.0057476 | fluorescein | GFP | GFP 40 | | 996.4472 | -0.0035424 | fluorescein | GFP | GFP 50 | | 3821.0497 | -0.0004737 | fluorescein | GFP | GFP 60 | | 11298.9531 | 0.0002015 | fluorescein | GFP | GFP 70 | | 29128.3783 | -0.0007481 | fluorescein | GFP | GFP 80 | | 65834.9342 | 0.0012751 | fluorescein | GFP | GFP 90 |

The “cf” column contains the calibration coefficients that will be used to calibrate our data in later experiments.

Before we get too excited we need to check the images to make sure that the calibration curves look sensible. The software attempts to remove data points which it deems are invalid, but this process isn’t perfect and occasionally may need you to remove data points from the "*_parsed.csv" file. Using the example data, you can see (here) that some of the fluorescein wells are considered valid absorbance measurements. In this case, it isn’t the end of the world since we would never use the parameters produced from those two curves.

n.b. Currently the software is setup to work with “microspheres” for calibrating cell number and “fluorescein” for calibrating GFP fluorescence. We hope to extend it in the near future to work with other calibrants.

Processing Plate Reader Data

As mentioned above, because plate readers from different manufacturers save the data in different formats, the first step we need to do is parse our raw data. The parser that we provide takes Tecan plate reader data in the form of a .csv file. We also need a .csv file telling us what is in each well of you microtitre plate. This can include any information that you wish; we include as much meta-data as possible as it makes our data analysis later much smoother. (Before version 0.4.01: The only requirement is that the last column must be named “well” and include an identifier (usually the well id i.e. B2) that can be matched to the same identifier in the plate reader data.) Here’s an example where we include information about the strains, plasmids, media, inducers, etc.:

| strain | host | plasmid | plasmid\_2 | strain\_2 | media | sugar | amino\_acids | inducer | concentration | init\_ratio | init\_dilution | well | |:-------|:-----|:--------|:-----------|:----------|:------|:------|:-------------|:--------|--------------:|------------:|---------------:|:-----| | | | | | | | | | | NA | NA | NA | A1 | | | | | | | | | | | NA | NA | NA | A2 |

Now we can use our parsing function.

flopr::spark_parse(data_csv = "examples/plate_reader/tecan_spark/200228_example_data.csv",
                   layout_csv = "examples/plate_reader/tecan_spark/200228_example_layout.csv",
                   timeseries = TRUE)

The data is extracted and the meta-data from the layout is attached. Note that this time the data a timeseries so we set the timeseries flag to TRUE. A new .csv file is produced containing the parsed data with "_parsed" appended to the filename.

Now we can start processing our data. There is one function that does all the work for: process_plate(). There are a few arguments that we need to give the function which will control what happens.

flopr::process_plate(data_csv = "examples/plate_reader/tecan_spark/200228_example_data_parsed.csv",
                     blank_well = c("C12", "D12"),
                     neg_well = c("C6", "D6", "E6"),
                     od_name = "OD700",
                     flu_names = c("GFP", "mCherry"),
                     af_model = "spline",
                     to_MEFL = TRUE,
                     flu_gains = 135,
                     conversion_factors_csv = "examples/plate_reader/tecan_spark/191219_calibration_membrane_parsed_cfs.csv")

Let’s walk through what each of the arguments do:

When we run this function the absorbance is normalised, then the fluorescence and finally (if desired) the absorbance and fluorescence values are calibrated. Finally, the processed data is saved in a new .csv file with "_processed" appended to the filename and with additional columns for each of the processed values.

We also save some .pdf images comparing the raw and normalised data, and images showing the fluorescence normalisation curves. Using these plots, there are a few checks that we should make before celebrating.

Autofluorescence normalisation details

Autofluorescence is the fluorescence produced by anything other than the fluorophores that we are interested in measuring. A small amount usually comes from growth media and can be minimised by choosing certain medias. A large contribution comes from molecules produced and secreted by our cells. Some of these molecules show particularly strong emission at similar wavelengths to GFP. We observe that the level of autofluorescence is not simply proportional to the number of cells or optical density of our culture. As cells enter stationary phase, autofluorescence increases, perhaps due increased production of the autofluorescent molecules and changes in cell size.

In order to remove this autofluorescence from our sample data we fit a curve to our negative control data. We provide four different models that the user can choose to fit their data (and if desired more models can be added). There are two smoothing models: “loess” and “spline”. The primary difference to the user between these two models is that the “spline” is able to extrapolate beyond the negative control data provided. This means that if the range of absorbance values at which you have measurements of your negative control is smaller than the range of your samples, we can still make an attempt at normalisation. However, this extrapolation is very crude (linear from the last data point) and can produce poor normalisation in the extrapolated range. Fortunately, negative controls tend to grow better than fluorescent samples, so extrapolation is often not an issue. We also provide a second-order polynomial and an exponential model, specified by “polynomial” and “exponential” repsectively. These are inherently able to make predictions beyond the range of normalised data and therefore may be good starting points.

Prolonged periods in stationary phase can cause autofluorescence to increase while absorbance remains stable and in some cases absorbance can start decreasing while autofluorescence does not. In these circumstances, none of the models perform particularly well. The performance of each of them should be checked and if none of them perform satisfactorily it may be necessary to trim the data to remove confounding timepoints.

Flow Cytometry Processing

We have two functions for processing flow cytometry data:

Process a single .fcs file

To process a single .fcs file we can run the following command:

flopr::process_fcs(fcs_file = "examples/flow_cytometry/DATA/20191121/pWeak_None_0_1.fcs",
                   flu_channels = "BL1-H",
                   pre_cleaned = TRUE,
                   do_plot = TRUE)

We need to give the function four bits of information

In the end we have a new .fcs file with "_processed" appended to the filename, and if you asked the function to save a plot we will have a .pdf that looks something like this.

Process a folder of .fcs files

It’s much more likely that we have more than one sample that we want to process. This is where the other function comes in handy.

flopr::process_fcs_dir(dir_path = "examples/flow_cytometry/DATA/20191121",
                       pattern = "*Med*.fcs",
                       flu_channels = "BL1-H",
                       pre_cleaned = TRUE,
                       do_plot = TRUE,
                       neg_fcs = "pNeg_None_0_1.fcs",
                       calibrate = TRUE,
                       mef_peaks = list(list(channel = "BL1-H",
                                             peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558))))

There are a few more arguments to this function and some of them look a bit complicated so let’s go through them.

mef_peaks = list(list(channel = "BL1-H", 
                      peaks = c(0, 822, 2114, 5911, 17013, 41837, 145365, 287558)),
                 list(channel = "YL2-H",
                      peaks = c(0, 218, 581, 1963, 6236, 15267, 68766, 181945)))

The function carries out the trimming as described above. Then, if a negative control file is given, it performs fluorescence normalisation by negating the geometric mean of the negative control’s fluorescence in each channel from each of the other samples. Finally, if the files are going to be calibrated, a calibration curve is fit to the bead peaks in each of the fluorescence channels in mef_peaks. We use a model developed for FlowCal for our calibration curve. We then calibrate both the raw and normalised data since normalisation can produce fluorescence values less than or equal to 0, which get removed during calibration due to working with logged data (the output will contain a both sets of calibration). The new data, any plots produced, and a data_summary.csv file (containing geometric statistics for each .fcs file) will be saved to a new folder with the same name as the original but with "_processed" appended.

There are a few checks that should be made to reassure yourself that everything has worked.

manual_peaks = list(list(channel = "BL1-H", 
                      peaks = c(1.9, 2.5, 2.9, 3.3, 3.7, 4.2, 4.6, 4.95)))


ucl-cssb/flopr documentation built on Jan. 27, 2024, 11:49 a.m.