knitr::opts_chunk$set(eval = FALSE)
knitr::opts_chunk$set(echo = TRUE)

This page includes a full overview of fpcountr starting from package installation, through all steps of an example calibration and experimental data analysis. For more detail on the steps of this workflow, see the Articles.


Installation

The package is written in R and can be installed straight from GitHub:

# install.packages("devtools") # run this if you don't have devtools installed already
devtools::install_github("ec363/fpcountr") # run this to install fpcountr

You should only need to run the above code once.


Example Scenario


Example Scenario: We wish to work out how many molecules of fluorescent protein are being produced from a vector expressing mTagBFP2.

What we have: experimental data of mTagBFP2 expression, as well as an mTagBFP2 calibration carried out using the FPCountR method.

What we want to do is:

  1. create conversion factors that describe how blue fluorescence from mTagBFP2 relates to its molecule number in our instrument.
  2. use these conversion factors to interpret our experiment file, to work out the number of proteins per cell expressed by our mTagBFP2 expression vectors across different conditions.

The package comes with example data of exactly this type, and example code that will allow you to calculate protein numbers.


Example Data


The following example data files are provided with this package:

df <- read.csv("data/vignetteintro_files.csv")

df |>
  gt::gt() |>

  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # # cell borders
  # gt::tab_style(
  #   locations = gt::cells_body(
  #     # columns = tidyselect::everything()
  #   ),
  #   style = list(
  #     gt::cell_borders(
  #       sides = c("top", "bottom"),
  #       color = NULL
  #     )
  #   )
  # ) |>
  # markdown
  gt::fmt_markdown(
    columns = tidyselect::everything()
  ) |>

  # width # by default as wide as it needs to be # but shrinks when font is made smaller
  gt::tab_options(table.width = gt::px(750))

This example data will be used in the script below.

To try it yourself, create a new folder for running the script in this vignette (e.g. call it fpcountr_example). Open a new R script in RStudio (File > New File > R Script), save it to that folder, and navigate to its current working directory (Session > Set Working Directory > To Source File Location).

To find the example data, run:

system.file("extdata", "", package = "fpcountr", mustWork = TRUE)

This gives you the location of the files. Find this folder, copy all of the files listed here into a subfolder of the fpcountr_example called data.


Example Code


All the code from here to the end of this vignette can be copied into your own script and run line by line. Most of these functions will take existing data files, process them, and save new files (data as CSV files, and plots as PDF files). The data files will be required for subsequent functions, whereas the plots are intended to let you check the progress of the functions and identify if something has gone wrong.

Load the fpcountr package and verify that you are in the fpcountr_example directory.

library(fpcountr)


Protein concentration determination with the ECmax assay


1. Parse absorbance spectrum data

The first thing we need to do is to process our absorbance spectrum data file, but the exported data file is not in the right format for our processing function. So before we can process the data, we need to extract the data from the exported data file, tidy it and join it to metadata. This process is called parsing.

Parsing: All raw data files exported from plate readers require 'parsing' before they can be processed by downstream functions. See vignette("data_parsing") for more details.

Metadata: Parsing requires that metadata be joined to the data. While it is largely up to you what aspects of your experiments you record as metadata, each data-processing function in FPCountR expects a a minimum amount of metadata that is required for downstream analysis. See vignette("data_parsing") for details on what these are.

If using a Tecan Spark plate reader running Magellan software: use parse_magellan_spectrum(). If using other plate readers: see vignette("data_parsing").

We used a Spark, so we'll use parse_magellan_spectrum(): this takes the absorbance data file (example_absorbance.csv), in the export format provided by our plate reader, and a user-produced metadata file (example_absorbance_meta.csv), and parses it into the correct format.

parsed_data_spectrum <- parse_magellan_spectrum(
  data_csv = "data/example_absorbance.csv",
  metadata_csv = "data/example_absorbance_meta.csv",
  wellstart = "A1", wellend = "B12"
)

Arguments required:

Warnings expected:

Outputs produced:

View a fragment of the data frame to check.

parsed_data_spectrum[1:24,c(1:100)] # view a fragment of the dataframe
# code to display
# # old
# parsed_data_spectrum[1:10,c(1:5,8,11:13)] # view a fragment of the dataframe
# # current
# parsed_data_spectrum[1:24,c(1:100)] # view a fragment of the dataframe

# table to display
data_to_display <- parsed_data_spectrum[1:24,c(1:100)] # otherwise very slow!
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows


As you can see, the data is now in tidy, with one column per variable, allowing automated processing in subsequent steps.


2. Process absorbance spectrum data

Parsed spectrum data can then be processed. This involves calculation of the path length of the samples, adjusting the raw values to a path length of 1cm, and normalising the data to the blanks.

Path lengths: Calculating path lengths accurately is an important part of this process. See vignette("path_lengths") for more information.

# Create folder to hold the FP quantification function output files
dir.create("fp_quantification")

# Process spectra
processed_data_spectrum <- process_absorbance_spectrum(

  # basics
  spectrum_csv = "data/example_absorbance_parsed.csv",
  subset_rows = TRUE, rows_to_keep = c("A","B"), columns_to_keep = c(1:12),
  xrange = c(250,1000),

  # path length calcs
  pl_method = "calc_blanks",
  buffer_used = "TBS", concentration_used = 0.005, temperature_used = 30,

  # saving
  outfolder = "fp_quantification"
)
# AS ABOVE BUT WITH CSV_ONLY = TRUE

# Create folder for get_concentration functions' output files
dir.create("fp_quantification")

# Process spectra (correct for for path length and normalise to blanks)
processed_data_spectrum <- process_absorbance_spectrum(

  # basics
  spectrum_csv = "data/example_absorbance_parsed.csv",
  subset_rows = TRUE, rows_to_keep = c("A","B"), columns_to_keep = c(1:12),
  xrange = c(250,1000),

  # path length calcs
  pl_method = "calc_blanks", # options: "calc_each", "calc_blanks", "volume"
  buffer_used = "TBS", concentration_used = 0.005, temperature_used = 30,

  # saving
  outfolder = "fp_quantification",
  csv_only = TRUE
)

Arguments required:

Warnings expected:

Outputs produced:

..in the designated outfolder (here, fp_quantification/):

..in RStudio:

Let's have a look at a fragment of the processed data:

processed_data_spectrum[195:225,c(1:3,7,9,10,12,13)] # view a fragment of the dataframe
# nrow(processed_data_spectrum) # 9000

# code to display
# # old
# processed_data_spectrum[1:10,c(1:3,5,7,9:13)] # view a fragment of the dataframe
# # current
# processed_data_spectrum[195:225,c(1:3,7,9,10,12,13)] # view a fragment of the dataframe

data_to_display <- processed_data_spectrum[195:225,c(1:3,7,9,10,12,13)] # otherwise very slow!
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(raw_value, raw_cm1_value, normalised_cm1_value)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(raw_value, raw_cm1_value, normalised_cm1_value)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

Note the new columns in red. The normalised_cm1_value column will be used for calculating the concentration of the FP in the next step.

While the absolute numbers are low, we can verify that the peak of absorbance of mTagBFP2 is 401 nm.

This is more easily evident in the plots, where we can also verify that the blanks are not too noisy and that the replicates were similar:


3. Get concentration using ECmax assay

Using this processed data, we can now work out the concentration of the FP in each dilution using the get_conc_ecmax() function. This takes the EC and excitation maximum of the FP in question from FPbase, and calculates the concentration from the absorbance at that position. It also normalises carefully for background signal in one of 3 ways.

Code:

proteinconcs <- get_conc_ecmax(
  protein_slug = "mtagbfp2",
  protein_seq = "MVHHHHHHGSGVSKGEELIKENMHMKLYMEGTVDNHHFKCTSEGEGKPYEGTQTMRIKVVEGGPLPFAFDILATSFLYGSKTFINHTQGIPDFFKQSFPEGFTWERVTTYEDGGVLTATQDTSLQDGCLIYNVKIRGVNFTSNGPVMQKKTLGWEAFTETLYPADGGLEGRNDMALKLVGGSHLIANAKTTYRSKKPAKNLKMPGVYYVDYRLERIKEANNETYVEQHEVAVARYCDLPSKLGHKLN",
  processed_spectrum_csv = "fp_quantification/example_absorbance_parsed_processed.csv",
  # wells_to_remove = c(),
  corr_method = "scatter", wav_to_use1 = 500, wav_to_use2 = 450,
  outfolder = "fp_quantification/concentration",
)
# AS ABOVE BUT WITH CSV_ONLY = TRUE

proteinconcs <- get_conc_ecmax(
  protein_slug = "mtagbfp2",
  protein_seq = "MVHHHHHHGSGVSKGEELIKENMHMKLYMEGTVDNHHFKCTSEGEGKPYEGTQTMRIKVVEGGPLPFAFDILATSFLYGSKTFINHTQGIPDFFKQSFPEGFTWERVTTYEDGGVLTATQDTSLQDGCLIYNVKIRGVNFTSNGPVMQKKTLGWEAFTETLYPADGGLEGRNDMALKLVGGSHLIANAKTTYRSKKPAKNLKMPGVYYVDYRLERIKEANNETYVEQHEVAVARYCDLPSKLGHKLN",
  processed_spectrum_csv = "fp_quantification/example_absorbance_parsed_processed.csv",
  # wells_to_remove = c(),
  corr_method = "scatter", wav_to_use1 = 500, wav_to_use2 = 450,
  outfolder = "fp_quantification/concentration",
  csv_only = TRUE
)

Arguments required:

Warnings expected:

Outputs produced:

..in the designated outfolder (here, fp_quantification/concentration/):

.. in RStudio:

proteinconcs[1:12,] # view a fragment of the dataframe
# nrow(proteinconcs) # 24
# ncol(proteinconcs) # 8

# code to display
# # old
# proteinconcs[1:10,] # view a fragment of the dataframe
# # current
# proteinconcs[1:12,] # view a fragment of the dataframe

data_to_display <- proteinconcs[1:12,]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(concentration_ngul)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(concentration_ngul)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

The example data should give 6.8 ng/ul for the top concentration.

Plot3a that illustrates the model fitting to the spectra and identification of the excitation maximum of mTagBFP2:

Plot6b that compares the linear fit of the concentrations across normalisation techniques:


Conversion factor calculation


Create a folder for this calculation:

dir.create("conversion_factors")

Assemble the metadata file, using the original metadata file, and the protein concentrations file just obtained from the ECmax assay above.

# Join files
fluorescence_meta1 <- read.csv("data/example_fluorescence_meta.csv")
fluorescence_meta2 <- read.csv("fp_quantification/concentration/protein_concs_ecmax.csv")
fluorescence_meta_joined <- dplyr::left_join(x = fluorescence_meta1, y = fluorescence_meta2)
fluorescence_meta_joined <- cbind(fluorescence_meta_joined[ , !names(fluorescence_meta_joined) %in% c("well")], fluorescence_meta_joined[ , "well"]) # move well column to right hand side - important for `generate_cfs()`
names(fluorescence_meta_joined)[ncol(fluorescence_meta_joined)] <- "well" # rename last column
write.csv(x = fluorescence_meta_joined,
          file = "conversion_factors/example_fluorescence_meta_joined.csv", 
          row.names = FALSE)
fluorescence_meta_joined[1:12,c(1,4,7,8,10,12:15)]
# nrow(fluorescence_meta_joined) # 96
# ncol(fluorescence_meta_joined) # 15

# code to display
# # old
# fluorescence_meta_joined[1:10,c(1,4,7,8,10,12:15)]
# # current
# fluorescence_meta_joined[1:12,c(1,4,7,8,10,12:15)]

data_to_display <- fluorescence_meta_joined[1:12,c(1,4,7,8,10,12:15)]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(concentration_ngul)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(concentration_ngul)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

You should be able to see that the concentration values have been joined to the other metadata. This joined metadata file will be our metadata for the fluorescence assay.


1. Parse fluorescence data

Parse the data. Here we're using a different parsing function that handles standard (endpoint) and timecourse (kinetic) data from Tecan Spark instruments running Magellan software. See Data Parsing article if you have a different setup.

parsed_data <- parse_magellan(data_csv = "data/example_fluorescence.csv",
                              metadata_csv = "conversion_factors/example_fluorescence_meta_joined.csv",
                              timeseries = FALSE
)

Arguments required:

Warnings expected:

Outputs produced:

View a fragment of the data frame to check.

parsed_data[1:12,c(1,4,8,12,14:24)] # view a fragment of the dataframe
# nrow(parsed_data) # 96
# ncol(parsed_data) # 26

# code to display
# # old
# parsed_data[1:10,c(1,4,8,12,14:24)] # view a fragment of the dataframe
# # current
# parsed_data[1:12,c(1,4,8,12,14:24)] # view a fragment of the dataframe

data_to_display <- parsed_data[1:12,c(1,4,8,12,14:24)]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = tidyselect::starts_with("blueblue")
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::starts_with("blueblue")
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

Note the fact that the data is now in Tidy Format, with each fluorescence reading (at gains of 40, 50, etc..) in its own column, ready for processing.


2. Generate conversion factors

Use generate_cfs() to generate conversion factors that relates fluorescence brightness to molecule number.

fp_conversion_factors <- generate_cfs(
  calibration_csv = "data/example_fluorescence_parsed.csv",
  subset_rows = TRUE, rows_to_keep = c("A","B"),
  outfolder = "conversion_factors"
)

Arguments required:

Warnings expected:

Outputs produced:

..in the designated outfolder (here, conversion_factors/):

.. in RStudio:

For Advanced options for this function, see ?generate_cfs().

Let's check how the conversion factor table looks:

fp_conversion_factors[,c(1,4,8:13)] # view a fragment of the dataframe
# nrow(fp_conversion_factors) # 9
# ncol(fp_conversion_factors) # 13

# code to display
# # old
# fp_conversion_factors[,c(1,4,8:13)] # view a fragment of the dataframe
# # current
# fp_conversion_factors[,c(1,4,8:13)] # view a fragment of the dataframe

data_to_display <- fp_conversion_factors[,c(1,4,8:13)]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(cf, beta, residuals)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(cf, beta, residuals)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

Here, cf is conversion factor and residuals allows you to check the quality of the fit. We can see the fits are good, which we can also see by looking at the plots:


Assemble conversion factors


FP conversion factors

If multiple FPs are being calibrated, assembly can be used to join various files together. Here we will just simplify the conversion factor file and move it into a conversion factor folder.

fp_conversion_factors <- read.csv("conversion_factors/example_fluorescence_parsed_cfs.csv")
fp_conversion_factors <- fp_conversion_factors |>
  dplyr::select(instrument, plate, channel_name, media, calibrant, measure, gain, cf, beta)
write.csv(fp_conversion_factors, "conversion_factors/fp_conversion_factors_assembled.csv", 
          row.names = FALSE)


OD conversion factors

These can be obtained according to protocols in the FlopR package/paper. An example calibration output is provided with the example data. Let's move it to the conversion factors folder too.

od_conversion_factors <- read.csv("data/od_conversion_factors2.csv")
write.csv(od_conversion_factors, "conversion_factors/od_conversion_factors_assembled.csv",
          row.names = FALSE)
od_conversion_factors
# nrow(od_conversion_factors) # 2
# ncol(od_conversion_factors) # 5

data_to_display <- od_conversion_factors
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows


Processing data from E. coli fluorescent protein expression experiments


With conversion factors for both mTagBFP2 fluorescence and cell number (OD) in hand, we are ready to process the experimental data.

1. Parse data

First, we parse, though this time we use timeseries = TRUE:

parsed_data <- parse_magellan(
  data_csv = "data/example_experiment.csv",
  metadata_csv = "data/example_experiment_meta.csv",
  timeseries = TRUE,
  metadata_above = 1, # Well Positions
  custom = TRUE, startcol = 3, endcol = 97, insert_wells_above = 0, insert_wells_below = 1
)

Arguments required:

Warnings expected:

Outputs produced:

parsed_data[1:24,c(3,6:13)] # view a fragment of the dataframe
# nrow(parsed_data) # 9000
# ncol(parsed_data) # 16

# code to display
# # old
# parsed_data[1:10,c(3,6,8:15)] # view a fragment of the dataframe
# # current
# parsed_data[1:24,c(3,6:13)] # view a fragment of the dataframe

data_to_display <- parsed_data[1:24,c(3,6:13)]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(OD600, OD700, blue, bluelow)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(OD600, OD700, blue, bluelow)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

Note the data consists of two OD measurements and two fluorescence measurements at low and high gain.


2. Process data

Process the experimental data using process_plate().

processed_data <- process_plate(
  data_csv = "data/example_experiment_parsed.csv",
  blank_well = c("A11", "B11", "C11", "D11", "E11", "F11", "G11", "H11"),
  od_name = "OD700",

  # fluorescence labels
  flu_channels = c("blue"),
  flu_channels_rename = c("blueblue"),

  # correction
  do_quench_correction = TRUE,
  od_type = "OD700",

  # calibrations
  do_calibrate = TRUE,
  instr = "spark1",
  flu_slugs = c("mTagBFP2"),
  flu_gains = c(60),
  flu_labels = c("mTagBFP2"),

  # conversion factors
  od_coeffs_csv = "conversion_factors/od_conversion_factors_assembled.csv",
  fluor_coeffs_csv = "conversion_factors/fp_conversion_factors_assembled.csv",

  # background autofluorescence subtraction
  af_model = "spline",
  neg_well = c("A2", "A3", "A4", "A5", "A6", "A7"),

  outfolder = "experiment_analysis"
)
# AS ABOVE BUT WITH CSV_ONLY = TRUE

processed_data <- process_plate(
  data_csv = "data/example_experiment_parsed.csv",
  blank_well = c("A11", "B11", "C11", "D11", "E11", "F11", "G11", "H11"),
  od_name = "OD700",

  # fluorescence labels
  flu_channels = c("blue"),
  flu_channels_rename = c("blueblue"),

  # correction
  do_quench_correction = TRUE,
  od_type = "OD700",

  # calibrations
  do_calibrate = TRUE,
  instr = "spark1",
  flu_slugs = c("mTagBFP2"),
  flu_gains = c(60),
  flu_labels = c("mTagBFP2"),

  # conversion factors
  od_coeffs_csv = "conversion_factors/od_conversion_factors_assembled.csv",
  fluor_coeffs_csv = "conversion_factors/fp_conversion_factors_assembled.csv",

  # background autofluorescence subtraction
  af_model = "spline",
  neg_well = c("A2", "A3", "A4", "A5", "A6", "A7"),

  outfolder = "experiment_analysis",
  csv_only = TRUE
)

This is a fairly involved function.

Arguments required:

Fluorescence channel names:

Quench correction:

Calibration parameters:

Conversion factor data:

Background autofluorescence subtraction:

Saving:

Warnings expected:

Outputs produced:

..in the designated outfolder (here, experiment_analysis/):

.. in RStudio:

For Advanced options for this function, see ?process_plate().

If we view a fragment of the dataframe to check it:

processed_data[14:24,c(3,6,8,9,17:23)] # view a fragment of the dataframe
# nrow(processed_data) # 9000
# ncol(processed_data) # 23

# code to display
# # old
# processed_data[14:24,c(3,6,8,9,17:23)] # view a fragment of the dataframe
# # current
# processed_data[14:24,c(3,6,8,9,17:23)] # view a fragment of the dataframe

data_to_display <- processed_data[14:24,c(3,6,8,9,17:23)]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(pathlength, normalised_OD_cm1, normalised_blueblue, flu_quench, corrected_normalised_blueblue, calibrated_OD, calibrated_mTagBFP2)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(pathlength, normalised_OD_cm1, normalised_blueblue, flu_quench, corrected_normalised_blueblue, calibrated_OD, calibrated_mTagBFP2)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

Note the many new columns created by process_plate().


Normalisation (black=raw, red=normalised):

Quench correction (black=normalised, red=corrected):

Calibration (red=calibrated):


3. Calculate per cell values

calc_fppercell() can be used to estimate molecules per cell.

pc_data_mTagBFP2 <- calc_fppercell(
  data_csv = "experiment_analysis/example_experiment_parsed_processed.csv",
  flu_channels = c("blueblue"),
  flu_labels = c("mTagBFP2"),
  remove_wells = c("A11", "B11", "C11", "D11", "E11", "F11", "G11", "H11", # media
                   "A1", "B1", "C1", "D1", "E1", "F1", "G1", "H1", 
                   "A12", "B12", "C12", "D12", "E12", "F12", "G12", "H12"), # empty wells
  get_rfu_od = FALSE,
  get_mol_cell = TRUE,
  outfolder = "experiment_analysis"
)

Arguments required:

Outputs produced:

..in the designated outfolder (here, experiment_analysis/percell_data/):

.. in RStudio:

View a fragment of the dataframe to check it:

data_to_display <- pc_data_mTagBFP2 |>
  dplyr::filter(time == max(pc_data_mTagBFP2$time)) |>
  dplyr::select(plasmid, ara_pc, time, OD600, calibrated_OD, blueblue, calibrated_mTagBFP2, calibratedmTagBFP2_perCell)
data_to_display[c(13:15,19:21,25:27,31:33),]
# nrow(pc_data_mTagBFP2) # 6912
# ncol(pc_data_mTagBFP2) # 24

# code to display
# # old
# pc_data_mTagBFP2[10:18,c(3,6,8,9,12,23,24)] # view a fragment of the dataframe
# # current
# pc_data_mTagBFP2[1:24,c(3,6,8,9,12,23,24)] # view a fragment of the dataframe

# data_to_display <- pc_data_mTagBFP2[1:24,c(3,6,8,9,12,23,24)]

data_to_display <- pc_data_mTagBFP2 |>
  dplyr::filter(time == max(pc_data_mTagBFP2$time)) |>
  dplyr::select(plasmid, ara_pc, time, OD600, calibrated_OD, blueblue, calibrated_mTagBFP2, calibratedmTagBFP2_perCell)
data_to_display <- data_to_display[c(13:15,19:21,25:27,31:33),]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(calibratedmTagBFP2_perCell)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(calibratedmTagBFP2_perCell)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

At the final timepoint, we can see that the abundance of mTagBFP2 in these samples was in the range of 50,000 to over 300,000 molecules per cell.



4. Calculate cellular concentration

calc_fpconc() can be used to estimate molecular concentration. This is somewhat similar in structure to the above.

molar_data_mTagBFP2 <- calc_fpconc(
  data_csv = "experiment_analysis/example_experiment_parsed_processed.csv",
  flu_channels = c("blueblue"),
  flu_labels = c("mTagBFP2"),
  remove_wells = c("A11", "B11", "C11", "D11", "E11", "F11", "G11", "H11", # media
                   "A1", "B1", "C1", "D1", "E1", "F1", "G1", "H1", 
                   "A12", "B12", "C12", "D12", "E12", "F12", "G12", "H12"), # empty wells
  get_rfu_vol = FALSE,
  get_mol_vol = TRUE,

  od_specific_total_volume = 3.6,
  odmeasure = "OD700",
  odmeasure_conversion = 0.79,

  outfolder = "experiment_analysis"
)

Arguments required:

Outputs produced:

..in the designated outfolder (here, experiment_analysis/molar_data/):

.. in RStudio:

View a fragment of the dataframe to check it:

data_to_display <- molar_data_mTagBFP2 |>
  dplyr::filter(time == max(molar_data_mTagBFP2$time)) |>
  dplyr::select(plasmid, ara_pc, time, OD700, calibrated_OD, blueblue, calibrated_mTagBFP2, calibrated_mTagBFP2_Molar)
data_to_display[c(13:15,19:21,25:27,31:33),]
# nrow(molar_data_mTagBFP2) # 6912
# ncol(molar_data_mTagBFP2) # 30

# code to display
# # old
# molar_data_mTagBFP2[10:18,c(3,6,8,9,12,23,30)] # view a fragment of the dataframe
# # current
# molar_data_mTagBFP2[1:24,c(3,6,8,9,12,23,30)] # view a fragment of the dataframe

# data_to_display <- molar_data_mTagBFP2[1:24,c(3,6,8,9,12,23,30)]

data_to_display <- molar_data_mTagBFP2 |>
  dplyr::filter(time == max(molar_data_mTagBFP2$time)) |>
  dplyr::select(plasmid, ara_pc, time, OD700, calibrated_OD, blueblue, calibrated_mTagBFP2, calibrated_mTagBFP2_Molar)
data_to_display <- data_to_display[c(13:15,19:21,25:27,31:33),]
data_to_display |>
  gt::gt() |>
  # TABLE HEADER
  # header style
  gt::tab_style(
    locations = gt::cells_column_labels(),
    style = list(
      gt::cell_text(size = "small"),
      gt::cell_text(weight = "bold")
    )
  ) |>
  # TABLE BODY
  # font smaller
  gt::tab_style(
    locations = gt::cells_body(
      columns = tidyselect::everything()
    ),
    style = list(
      gt::cell_text(size = "small")
    )
  ) |>
  # NEW COLS
  gt::tab_style(
    locations = gt::cells_column_labels(
      columns = c(calibrated_mTagBFP2_Molar)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) |>
  gt::tab_style(
    locations = gt::cells_body(
      columns = c(calibrated_mTagBFP2_Molar)
    ),
    style = list(
      gt::cell_text(color = "red")
    )
  ) #|>
  # TABLE SIZE
  # gt::tab_options(container.width = gt::px(800)) |> # don't specify, so it can change w window size
  # gt::tab_options(container.height = gt::px(350)) # this is approx 10 rows: the header + 9 rows

Concentration values are estimated at 10-70 uM.



These files can then be used to analyse data in absolute quantities... Note that arbitrary values suggest nothing about mTagBFP2 abundance or its likely effect on the cell, since the values are biologically meaningless. Absolute values, however, allow us to compare mTagBFP2 levels to the cellular context. For example, at high arabinose concentrations, our vector effectively overproduces proteins to levels higher than the most abundant native proteins in an E. coli cell, or up to 20% of its proteome.

data.to.plot <- pc_data_mTagBFP2 |>
  dplyr::filter(plasmid == "pS361_ara_mTagBFP2" | plasmid == "pS381_ara_mTagBFP2") |>
  dplyr::mutate(ori = ifelse(plasmid == "pS361_ara_mTagBFP2", "p15a", "colE1"))
data.to.plot$ori <- factor(data.to.plot$ori, levels = c("p15a", "colE1"))
data.to.plot <- data.to.plot |>
  dplyr::mutate(time_h = time/60)

# find mean and sd
data.to.plot <- data.to.plot |>
  dplyr::filter(ori == "colE1") |>
  dplyr::group_by(ara_pc, time) |>
  dplyr::mutate(mean = mean(calibratedmTagBFP2_perCell, na.rm = TRUE)) |>
  dplyr::mutate(sd = sd(calibratedmTagBFP2_perCell, na.rm = TRUE)) |>
  dplyr::ungroup()

# raw values - find mean and sd
data.to.plot <- data.to.plot |>
  dplyr::group_by(ara_pc, time) |>
  dplyr::mutate(arbitrary_perCell = normalised_blueblue/normalised_OD_cm1) |>
  dplyr::mutate(mean_arbitrary_perCell = mean(arbitrary_perCell, na.rm = TRUE)) |>
  dplyr::mutate(sd_arbitrary_perCell = sd(arbitrary_perCell, na.rm = TRUE)) |>
  dplyr::ungroup()

# # original plot - plot 2 axes for arbitrary and absolute values
# scale = 500000/24000 # 20.83
# # % of proteins that is 500,000? # 500000/2.36e6 # 21% # https://book.bionumbers.org/how-many-proteins-are-in-a-cell/
# # if 500,000 = 21%, what is ef-tu at 6% total? # 0.06*2.36e6 # 141,600 proteins/cell
# plot1 <- ggplot2::ggplot(data.to.plot |>
#                            dplyr::filter(time_h == 10)) +
#   
#   # # test only:
#   # # arbitrary values:
#   # # ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = blueblue/OD700), size = 0.5, colour = "#7ABF26") + # 0 - 30,000
#   # ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = (normalised_blueblue/normalised_OD_cm1), colour = "raw"), size = 0.5) + # 0-24,000
#   # # absolutes:
#   # ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = calibratedmTagBFP2_perCell, colour = "absolute"), size = 0.5) + # 0-500,000
#   
#   # absolutes with scaling:
#   ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = calibratedmTagBFP2_perCell/scale), size = 0.5) + # 0-500,000
#   ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = mean/scale), shape = 1, size = 2) +
#   # ggplot2::geom_errorbar(ggplot2::aes(x = ara_pc, ymin = mean-sd, ymax=mean+sd), size = 0.5) +
#   
#   ggplot2::scale_x_discrete("arabinose (%)") +
#   
#   # # test only:
#   # ggplot2::scale_y_continuous() +
#   
#   ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~.*scale, name = "molecules/cell")) +
#   ggplot2::ylab("arbitrary units") +
#   
#   ggplot2::theme_bw() +
#   ggplot2::theme(
#     aspect.ratio = 0.5,
#     panel.grid = ggplot2::element_blank(),
#     axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 0.5),
#     
#     axis.ticks.y.right = ggplot2::element_line(color = "#D63C79"),
#     axis.text.y.right = ggplot2::element_text(color = "#D63C79"),
#     axis.title.y.right = ggplot2::element_text(color = "#D63C79")
#   ) +
#   ggplot2::annotate("rect", xmin = 0, xmax = Inf, ymin = 100/scale, ymax = 2000/scale, fill = "#D63C79", alpha = 0.2) +
#   ggplot2::annotate("text", label = "median protein abundance", x = 8.5, y = 2000/scale, colour = "#D63C79") +
#   ggplot2::geom_hline(yintercept = 141600/scale, colour = "#D63C79", linetype = "dashed") +
#   ggplot2::annotate("text", label = "abundance of EF-Tu", x = 2, y = 160000/scale, colour = "#D63C79") +
#   ggplot2::geom_hline(yintercept = 500000/scale, colour = "#D63C79", linetype = "dashed") +
#   ggplot2::annotate("text", label = "500,000 molecules per cell (20% proteome)", x = 3, y = 480000/scale, colour = "#D63C79") +
#   ggplot2::ggtitle('Abundance of mTagBFP2 and relationship to the E. coli proteome') # produced in DH10B from a SEVA vector set
# plot1

# clearer version = 2 plots
# plot1 - arbitrary
plot1 <- ggplot2::ggplot(data.to.plot |>
                           dplyr::filter(time_h == 10)) +
  # arbitrary values:
  ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = arbitrary_perCell), size = 0.5) + # 0-24,000
  ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = mean_arbitrary_perCell), shape = 1, size = 2) +
  ggplot2::geom_errorbar(ggplot2::aes(x = ara_pc, ymin = mean_arbitrary_perCell-sd_arbitrary_perCell, 
                                      ymax=mean_arbitrary_perCell+sd_arbitrary_perCell), size = 0.1, width = 0.1) +
  ggplot2::scale_x_discrete("arabinose (%)") +
  ggplot2::scale_y_continuous() +
  ggplot2::ylab("mTagBFP2 abundance\n(arbitrary units)") +
  ggplot2::theme_bw() +
  ggplot2::theme(
    aspect.ratio = 0.5,
    panel.grid = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) +
  ggplot2::annotate("text", label = "[without absolute units\nwe can't put the results into context]", x = 8.5, y = 10000, colour = "#D63C79") +
  ggplot2::ggtitle('Abundance of mTagBFP2 in arbitrary units')
plot1

# plot2 - absolute
# % of proteins that is 500,000? # 500000/2.36e6 # 21% # https://book.bionumbers.org/how-many-proteins-are-in-a-cell/
# if 500,000 = 21%, what is ef-tu at 6% total? # 0.06*2.36e6 # 141,600 proteins/cell
plot2 <- ggplot2::ggplot(data.to.plot |>
                           dplyr::filter(time_h == 10)) +

  # absolutes with scaling:
  ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = calibratedmTagBFP2_perCell), size = 0.5) + # 0-500,000
  ggplot2::geom_point(ggplot2::aes(x = ara_pc, y = mean), shape = 1, size = 2) +
  ggplot2::geom_errorbar(ggplot2::aes(x = ara_pc, ymin = mean-sd, ymax=mean+sd), size = 0.1, width = 0.1) +

  ggplot2::scale_x_discrete("arabinose (%)") +
  ggplot2::scale_y_continuous() +
  ggplot2::ylab("mTagBFP2 abundance\n(molecules/cell)") +

  ggplot2::theme_bw() +
  ggplot2::theme(
    aspect.ratio = 0.5,
    panel.grid = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 0.5),

    axis.ticks.y = ggplot2::element_line(color = "#D63C79"),
    axis.text.y = ggplot2::element_text(color = "#D63C79"),
    axis.title.y = ggplot2::element_text(color = "#D63C79")
  ) +
  ggplot2::annotate("rect", xmin = 0, xmax = Inf, ymin = 100, ymax = 2000, fill = "#D63C79", alpha = 0.2) +
  ggplot2::annotate("text", label = "median protein abundance", x = 8.5, y = 2000, colour = "#D63C79") +
  ggplot2::geom_hline(yintercept = 141600, colour = "#D63C79", linetype = "dashed") +
  ggplot2::annotate("text", label = "most abundant proteins", x = 2, y = 160000, colour = "#D63C79") +
  ggplot2::geom_hline(yintercept = 500000, colour = "#D63C79", linetype = "dashed") +
  ggplot2::annotate("text", label = "500,000 molecules per cell (20% proteome)", x = 3, y = 480000, colour = "#D63C79") +
  ggplot2::ggtitle('Abundance of mTagBFP2 in absolute units') # produced in DH10B from a SEVA vector set
plot2


Further information


Our paper covers the purpose and structure of these functions in more detail.



ec363/fpcountr documentation built on Nov. 29, 2024, 12:03 p.m.