In albgarre/biogrowth: Modelling of Population Growth

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(biogrowth)
library(tidyverse)

The serial-fold dilution method

Plate counts are generally a reliable method for studying microbial growth. However, this approach does not scale well with the size of the experimental design, such as when studying variability or developing a secondary model, where we must analyze hundreds of curves. This issue can be mitigated using experiments based on optical density, as devices (such as a bioscreen or a plate reader) can use 96-well or 100-well plates, so a large number of conditions can be run in parallel.

It is important to underline that these devices do not measure the microbial concentration directly. Instead, it is estimated from the optical density. The device sends a ray of light through the well and measures the intensity that reaches a sensor. Higher microbial concentration will scatter more light and reduce the light intensity read. The dataset example_od includes a typical dataset provided by this type of device:

data("example_od")
example_od %>%
  pivot_longer(-"time") %>%
  ggplot() +
  geom_line(aes(x = time, y = value, colour = name)) +
  theme(legend.position = "none")

Although the OD curve looks like a bacterial growth curve, it is not. Higher microbial concentrations imply a higher OD, but this relation is not one-to-one. Furthermore, OD-based devices have a relatively large detection limit (often around 6 log CFU/mL, although this limit is strain and media dependent), so only the background noise is measured for low microbial concentrations. As a result, the OD curve has a longer lag phase than the actual microbial growth curve. Another issue is that the device often "saturates" before the stationary growth phase. This is because, at some point, the OD is so high (due to the high microbial concentration) that there is no longer a measurable change in OD. Hence, the OD curve enters the stationary phase before the actual microbial population (red in the plot above).

Therefore, when using OD-based devices, we only observe a relativelly small part of the growth curve. Also, even though we can estimate some growth parameters ($\lambda$, $\log N_{max}$ and $\mu$) directly from the OD curve, they are not really representative of the actual microbial growth.

The serial-fold dilution method

Principles

The serial-fold dilution method allows the estimation of the parameters of the growth curve directly from OD measurements. This method is based on these hypotheses:

the microbial concentration that corresponds to a given OD (within a reasonable range) is unique. For instance, that an OD of 0.2 corresponds to a microbial concentration of 6 log CFU/ml.
that the growth parameters ($\lambda$, $\mu$) do not depend on the initial bacterial concentration.

Under these hypotheses, we would observe this for different initial levels of the inoculum:

c(0:4) %>%
    map(.,
        ~ predict_growth(seq(0, 100, length = 1000),
                         list(model = "Baranyi", 
                              logN0 = log10(100/6^.),
                              logNmax = 8,
                              mu = .2, lambda = 15))
        ) %>%
    imap_dfr(., ~ mutate(.x$simulation, dil = .y-1)) %>%
    ggplot() +
    geom_line(aes(x = time, y = logN, colour = factor(dil))) +
    geom_hline(yintercept = 6, linetype = 2) +
    theme(legend.position = "none")

The plot shows a horizontal line, which corresponds to a hypothetical detection concentration $N_{det}$ according to some OD (e.g., 0.25). It can be demonstrated that, under those hypotheses, the time to detection $t_{det}$ is linear with the initial concentration on each well ($N_0$:

$$ t_{det} = \left( \lambda + \frac{1}{\mu} \log N_{det} \right) - \frac{1}{\mu} \log N_0 $$

Taking advantage of that, the serial-dilution method performs several experiments in parallel for different initial concentrations. Then, the parameters of the growth curve can be obtained from the values of $t_{det}$ by (non-linear) regression.

For practical reasons, the different initial concentrations are defined by serial dilution of the samples. A bacterial inoculum is introduced in the wells at one edge of the plate. Then, this is serially diluted (often two-fold) over the whole row. Therefore, the initial concentration on each well is defined by

$$ \log N_0 = \log \left( N_{dil0} / f^d \right) = \log N_{dil0} - \log f^d = \log N_{dil0} - d \cdot \log f $$

where $N_{dil0}$ is the microbial concentration in the well corresponding to the zero-dilution, $d$ is the number of dilutions and $f$ is the dilution factor.

By introducing that expression in the equation above, the $t_{det}$ can be calculated from the number of dilutions according to

$$ t_{det} = \left( \lambda + \frac{1}{\mu} \log N_{det} - \frac{1}{\mu} \log N_{dil0} \right) + \frac{1}{\mu} \cdot d \cdot \log f $$

Implementation of the serial dilution method

Getting the time-to-detection

The biogrowth package includes several functions and methods that implement the serial dilution method. First, get_TTDs() can be used to calculate $t_{det}$ directly from OD data. This function takes three arguments:

OD_data
target_OD
codified

The data is introduced as a tibble (or data.frame) using OD_data. It must include a column named time with the time of the reading. Then, the remaining columns indicate the OD measured on each well. The example_od dataset includes an example of the data format.

data("example_od")
head(example_od)

Any detection OD can be defined using the target_OD argument. The $t_{det}$ for each well is calculated by linear interpolation.

By passing these two arguments to get_TTDs(), the function returns a tibble with two columns: condition (the name of the well, according to the names in OD_data) and TTD (the value of $t_{det}$ for that well). For wells where $t_{det}$ is not reached, the function returns an NA.

my_TTDs <- get_TTDs(example_od, target_OD = 0.2)
head(my_TTDs)

Using the serial-dilution method requires the number of dilutions to be identified. This can be done by setting codified = TRUE. In that case, the column names of OD_data must be defined as "condition" + "_" + "n_dilutions". For instance, the following column names in example_od indicate dilution 0, 1 and 2 for the same condition:

names(example_od)[c(2, 5, 8)]

Now, the tibble returned by get_TTDs() has an additional column with the number of dilutions. Furthermore, the identified of the dilution number is removed from condition:

my_TTDs <- get_TTDs(example_od, target_OD = 0.2, codified = TRUE)
head(my_TTDs)

Estimating the growth rate from the time-to-detection

The fit_serial_dilution() function implements the serial-dilution method. It takes 7 arguments:

TTD_data
start
dil_factor
mode
logN_det
logN_dil0
max_dil

The argument TTD_data defines the data for the fitting. It must be a tibble (or data.frame) with two columns: TTD (the $t_{det}$) and dil the dilution number. The output of get_TTDs (after filtering for one condition) already has that format.

my_data <- filter(ttds, condition == "S/6,5/35/R1")

The dilution factor is defined by the argument dil_factor. By default, this parameter takes the value 2 (as the two-fold dilution method is the most common). Then, the fitting is done by nonlinear regression, so the function requires initial guesses for the model parameters using the argument start.

The fitting can be done in two different modes. The default ("intercept") simplifies the equation into an intercept term (a). Therefore, start must include an initial guess for mu and a.

my_fit <- fit_serial_dilution(my_data, start = c(a = 0, mu = .1))

The function returns an instance of FitSerial with the results of the fit. It includes common S3 methods for analysing the model output such as print

my_fit

coef

coef(my_fit)

or summary

summary(my_fit)

It also includes a plot method to compare the fitted curve against the values of $t_{det}$.

plot(my_fit)

A common issue when applying the serial-dilution method is that high dilution numbers often have higher error. Therefore, the function includes the max_dil argument to define a maximum number of dilutions to consider. For instance, we can only take the first 5 serial dilutions:

fit_max5 <- fit_serial_dilution(my_data, start = c(a = 0, mu = .1), max_dil = 5)

Now, the plot method shows a lower number of data points:

plot(fit_max5)

Obviously, this results in different parameter estimates:

coef(fit_max5)

Estimating both the growth rate and lag phase duration from the time-to-detection

The serial-dilution method also allows for an estimation of the lag phase duration. This mode can be activated by passing mode = "lambda" to fit_serial_dilution(). This mode requires additional information; namely, the initial microbial concentration at the well with dilution 0 (logN_dil0) and the microbial concentration at the detection of (logN_det). This information must be identified using independent experiments (i.e., by direct plating).

Please note that this mode requires an initial guess on lambda instead on a:

my_fit2 <- fit_serial_dilution(my_data, 
                               start = c(lambda = 0, mu = .1),
                               mode = "lambda", 
                               logN_det = 7.5,
                               logN_dil0 = 4)

The function also returns for this mode an instance of FitSerial with the same methods. The plot method shows that the fitted curve is the same, regarless of the mode: