In aef1004/bactcountr: Back Calculate CFU's Based On Dilution Series

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

bactcountr

The goal of bactcountr is to provide an automated way to calculate CFUs based on the dilutions used and to plot the final results.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("aef1004/bactcountr")

Example

This is a basic example which shows you how to calculate CFUs based on your data:

library(bactcountr)

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggbeeswarm)
library(readxl)
library(purrr)
library(rstatix)
library(ggpubr)

In order to use the first function tidy_CFU the data must have some columns:

some type of naming or grouping convention to separate out the individual observations such as the columns group, replicate, organ
dilution columns that are labeled dilution_x where x is the dilution used

data("CFU_data")
head(CFU_data)

When the tidy_CFU function is used, it puts it into a tidy format. The naming/grouping columns are left alone, but the dilution and CFU columns are gathered so that each row of the dataframe represents a single observation. Any values in the original dataframe that are labeled as "TNTC" (Too Numerous To Count) are converted to NA columns as they cannot be used to calculate the CFUs. The CFUs are then filtered to those less than 150 as counts above this number are generally not accurate.

CFU_raw_formatted <- tidy_CFU(CFU_data)
head(CFU_raw_formatted)

CFU_raw_formatted %>%
  calculate_cfu(dilution_factor = 5, 
                            resuspend_volume_ml = 0.5, 
                            volume_plated_uL = 100, 
                            percent = 0.5, "dilution", "CFUs")

The function pick_one_dilution sifts through all of the data for the groups and finds the CFU observation for each group/replicate/organ that is closest to 25 CFUs (and therefore, most likely the most accurate observation). It picks one of the dilution-CFU observations per grouping.

CFU_one_dilution <- pick_one_dilution(CFU_raw_formatted, "CFUs", c("group", "organ", "replicate"))

head(CFU_one_dilution)

The calculate_cfu function calculates the whole CFUs and log CFUs for each observation in the data. It takes in experimental parameters such as the dilution factor used, the volume (in milliliters) used to resuspend the CFU solution, and the percent of the organ used (if organ is used, default is 1 so whole organ).

final_data <- calculate_cfu(CFU_one_dilution, 
                            dilution_factor = 5, 
                            resuspend_volume_ml = 0.5, 
                            volume_plated_uL = 100, 
                            percent = 0.5, "dilution", "CFUs")

head(final_data)

Example of full run-through of the data and plotting

This is an example reading in the data from an excel file. The functions are all run through so that the log CFUs are calculated for each group and replicate for the spleen. The results can subsequently be plotted.

# example file for the excel file
example_file_address <- system.file("extdata", "PL_D-21_BCG_CFUs.xlsx", package = "bactcountr")

# all of the functions are used to tidy the data, pick one dilution, and then calculate the log CFUs
analyzed_CFUs <- read_xlsx(example_file_address) %>%
  tidy_CFU() %>%
  pick_one_dilution("CFUs", c("group", "organ", "replicate")) %>%
  filter(organ == "Spleen") %>%
  calculate_cfu(dilution_factor = 5,
                resuspend_volume_ml = 0.5,
                volume_plated_uL = 100,
                percent = .5, "dilution", "CFUs")

# the CFUs can then be plotted
ggplot(analyzed_CFUs, aes(group, log_CFUs, color = group)) +
  geom_beeswarm(groupOnX = TRUE) +
  ggtitle("PL BCG CFUs D-21 Spleen")

The average of the groups can then be plotted.

# calculate the average CFUs per group
avg_data <- analyzed_CFUs %>%
  ungroup() %>%
  select(group, log_CFUs) %>%
  group_by(group) %>%
  mutate(avg_CFUs = mean(log_CFUs)) %>%
  select(group, avg_CFUs) %>%
  unique() %>%
  rename(log_CFUs = avg_CFUs) %>%
  ungroup() %>%
  mutate(group = as.factor(group))

# plot average and individual CFUs
ggplot(analyzed_CFUs, aes(x = as.factor(group), y = log_CFUs)) +
  geom_bar(data = avg_data,  stat = "identity", aes(fill =group)) +
    geom_point(color = "black") +
  xlab("Group") +
  ggtitle("PL BCG D-21 Spleen Average CFUS")

Alternative use without using pick_one_dilution

If you want to see the consistancy of the plating across the dilutions, you can skip the pick_one_dilution function and plot the calculated log_CFUs for group/replicate against the dilutions.

analyzed_CFUs_wo_pick_one <- read_xlsx(example_file_address) %>%
  tidy_CFU() %>%
  filter(organ == "Spleen") %>%
  calculate_cfu(5, 0.5, 100, .5, "dilution", "CFUs") %>%
  unite(col = group_replicate, group, replicate, sep = "_")

ggplot(analyzed_CFUs_wo_pick_one, aes(dilution, log_CFUs, color = group_replicate)) +
  geom_point() +
  geom_path() +
  ggtitle("PL BCG CFUs D-21 Spleen for all dilutions")

Calculate statistical significance with different dataset

We will use a different dataset here that has real statistical significance. The data is from the PL Experiment, D21 post-infection with Mycobacterium tuberculosis.

# example file for the excel file
example_file_address <- system.file("extdata", "PL_D21_Mtb_CFUs.xlsx", package = "bactcountr")

# all of the functions are used to tidy the data, pick one dilution, and then calculate the log CFUs
analyzed_CFUs <- read_xlsx(example_file_address, sheet = "Raw Counts") %>%
  tidy_CFU() %>%
  pick_one_dilution("CFUs", c("group", "organ", "replicate")) %>%
  filter(organ == "Lung") %>%
  calculate_cfu(dilution_factor = 5, 
                resuspend_volume = 0.5,
                volume_plated_uL = 100,
                percent = 1/3, "dilution", "CFUs") %>%
  mutate(group = as.factor(group))

# the CFUs can then be plotted
ggplot(analyzed_CFUs, aes(group, log_CFUs, color = group)) +
  geom_beeswarm(groupOnX = TRUE) +
  ggtitle("PL Mtb CFUs D21 Lung")

Calculate statistical significance with ANOVA and Tukey HSD

# calculate ANOVA and tukey HSD stat signif
sig2 <- analyzed_CFUs %>%
  ungroup() %>%
  select(group, log_CFUs) %>%
  aov(log_CFUs ~ group, data = .) %>%
  TukeyHSD() %>%
  broom::tidy() %>%
  separate(col = "contrast", c("group1", "group2"), sep = "-") %>%
  filter(adj.p.value < 0.05) %>%
  mutate(adj.p.value   = round(adj.p.value, 3)) 

# calculates where the p-value bar will first show on the plots
ypos <- c(max(max(abs(sig2$conf.low)), max(abs(sig2$conf.high))))

# plot the results
ggscatter(x = "group", y = "log_CFUs", color = "group", data = analyzed_CFUs) +
  stat_pvalue_manual(data = sig2, label = "adj.p.value", 
                     hide.ns = TRUE, y.position = ypos, step.increase = 0.1) +
  ggtitle("PL Mtb CFUs D21 Lung")

Test different statistical analyses

# one-tailed t.test - have to choose a mu value
# have to choose "greater or "less" and a "mu"
t_sig <-analyzed_CFUs %>%
  ungroup() %>%
  t_test(log_CFUs ~ group, mu = 2) %>% #   adjust_pvalue(method = "BH") %>%
  add_significance() %>%
  rstatix::add_xy_position(x = "group")


ggline(x = "group", y = "log_CFUs", color = "group", alpha = 0.5, 
       data = analyzed_CFUs,
       font.label = list(size = 30, face = "plain")) +
      stat_pvalue_manual(data = t_sig, hide.ns = TRUE) +
    ggtitle(paste("T-test"))

# two way anova - can choose additional factors that play a role

Different plots

# line with mean_se
ggline(x = "group", y = "log_CFUs", color = "group", alpha = 0.5, data = analyzed_CFUs,
       font.label = list(size = 30, face = "plain"), add = c("mean_se")) +
  stat_pvalue_manual(data = sig2, label = "adj.p.value", 
                     hide.ns = TRUE, y.position = ypos, step.increase = 0.1) +
  ggtitle("PL Mtb CFUs D21 Lung")

# bar plot
ggbarplot(x = "group", y = "log_CFUs", fill = "group", data = analyzed_CFUs,
       font.label = list(size = 30, face = "plain"), add = c("mean_se")) +
  stat_pvalue_manual(data = sig2, label = "adj.p.value", 
                     hide.ns = TRUE, y.position = ypos, step.increase = 0.1) +
  ggtitle("PL Mtb CFUs D21 Lung")

analyzed_CFUs <- analyzed_CFUs %>%
  mutate(group = as.factor(group))

# scatter plot
ggscatter(x = "group", y = "log_CFUs", color = "group", data = analyzed_CFUs) +
  stat_pvalue_manual(data = sig2, label = "adj.p.value", 
                     hide.ns = TRUE, y.position = ypos, step.increase = 0.1) +
  ggtitle("PL Mtb CFUs D21 Lung")

Alternative use calculating average of all CFUs

If you want to see the consistancy of the plating across the dilutions, you can skip the pick_one_dilution function and calcuate the average CFUs for each group/replicate. Typically this is done by only calculating CFUs for which there is a count (removing CFU rows that are either TNTC or 0).

Note: The way that I'm doing this now doesn't account for the fact that if a group/replicate has 0 CFUs across the dataset, it will just be removed since I'm filtering CFUs != 0

analyzed_CFUs_wo_pick_one <- read_xlsx(example_file_address) %>%
  tidy_CFU() %>%
  filter(organ == "Spleen") %>%
  calculate_cfu(5, 0.5, 100, .5, "dilution", "CFUs") %>%
  filter(CFUs != 0) %>%
  group_by(group, replicate) %>%
  mutate(average_log_CFUs = mean(log_CFUs))

Group Names

group <- c(1:10)
route <- c(rep("I.P.", 5), rep("Oral", 5))
drug <- c(rep(c("control", "control", "Drug 1", "Drug 2", "Drug 1 + Drug 2"), 2))

vaccination <- c(rep(c("PBS", rep("BCG", 4)), 2))

group_df <- data.frame(group, route, drug, vaccination)
group_df