require(pander)
require(BiomassWorkflow)
require(magrittr)

Getting Set Up

This document is meant to serve as an introductory primer for using R to streamline and standardize the generation and analysis of microbial density data.

To follow along, you will need to download and load the package (I'm also loading the magrittr package to make the code more readable with pipes:

# Install and load the devtools package to help download the BiomassWorkflow package
install.packages('devtools', repos='http://cran.us.r-project.org')
require(devtools)

# Install and load the BiomassWorkflow package
devtools::install_bitbucket('econtijoch/biomass-workflow')
require(BiomassWorkflow)
require(magrittr)

Now you should be able to get working. If those instructions cause any errors or warnings, the best way to overcome them is to use google to search for answers - links to stackoverflow.com are always a good place to start.

Sample Data

If you don't have data ready to work with, you can use the example data that comes with the package to get started or to model your work. This can be done with the sample_file_maker() function. Briefly, it will download example raw files for use with this (and other) tutorials. You can specify a directory to write the output into.

# Get and store path to working directory
working_directory <- getwd()
# Generate sample files to this directory
sample_file_maker(directory = working_directory)

This will create several files in your directory:

For this example, 96 samples were processed with the DNase Inactivation Buffer extraction protocol in the Matrix tubes (1 mL). 200 uL of the cell lysate was used for the DNA isolation (200 of 700 uL total, so the scaling factor here is 3.5x). The samples were measured with both the Broad Range and High Sensitivity qubit dyes with 2 uL of sample in 198 uL of buffer, and standards were measured using 10 uL of standard in 190 uL of buffer.

Prep - Calculating Sample Masses

Now that we have data to work with, we can put this package to work.

The first step in this analysis is to calculate sample masses from the weight data of our empty and full tubes. Additionally, we can optionally use the order of the tubes from the 96-well plate scanner to align the tubes in the order used in the extraction.

The function mass_and_order() takes care of both of these functions:

sample_masses <- mass_and_order('Sample_Empty_Weights.txt', 'Sample_Full_Weights.txt', order = 'Sample_Tube_Order_Scanned.csv')


head(sample_masses)

Prep - Mapping File

We can use this data to manually create a mapping file (by saving this output as a .csv file, and then manipulating it with excel - making sure to save the result as a .csv file!), or we can do much of that work in R, to leverage some of the data processing capabilities of R.

It is important to remember that a valid mapping file has a few NECESSARY columns:

The mass_and_order() function will already give us a table with the BarcodeID and the SampleMass, so all we need to do is add a couple of columns, and join this data with some information about the specific samples, and we should have a complete mapping file.

Lets take a look at what's in the example file called Sample_SampleInfo.csv:

sample_info <- readr::read_csv('Sample_SampleInfo.csv')

head(sample_info)

That looks like it'll help put together a good mapping file, so let's start building it:

mapping_file <- dplyr::left_join(sample_masses, sample_info) %>% # Merge sample info and masses
  dplyr::mutate(Type = "Experiment", ReaderWell = SampleWell) %>% # Add our necessary missing components
  dplyr::select(ReaderWell, Type, BarcodeID, SampleMass, everything()) %>%  # This last bit re-orders the mappign file
  dplyr::filter(!is.na(BarcodeID)) # remove wells that didn't have samples (two in this case)

head(mapping_file)

We can save this as our mapping file and make additional changes manually (often times it is easier to do this).

readr::write_csv(mapping_file, 'Sample_Mapping_made_in_R.csv')

In any case, we are now ready to proceed to reading in the data from the plate reader and computing microbial density.

Generating Microbial Density Data

The main function to perform analyses will be DataAnalysis(). It reads in your raw data and mapping file and produes a table that will contain a host of useful data for downstream applications. These include:

The function has several arguments that will be used often:

There are some other arguments, but they are less important, and you can see the man file for more details.

In our case, we have our samples all on one plate, and our standards on another plate. We have also measured our samples with both the Broad-Range and High-Sensitivity dyes, (and measured the standards on the same plate). Let's make some data:

BR_data <- DataAnalysis(plate_reader_file = 'Sample_BR_raw.xlsx', mapping_csv_file = 'Sample_Mapping.csv', standards_plate_reader_file = 'Sample_Standards_raw.xlsx', standards_mapping_csv_file = 'Sample_BR_Standards_Mapping.csv', volume = 2, scale = 3.5, exp_id = "Sample Experiment - BR")

HS_data <- DataAnalysis(plate_reader_file = 'Sample_HS_raw.xlsx', mapping_csv_file = 'Sample_Mapping.csv', standards_plate_reader_file = 'Sample_Standards_raw.xlsx', standards_mapping_csv_file = 'Sample_HS_Standards_Mapping.csv', volume = 2, scale = 3.5, exp_id = "Sample Experiment - HS")

The output of this function is a list object that contains two things:

First, we can take a peek at both of our standard curves:

cowplot::plot_grid(HS_data$standards_plot, BR_data$standards_plot, nrow = 1)

Not bad!

Since we used both qubit dyes to measure the DNA concentration, we can get a 'consensus' measurement by combining the measurements (using the HS for low concentrations, BR for high concentrations, and the average of the two for values that lie in the range of both)

consensus_data <- qubit_merger(HS_data$data, BR_data$data)

Now, we can be pretty confident that we've measured our DNA as accurately as possible, so lets take a look at our data, and play around with it visually. The plotter() function launches a shiny app that allows you to toggle the x and y variables of a plot interactively, and is a good way of giving your data a first pass.

View(consensus_data)

plotter(consensus_data)

That's it! Now you can go from raw data to microbial density data in no time. The next steps will be getting samples ready for sequencing (16S or metagenomics), which will be handled below.

Preparing Samples for 16S Sequencing

Preparing for Sequencing

This package also contains a wrapper function that is meant to take data from one or more experiments and generate the files necessary to plan and create a 16S sequencing run (or runs).

The function used to prep samples for 16S sequencing is thesequencing_prep_16S() function, which takes a list of data, and generates files in the working directory that correspond to entire sequencing runs. You need to provide the number of barcode plates available (e.g. if you have 192 sequencing barcodes, this would be 2 plates. if you have 480 barcodes, this would be 5, etc..).

Let's say that we have some other data that we want to include in our sequencing run. We'll assume it was generated in the same way as above. Let's read it into our session. [Side note here: I have dropped one column from this data because it will go on to give us trouble later on. This is because it has a lot of NA values which get read as a character, and when we try to join it with the other data, which has NAs treated as logical values, we get an error. So to get rid of that problem, I've just dropped the column here]

other_data <- readr::read_csv('Sample_Other_Data.csv') %>% dplyr::select(-Other)

If your data has been generated using the package, then preparing your samples for sequencing should be as easy as creating a list of relevant samples and then running the sequencing prep command.

data_list <- list(consensus_data, other_data)

sequencing_runs_16S <- sequencing_prep_16S(experiment_data_list = data_list, n_barcode_plates = 5)

This function call may throw a warning if you don't pre-filter your data to exclude samples that will have less than the necessary DNA in them to move forward with sequencing. You can change this default action by passing along filter = FALSE in the function call.

-This function will indicate which plates are needed (it is important to label your DNA plates with PlateIDs in your mapping file), how many samples will be included in the sequencing run(s), how many sequencing runs will be needed to sequence all samples, and the number of unused barcodes that will remain.

-The output will be organized into folders based on sequencing runs, which each will contain a file for using the Beckmann robot to performa a dilution of each plate in the sequencing run, and a QIIME-ready mapping file. By default, the sequencing mapping file is created using the Faith Lab 16S sequencing barcodes. If you have a different set of barcodes, you will need to pass in a .csv file that contains the correct.

-You can indicate the directory in which to place these files by passing a directory as output_directory = '/path/to/directory/'.

-Within R, the output of these functions will be a list of lists, containing all of the data that is written to the output directory.



econtijoch/Biomass-Workflow documentation built on May 15, 2019, 7:59 p.m.