In econtijoch/Biomass-Workflow: Microbial Density Data Generation and Analysis Tools

require(pander)
require(BiomassWorkflow)
require(magrittr)

Getting Set Up

This document is meant to serve as an introductory primer for using R to streamline and standardize the generation and analysis of microbial density data.

To follow along, you will need to download and load the package (I'm also loading the magrittr package to make the code more readable with pipes):

# Install and load the devtools package to help download the BiomassWorkflow package
install.packages('devtools', repos='http://cran.us.r-project.org')
require(devtools)

# Install and load the BiomassWorkflow package
devtools::install_bitbucket('econtijoch/biomass-workflow')
require(BiomassWorkflow)
require(magrittr)

Now you should be able to get working. If those instructions cause any errors or warnings, the quickest way to resolve them is usually to search for the exact message on Google; links to stackoverflow.com are a good place to start.

Sample Data

If you don't have data ready to work with, you can use the example data that comes with the package to get started or to model your work. This can be done with the sample_file_maker() function. Briefly, it will download example raw files for use with this (and other) tutorials. You can specify a directory to write the output into.

# Get and store path to working directory
working_directory <- getwd()
# Generate sample files to this directory
sample_file_maker(directory = working_directory)

This will create several files in your directory:

Sample_BR_Standards_Mapping.csv - A mapping file for the Broad-Range standards, which were included on the same plate as the High-Sensitivity standards, but a different plate than the samples.
Sample_HS_Standards_Mapping.csv - A mapping file for the High-Sensitivity standards, which were included on the same plate as the Broad-Range standards, but a different plate than the samples.
Sample_Mapping.csv - A mapping file that can be used directly in the DataAnalysis function (see below)
Sample_SampleInfo.csv - A file containing information about the samples, that can be used to make a mapping file from scratch
Sample_Tube_Order_Scanned.csv - A file of tube locations from the Cho Lab Matrix 96-barcode scanner
Sample_BR_raw.xls - An excel file from the plate-reader for the sample plate run with the Broad-Range Dye
Sample_HS_raw.xls - An excel file from the plate-reader for the sample plate run with the High-Sensitivity Dye
Sample_Standards_raw.xls - An excel file from the plate-reader for the standards plate (contains both BR and HS standards)
Sample_Empty_Weights.txt - A text file containing the empty weight information for the tubes. Contains the BarcodeIDs (from the LabelMaker Spreadsheet) and the tube barcodes (from the bottom of the matrix tubes), both of which will be helpful in constructing a mapping file.
Sample_Full_Weights.txt - A text file containing the full weight information for the tubes. Contains the tube barcodes (from the bottom of the matrix tubes), which will be helpful in constructing a mapping file by tying this data back to the empty weights.

For this example, 96 samples were processed with the DNase Inactivation Buffer extraction protocol in the Matrix tubes (1 mL). 200 uL of the cell lysate was used for the DNA isolation (200 of 700 uL total, so the scaling factor here is 3.5x). The samples were measured with both the Broad Range and High Sensitivity qubit dyes with 2 uL of sample in 198 uL of buffer, and standards were measured using 10 uL of standard in 190 uL of buffer.
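As a quick check of the numbers in the paragraph above, the scaling factor and the assay dilution can be reproduced with a few lines of arithmetic. The variable names are only illustrative; the result corresponds to the scale = 3.5 value used later in this tutorial.

```r
# Volumes taken from the example above (all in uL)
lysate_total <- 700   # total cell lysate
lysate_used  <- 200   # portion carried into the DNA isolation

# Scaling factor that corrects for subsampling the lysate
scale_factor <- lysate_total / lysate_used
scale_factor   # 3.5

# Dilution of each sample in the qubit assay: 2 uL into 198 uL of buffer
sample_volume <- 2
assay_volume  <- sample_volume + 198
dilution_fold <- assay_volume / sample_volume
dilution_fold  # 100
```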

Prep - Calculating Sample Masses

Now that we have data to work with, we can put this package to work.

The first step in this analysis is to calculate sample masses from the weight data of our empty and full tubes. Additionally, we can optionally use the order of the tubes from the 96-well plate scanner to align the tubes in the order used in the extraction.

The function mass_and_order() takes care of both of these tasks:

sample_masses <- mass_and_order('Sample_Empty_Weights.txt', 'Sample_Full_Weights.txt', order = 'Sample_Tube_Order_Scanned.csv')


head(sample_masses)
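To make the idea behind this step concrete, here is a minimal sketch of the underlying calculation in base R: match empty and full weights by tube barcode, then subtract. The data frames and column names (TubeBarcode, EmptyWeight, FullWeight) are made up for illustration and are not necessarily what mass_and_order() reads or returns.

```r
# Hypothetical stand-ins for the empty and full weight tables;
# the column names are assumptions, not the package's actual file format
empty <- data.frame(TubeBarcode = c("T001", "T002", "T003"),
                    BarcodeID   = c("Sample001", "Sample002", "Sample003"),
                    EmptyWeight = c(1000.0, 1002.5, 998.0))
full  <- data.frame(TubeBarcode = c("T003", "T001", "T002"),
                    FullWeight  = c(1048.0, 1025.0, 1040.5))

# Tie the two tables together by tube barcode, then subtract
weights <- merge(empty, full, by = "TubeBarcode")
weights$SampleMass <- weights$FullWeight - weights$EmptyWeight

weights[, c("BarcodeID", "SampleMass")]
```

Because the tables are matched on the barcode rather than on row position, the tubes do not need to be weighed in the same order both times.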

Prep - Mapping File

We can use this data to create a mapping file manually (by saving this output as a .csv file and then editing it in Excel, making sure to save the result as a .csv file!), or we can do much of that work in R to take advantage of its data processing capabilities.

It is important to remember that a valid mapping file has a few NECESSARY columns:

ReaderWell - This is necessary to be able to connect the data from the plate reader to the data in your mapping file. The ReaderWell column contains which well a sample came from, and it is important that the wells be in 2-digit format (e.g. A01 and not A1).
Type - This column designates the type of sample in the well. This is either going to be "Experiment" or "Standard" to designate an experimental sample or a standard, respectively.
SampleMass - This is necessary for the calculation of microbial density in your experimental samples. If you need to just get a DNA concentration, you can put any placeholder here (suggested placeholder is 1). For Standard samples, the SampleMass column is where you include the standard value. For HS dye, this will be {0, 5, 10, 20, 40, 60, 80, 100}, and for the BR dye, this will be {0, 50, 100, 200, 400, 600, 800, 1000} (this assumes 10 uL of standard was loaded into the plate)
BarcodeID - This is a unique identifier for the sample. This is typically taken from the BarcodeID generated from the LabelMaker spreadsheet, but it is not necessary for it to be. If you don't want to spend much time making unique sample identifiers, you can simply use {Sample001, Sample002, Sample003, ...}
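As an illustration of these requirements, the sketch below zero-pads well names and assembles a small standards mapping table with the four necessary columns. The helper pad_well(), the well positions and the BarcodeID values are hypothetical; only the column names and the HS standard values come from the list above.

```r
# Convert wells such as "A1" to the required two-digit form "A01"
pad_well <- function(well) {
  row <- substr(well, 1, 1)               # row letter
  col <- as.integer(substring(well, 2))   # column number
  sprintf("%s%02d", row, col)
}

padded <- pad_well(c("A1", "B12", "H07"))
padded

# Hypothetical HS standards mapping; the well positions are made up
hs_standards <- data.frame(
  ReaderWell = pad_well(paste0("A", 1:8)),
  Type       = "Standard",
  SampleMass = c(0, 5, 10, 20, 40, 60, 80, 100),
  BarcodeID  = sprintf("HS_Standard%03d", 1:8)
)
hs_standards
```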

The mass_and_order() function will already give us a table with the BarcodeID and the SampleMass, so all we need to do is add a couple of columns, and join this data with some information about the specific samples, and we should have a complete mapping file.

Let's take a look at what's in the example file called Sample_SampleInfo.csv:

sample_info <- readr::read_csv('Sample_SampleInfo.csv')

head(sample_info)

That looks like it'll help put together a good mapping file, so let's start building it:

mapping_file <- dplyr::left_join(sample_masses, sample_info) %>% # Merge sample info and masses (joins on the shared columns)
  dplyr::mutate(Type = "Experiment", ReaderWell = SampleWell) %>% # Add our necessary missing components
  dplyr::select(ReaderWell, Type, BarcodeID, SampleMass, dplyr::everything()) %>%  # This last bit re-orders the mapping file
  dplyr::filter(!is.na(BarcodeID)) # Remove wells that didn't have samples (two in this case)

head(mapping_file)

We can save this as our mapping file and make additional changes manually (this is often easier).

readr::write_csv(mapping_file, 'Sample_Mapping_made_in_R.csv')

In any case, we are now ready to proceed to reading in the data from the plate reader and computing microbial density.

Generating Microbial Density Data

The main function for performing analyses is DataAnalysis(). It reads in your raw data and mapping file and produces a table containing a host of useful data for downstream applications. These include:

DNA concentration - we construct a standard curve based on the given standard samples and values, and use this to compute DNA concentrations for each sample
Microbial Density - using the DNA concentration, our scaling factors and sample mass, we compute the sample biomass (ug DNA per mg sample).
16S PCR prep - Identifies if there is enough DNA for performing 16S sequencing, and computes the DNA and water volumes necessary to achieve a dilution to 2 ng/uL (the working concentration of template for the PCR reaction).
Metagenomics prep - Identifies if there is enough DNA for performing metagenomic shotgun sequencing, and computes the DNA and water volumes necessary to achieve a dilution to 20 ng/uL (into 25 uL -- 20 uL is needed for the metagenomics library prep protocol).
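The dilution volumes described above follow from the usual C1 x V1 = C2 x V2 relationship. The function below is a minimal sketch of that calculation, not the code DataAnalysis() actually runs; dilution_volumes() and its argument names are hypothetical. The example uses the metagenomics numbers from the list (20 ng/uL into 25 uL).

```r
# Sketch of a dilution calculation based on C1 * V1 = C2 * V2
# conc:         measured DNA concentration (ng/uL)
# target_conc:  desired concentration after dilution (ng/uL)
# final_volume: total volume after dilution (uL)
dilution_volumes <- function(conc, target_conc, final_volume) {
  enough     <- conc >= target_conc                  # too dilute otherwise
  dna_volume <- target_conc * final_volume / conc
  data.frame(
    conc         = conc,
    enough_dna   = enough,
    dna_volume   = ifelse(enough, dna_volume, NA),
    water_volume = ifelse(enough, final_volume - dna_volume, NA)
  )
}

# Metagenomics prep: dilute to 20 ng/uL in a final volume of 25 uL
meta_prep <- dilution_volumes(conc = c(10, 50, 100),
                              target_conc = 20, final_volume = 25)
meta_prep
```

Samples below the target concentration cannot be diluted to it, so their volumes are left as NA here.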

The function has several arguments that will be used often:

plate_reader_file - this is a string of the file path of the output file from the plate reader for the plate containing your samples. This function accepts Excel files or .csv files, but if using an Excel file, it will only read the first sheet of the file, so it is best to separate out measurements of several plates across multiple files.
mapping_csv_file - this is a string of the file path of the mapping file for your samples (that conforms with the requirements outlined above).
standards_plate_reader_file - a string of the file path of the output file from the plate reader that contains the standard curve. This defaults to the same file as your samples, but standards are often on a different plate, so here is where you can include that information.
standards_mapping_csv_file - as with the plate reader file, this is the path to a mapping file for the wells/plate containing the standard curve. It also defaults to the mapping file for your samples, but you can indicate a separate plate here.
volume - volume used in measuring DNA concentration by qubit (default = 2)
scale - scaling factor used to account for subsampling in DNA extraction protocol (default = 3.5)

There are some other arguments, but they are less important; see the function's help page (?DataAnalysis) for more details.

In our case, we have our samples all on one plate, and our standards on another plate. We have also measured our samples with both the Broad-Range and High-Sensitivity dyes (and measured both sets of standards on the same plate). Let's make some data:

BR_data <- DataAnalysis(plate_reader_file = 'Sample_BR_raw.xlsx',
                        mapping_csv_file = 'Sample_Mapping.csv',
                        standards_plate_reader_file = 'Sample_Standards_raw.xlsx',
                        standards_mapping_csv_file = 'Sample_BR_Standards_Mapping.csv',
                        volume = 2, scale = 3.5,
                        exp_id = "Sample Experiment - BR")

HS_data <- DataAnalysis(plate_reader_file = 'Sample_HS_raw.xlsx',
                        mapping_csv_file = 'Sample_Mapping.csv',
                        standards_plate_reader_file = 'Sample_Standards_raw.xlsx',
                        standards_mapping_csv_file = 'Sample_HS_Standards_Mapping.csv',
                        volume = 2, scale = 3.5,
                        exp_id = "Sample Experiment - HS")

The output of this function is a list object that contains two things:

The resulting data in a data frame (accessed using $data)
A plot of the standard curve used to make DNA concentration measurements (accessed using $standards_plot)

First, we can take a peek at both of our standard curves:

cowplot::plot_grid(HS_data$standards_plot, BR_data$standards_plot, nrow = 1)

Not bad!

Since we used both qubit dyes to measure the DNA concentration, we can get a 'consensus' measurement by combining the measurements (using the HS for low concentrations, BR for high concentrations, and the average of the two for values that lie in the range of both).
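The rule described above can be sketched as a small vectorized function. This is only an illustration of the idea: consensus_sketch() is a hypothetical helper, the cutoff values are arbitrary placeholders, and qubit_merger() may well decide which regime a sample falls into differently.

```r
# Illustrative consensus rule; the cutoffs are assumptions, not the
# values used by qubit_merger()
consensus_sketch <- function(hs, br, low_cutoff, high_cutoff) {
  use_hs <- hs < low_cutoff  & br < low_cutoff    # both readings low: trust HS
  use_br <- hs > high_cutoff & br > high_cutoff   # both readings high: trust BR
  ifelse(use_hs, hs, ifelse(use_br, br, (hs + br) / 2))
}

consensus <- consensus_sketch(hs = c(1, 30, 120), br = c(3, 34, 150),
                              low_cutoff = 5, high_cutoff = 100)
consensus
```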

consensus_data <- qubit_merger(HS_data$data, BR_data$data)

Now we can be pretty confident that we've measured our DNA as accurately as possible, so let's take a look at our data and play around with it visually. The plotter() function launches a Shiny app that allows you to toggle the x and y variables of a plot interactively, and is a good way of giving your data a first pass.

View(consensus_data)

plotter(consensus_data)

That's it! Now you can go from raw data to microbial density data in no time. The next steps will be getting samples ready for sequencing (16S or metagenomics), which will be handled below.

Preparing Samples for 16S Sequencing

Preparing for Sequencing

This package also contains a wrapper function that is meant to take data from one or more experiments and generate the files necessary to plan and create a 16S sequencing run (or runs).

The function used to prep samples for 16S sequencing is the sequencing_prep_16S() function, which takes a list of data and generates files in the working directory that correspond to entire sequencing runs. You need to provide the number of barcode plates available (e.g. if you have 192 sequencing barcodes, this would be 2 plates; if you have 480 barcodes, this would be 5, etc.).
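The plate counts in the examples above are simply the number of barcodes divided by 96 wells per plate. A minimal sketch, assuming full 96-well barcode plates (the variable names are illustrative):

```r
# Number of 96-well barcode plates for a given number of barcodes
barcodes_available <- c(192, 480)
n_plates <- ceiling(barcodes_available / 96)
n_plates   # 2 5
```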

Let's say that we have some other data that we want to include in our sequencing run. We'll assume it was generated in the same way as above. Let's read it into our session. [Side note here: I have dropped one column from this data because it will go on to give us trouble later on. This is because it has a lot of NA values which get read as a character, and when we try to join it with the other data, which has NAs treated as logical values, we get an error. So to get rid of that problem, I've just dropped the column here]

other_data <- readr::read_csv('Sample_Other_Data.csv') %>% dplyr::select(-Other)
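If you would rather keep such a column than drop it, one alternative is to coerce it to the same type in both tables before combining them. The base R sketch below uses two made-up one-row tables to show the mismatch described in the side note and the coercion that resolves it; it illustrates the idea and is not part of the package.

```r
# Two hypothetical tables whose 'Other' column was read with different types
a <- data.frame(BarcodeID = "Sample001", Other = NA)             # logical NA
b <- data.frame(BarcodeID = "Sample002", Other = NA_character_)  # character NA

class(a$Other)   # "logical"
class(b$Other)   # "character"

# Coerce to a shared type so the columns are compatible when combined
a$Other <- as.character(a$Other)
combined <- rbind(a, b)
class(combined$Other)   # "character"
```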

If your data has been generated using the package, then preparing your samples for sequencing should be as easy as creating a list of relevant samples and then running the sequencing prep command.

data_list <- list(consensus_data, other_data)

sequencing_runs_16S <- sequencing_prep_16S(experiment_data_list = data_list, n_barcode_plates = 5)

This function call may throw a warning if you don't pre-filter your data to exclude samples that will have less than the necessary DNA in them to move forward with sequencing. You can change this default action by passing along `filter = FALSE` in the function call.

- This function will indicate which plates are needed (it is important to label your DNA plates with PlateIDs in your mapping file), how many samples will be included in the sequencing run(s), how many sequencing runs will be needed to sequence all samples, and the number of unused barcodes that will remain.

- The output will be organized into folders based on sequencing runs, each of which will contain a file for using the Beckmann robot to perform a dilution of each plate in the sequencing run, and a QIIME-ready mapping file. By default, the sequencing mapping file is created using the Faith Lab 16S sequencing barcodes. If you have a different set of barcodes, you will need to pass in a .csv file that contains the correct barcodes.

- You can indicate the directory in which to place these files by passing a directory as output_directory = '/path/to/directory/'.

- Within R, the output of these functions will be a list of lists, containing all of the data that is written to the output directory.
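The run-planning numbers that the function reports can be approximated by hand. The sketch below assumes that each barcode is used at most once per run and that every run is filled before the next one starts; the sample count is made up, and sequencing_prep_16S() may group plates differently.

```r
# Hypothetical planning numbers
n_samples        <- 700
n_barcode_plates <- 5
barcodes_per_run <- n_barcode_plates * 96   # 480 barcodes available per run

# Runs needed to sequence every sample, and barcodes left unused overall
n_runs          <- ceiling(n_samples / barcodes_per_run)
unused_barcodes <- n_runs * barcodes_per_run - n_samples

n_runs            # 2
unused_barcodes   # 260
```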

econtijoch/Biomass-Workflow documentation built on May 15, 2019, 7:59 p.m.
