labmath (jdidion/labmath): Helpers for creating laboratory protocols.

Overview

This vignette shows how to use the labmath package to create reagents and then use them in a protocol.

A protocol has three parts:

Materials table: All of the raw materials you need to execute the protocol. These may include ingredients for reagents you will make, pre-made reagents, or supplies (e.g. glassware or lab equipment). At minimum, every material needs a name. It is also good practice to provide the molecular weight of solid ingredients and the molarity of liquid ingredients and solutions. Beyond that, you can include whatever other information you want in the materials table, e.g. vendor, cost, or location in the lab.
Solutions: Recipes for the solutions you need to make for the protocol. Note that if you specify an ingredient in a solution that is not already in the materials list, it will be automatically added. If a solution requires involved preparation (i.e. beyond just "mix all these things together"), you should consider developing a separate protocol for creating that solution.
Protocol steps: Step-by-step instructions for executing the protocol. These steps can include special instructions (macros) which are described in detail later in this vignette.

Note that we use the terms solution and mixture to mean two slightly different things. A solution is a recipe in which units may be specified in relative terms (e.g. percentages or molarity). A mixture is a specific instance of a solution for a specific total volume. The solvent in a solution is always assumed to be water unless otherwise specified.

The Materials List

Ultimately, the materials list needs to be provided as an R data.frame or data.table. However, you have some flexibility in how you create it. You could make an Excel spreadsheet and then load it into R using the readxl package:

materials <- readxl::read_excel("my-materials.xlsx")

You could create your own data table in R (although this is less intuitive since you have to give the information by column):

library(data.table)

materials <- data.table(
    name=c("NaCl", "HCl"),
    vendor=c("ABC Chemicals", "ChemX"),
    mw=c(58.44, 36.46)  # molecular weight in g/mol
)

Once you have the table loaded in R, you can save it to a file and then load it any time you need to use it:

save(materials, file="materials.RData")
load("materials.RData")

One of the easiest ways to create your materials list is with the materials function, which lets you write it out in plain text:

materials("
    name ; vendor        ; mw
    NaCl ; ABC Chemicals ; 58.44
    HCl  ; ChemX         ; 36.46
")

The first row gives the column names, separated by semicolons. You are free to include any columns you want, but the material name always has to be in a name column, and the molecular weight has to be in a mw column. You can use whitespace however you want; all leading and trailing whitespace in each field is trimmed off.
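Here is a minimal base-R sketch of how a semicolon-delimited block like this can be turned into a data.frame. The function parse_materials_text is hypothetical and is not how labmath implements materials(). It only assumes the layout described above: a header row first, fields separated by semicolons, and whitespace trimmed from each field.

```r
# Hypothetical sketch, NOT labmath's implementation of materials():
# parse a semicolon-delimited text block into a data.frame.
parse_materials_text <- function(text) {
  lines <- trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]                         # drop blank lines
  fields <- lapply(strsplit(lines, ";", fixed = TRUE), trimws)
  header <- fields[[1]]                                 # first row = column names
  rows <- do.call(rbind, fields[-1])                    # character matrix of values
  df <- as.data.frame(rows, stringsAsFactors = FALSE)
  names(df) <- header
  if ("mw" %in% header) df$mw <- as.numeric(df$mw)      # molecular weight is numeric
  df
}

m <- parse_materials_text("
    name ; vendor        ; mw
    NaCl ; ABC Chemicals ; 58.44
    HCl  ; ChemX         ; 36.46
")
```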

The final way to specify materials is within your solutions and/or protocol steps, using the material macro. This is covered in more depth later, but to preview how macros work, let's say you have a step that requires stirring a mixture with a stir bar. You could include the following in your protocol:

In a ${50 ml beaker ; GlassCo}, combine ingredients. Place a ${stir bar ; SitrMaster} in the beaker. Place beaker on a ${magnetic stirrer} and stir for 10 min.

The bits that start with "${" and end with "}" are macros, in this case material macros. There are different types of macros, each of which begins with a different character (i.e. in place of the "$"). The material macro lets you provide the same information as the materials function, but using it builds your materials list automatically while you're writing your protocol. Note that with unkeyed fields like these, you still need to call the materials function to provide the column names for the table. Alternatively, you can use keys in all of your material macros, like so:

In a ${50 ml beaker ; vendor=GlassCo}, add 5 g ${HCl ; vendor=ChemX ; mw=36.46}.

The first bit of information is always assumed to be the name, so you don't need to explicitly write "name=" (although you could if you wanted to). If you always use keys, then the columns in your materials table will be figured out automatically and you don't have to call materials at all.
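The keyed form can be illustrated with a small base-R parser. parse_material_macro is a hypothetical helper and not labmath's macro parser. It assumes the first field is the name (with an optional "name=" prefix) and that every other field is a key=value pair. Values are left as strings.

```r
# Hypothetical helper, NOT labmath's macro parser: split a keyed material
# macro such as "${HCl ; vendor=ChemX ; mw=36.46}" into a named list.
parse_material_macro <- function(macro) {
  stopifnot(startsWith(macro, "${"), endsWith(macro, "}"))
  body <- substr(macro, 3, nchar(macro) - 1)            # drop "${" and "}"
  fields <- trimws(strsplit(body, ";", fixed = TRUE)[[1]])
  out <- list(name = sub("^name=", "", fields[1]))      # "name=" is optional
  for (f in fields[-1]) {
    if (!grepl("=", f, fixed = TRUE)) stop("expected key=value, got: ", f)
    key <- trimws(sub("=.*$", "", f))                   # text before the first "="
    out[[key]] <- trimws(sub("^[^=]*=", "", f))         # text after the first "="
  }
  out
}

str(parse_material_macro("${HCl ; vendor=ChemX ; mw=36.46}"))
```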

Solutions

As mentioned above, a solution is a recipe for creating a reagent. The ingredients for a solution are generally specified in relative units, such as "%vv" (percentage volume/volume) or "mol" (molarity). Ingredient amounts can also be given in the more general form "amount/volume", e.g. "1 g/50 ml". Finally, some reagents come in "indivisible" units (e.g. tablets) and are specified as unit/volume, for example "1 tablet/L". Here "tablet" is not a recognized unit, so a mixture made from this recipe will simply include one tablet for each liter (full or partial). Sometimes no volume is needed at all: the intent is to use one unit regardless of the volume of your reagent.
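The "full or partial liter" rule for indivisible units amounts to rounding the volume up. The sketch below is hypothetical: indivisible_units is not a labmath function, and it assumes the recipe is expressed per liter.

```r
# Hypothetical helper, not part of labmath: number of indivisible units
# (e.g. tablets) for a per-liter recipe such as "1 tablet/L". Each full or
# partial liter of final volume gets the per-liter count, so round up.
indivisible_units <- function(units_per_l, volume_l) {
  units_per_l * ceiling(volume_l)
}

indivisible_units(1, 0.25)  # 250 ml still needs 1 tablet
indivisible_units(1, 2.5)   # 2.5 L needs 3 tablets
```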

A solution is specified as a list of ingredients. Each ingredient has the form "amount unit name".

Amount: Must be one of the following:
- a real number, using a decimal point to separate the integral and fractional parts (if any) and without spaces, commas, or any other delimiters
- a fraction, e.g. 1/4
- a ratio, e.g. 1:4, taken as amount of solute per amount of solvent (so 1:4 = 1/(1+4) = 0.2).
Unit: Any of the following:
- %ww : percentage of solute as a fraction of total solution weight
- %wv (or just %) : percentage of solute as a fraction of total volume
- %vv : percentage of a liquid solute as a fraction of final solution volume
- mol : molarity (moles/L). kmol, mmol, and umol are also recognized
- unit[/vol unit] : absolute quantity, or quantity per volume. The unit can be anything. If it is a recognized SI unit, then an exact amount will be calculated when making a mixture, otherwise it will be assumed to be a non-divisible entity.
Name: The name of the ingredient. This can also be a material macro, in which case the molecular weight can be specified, e.g. "5 mmol ${NaCl ; 58.44}" means "5 millimolar NaCl, which has a molecular weight of 58.44 g/mol." If a molecular weight is not provided, then final amounts will be given in moles when creating a mixture.
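The arithmetic behind these amount and unit forms is short enough to write out. These helpers are hypothetical, not labmath functions, and they follow the definitions above. Molarity is in mol/L, so "10 mmol" in this vignette's notation is 0.010 mol/L. Both % w/v and % v/v are taken per 100 ml of final solution.

```r
# Hypothetical helpers (not labmath functions) for the amount/unit forms above.

# Ratio "a:b" (solute:solvent) -> fraction of the whole, e.g. "1:4" -> 1/(1+4)
ratio_to_fraction <- function(ratio) {
  parts <- as.numeric(strsplit(ratio, ":", fixed = TRUE)[[1]])
  parts[1] / sum(parts)
}

# mol (molarity): grams = (mol/L) * L * (g/mol)
grams_for_molarity <- function(molar, volume_l, mw) molar * volume_l * mw

# %wv: grams of solute per 100 ml of final solution
grams_for_percent_wv <- function(percent, volume_ml) percent / 100 * volume_ml

# %vv: ml of liquid solute per 100 ml of final solution
ml_for_percent_vv <- function(percent, volume_ml) percent / 100 * volume_ml

ratio_to_fraction("1:4")                 # 0.2
grams_for_molarity(0.010, 0.5, 121.14)   # 10 mmol Tris in 500 ml: 0.6057 g
grams_for_percent_wv(0.2, 500)           # 0.2 % w/v in 500 ml: 1 g
ml_for_percent_vv(5, 200)                # 5 % v/v in 200 ml: 10 ml
```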

Each solution also has a name and, optionally, a description. You create a solution with the solution function:

lysis.buffer <- solution("Lysis buffer", "
    10 mmol ${Tris ; ABC Chemical ; 121.14}
    10 mmol ${NaCl ; ChemX        ; 58.44}
    0.2 % ${Igepal ; Bob's Chemicals}
    1 unit protease inhibitor
", 
"Buffer used during cell lysis")

In this solution, we specified two ingredients (Tris and NaCl) in terms of molarity and provided molecular weights for each. We also specified one reagent (Igepal) in terms of % weight/volume without a molecular weight. Finally, we specified a reagent (protease inhibitor) with an absolute amount (1 unit): no matter how much lysis buffer we make, we'll always use just one unit of protease inhibitor. Since we didn't use a material macro for the protease inhibitor, "protease inhibitor" is taken to be the name, with no vendor or molecular weight information.
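To see what turning this solution into a mixture involves, here is a hypothetical base-R calculation of absolute amounts for 10 ml of the lysis buffer. It is not labmath's mixture code, and it treats "%" as % w/v as described above. Tris, NaCl and Igepal come out in grams. The protease inhibitor stays at one unit regardless of volume.

```r
# Hypothetical sketch (not labmath's mixture code): absolute amounts for
# 10 ml of the lysis buffer recipe, using the molecular weights given above.
volume_ml <- 10
recipe <- data.frame(
  ingredient = c("Tris", "NaCl", "Igepal", "protease inhibitor"),
  amount     = c(10, 10, 0.2, 1),
  unit       = c("mmol", "mmol", "%", "unit"),
  mw         = c(121.14, 58.44, NA, NA),
  stringsAsFactors = FALSE
)

mixture_amount <- function(amount, unit, mw, volume_ml) {
  switch(unit,
    mmol = amount / 1000 * (volume_ml / 1000) * mw,  # millimolar -> grams
    "%"  = amount / 100 * volume_ml,                 # % w/v -> grams
    unit = amount,                                   # absolute count, not scaled
    stop("unsupported unit: ", unit))
}

recipe$final <- mapply(mixture_amount, recipe$amount, recipe$unit, recipe$mw,
                       MoreArgs = list(volume_ml = volume_ml))
recipe
```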

Steps

You define the steps of your protocol using the steps function. These are written in (mostly) plain English, although, as hinted at earlier, there are some macros that make life easier. You are free to break your protocol into multiple sets of steps (each set defined by a call to steps) or to write the whole protocol as one set of steps. Two ways we like to organize protocols are: 1) for multi-day protocols, make each day a set of steps; or 2) create a separate set of steps for each discrete part of the protocol, i.e. each set of steps that ends at a potential stopping point.

Here's an example set of steps for lysing cells using the lysis buffer we made above:

steps("Cell lysis", "
    1.  Cool #{10 ml OF Lysis buffer IN Tube 1} to 4%{C}.
    2.  Prepare ${liquid nitrogen or dry ice/ethanol bath}.
        !{WARN: Liquid nitrogen is dangerous. Use extreme caution when handling.}
    3.  Wash cells with ${PBS}.
    4.  Resuspend cells in #{Lysis buffer FROM Tube 1} and incubate 
        at 4%{C} for @{10 min}.
    5.  Snap freeze cells and place immediately in -80%{C} freezer.
",
"Use lysis buffer to lyse cells and then freeze them indefinitely.")

The first argument to steps is a unique name for the set of steps. This can be a descriptive name, something that indicates timing (e.g. "Day 1"), or whatever you want. The next argument is the set of steps. Again, you can use whitespace however you want, and you can even break a step across multiple lines, as we did in step 4. The only requirement is that each step begins with "N." followed by at least two spaces, where N is the step number. Finally, you can provide a more detailed description.
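The step-numbering rule can be expressed as a regular expression. is_step_start is a hypothetical check, not the labmath parser. It assumes that any line not matching the rule continues the previous step. Under that assumption, a step written with only one space after the period gets folded into the step before it.

```r
# Hypothetical check, NOT labmath's parser: a step starts with a number,
# a period, and at least two spaces.
is_step_start <- function(line) grepl("^\\s*[0-9]+\\.\\s{2,}", line, perl = TRUE)

step_lines <- c(
  "1.  Cool #{10 ml OF Lysis buffer IN Tube 1} to 4%{C}.",
  "4.  Resuspend cells in #{Lysis buffer FROM Tube 1} and incubate",
  "    at 4%{C} for @{10 min}.",
  "5. Snap freeze cells."          # only one space after the period
)
starts <- is_step_start(step_lines)  # TRUE TRUE FALSE FALSE
step_id <- cumsum(starts)            # 1 2 2 2: the last line joins the second step
```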

You'll notice a couple of new macros here. Most importantly, there is the reagent macro, which specifies that you're using a specific amount of a reagent ("10 ml OF Lysis buffer") and preparing it in a specific vessel ("Tube 1"). The reason for this formal way of describing reagent usage is that the program automatically keeps track of how much of each thing you are using, which vessels contain which reagents at each step, and what you'll end up with at the end. Note that we used a material macro rather than a reagent macro for PBS ("${PBS}") because we assume we'll have plenty of PBS available and don't need to measure it out into a separate container. You are free to use a reagent macro instead (e.g. "#{20 ml OF PBS}"), in which case the program will keep track of how much PBS you need. Also notice that in step 4 we specify that we're transferring the lysis buffer from Tube 1. If we wanted to, we could also say where we're transferring it to, e.g. "#{Lysis buffer FROM Tube 1 TO Tube 2}". Because we didn't specify an amount, the full amount is assumed to be transferred.
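The OF/IN/FROM/TO grammar of the reagent macro lends itself to simple keyword splitting. The following is a hypothetical sketch, not labmath's implementation. parse_reagent and the vessels vector are illustrative names, amounts are assumed to be in ml, and a transfer with no amount is assumed to move the full contents, as described above.

```r
# Hypothetical sketch, NOT labmath's implementation: parse reagent macro
# bodies of the form "[AMOUNT OF] NAME [IN|FROM|TO VESSEL ...]".
parse_reagent <- function(body) {
  # split before each keyword: "10 ml" | "OF Lysis buffer" | "IN Tube 1"
  pieces <- strsplit(body, " (?=(?:OF|IN|FROM|TO) )", perl = TRUE)[[1]]
  out <- list(amount = NA_character_, name = NA_character_,
              IN = NA_character_, FROM = NA_character_, TO = NA_character_)
  has_of <- grepl(" OF ", body, fixed = TRUE)
  for (p in pieces) {
    key <- sub(" .*$", "", p)
    if (key %in% c("OF", "IN", "FROM", "TO")) {
      value <- sub(paste0("^", key, " "), "", p)
      if (key == "OF") out$name <- value else out[[key]] <- value
    } else if (has_of) {
      out$amount <- p   # text before OF is the amount, e.g. "10 ml"
    } else {
      out$name <- p     # no OF: the leading text is the reagent name
    }
  }
  out
}

# Track the volume (ml) held in each vessel as steps are read in order.
vessels <- numeric(0)
r1 <- parse_reagent("10 ml OF Lysis buffer IN Tube 1")
vessels[r1$IN] <- as.numeric(sub(" ml$", "", r1$amount))  # Tube 1 holds 10 ml
r2 <- parse_reagent("Lysis buffer FROM Tube 1")
# no amount given, so the full contents are assumed to be transferred
if (is.na(r2$amount)) vessels[r2$FROM] <- 0
```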

Another new macro is the timer macro, which starts with "@". A timer specifies that a step needs to be carried out over a certain time period. The program will keep track of the total time for each set of steps, and that information can be used for other purposes down the road, such as schedule planning.
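As a rough illustration of how timer totals can be computed, this hypothetical sketch (not labmath code) extracts every "@{...}" macro from a block of step text and sums the durations in minutes. The small unit table (s, min, h) is an assumption, and unrecognized units produce NA.

```r
# Hypothetical sketch (not labmath code): total time, in minutes, of all
# timer macros such as "@{10 min}" in a block of step text.
timer_minutes <- function(text) {
  bodies <- regmatches(text, gregexpr("@\\{[^}]*\\}", text, perl = TRUE))[[1]]
  bodies <- gsub("^@\\{|\\}$", "", bodies, perl = TRUE)  # "@{10 min}" -> "10 min"
  value <- as.numeric(sub(" .*$", "", bodies))
  unit <- sub("^[0-9.]+ *", "", bodies)
  per_minute <- c(s = 1 / 60, min = 1, h = 60)           # assumed unit names
  sum(value * per_minute[unit])
}

steps_text <- "Incubate for @{10 min}, spin for @{30 s}, then rest @{1 h}."
timer_minutes(steps_text)   # 10 + 0.5 + 60 = 70.5
```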

Next, you'll notice in step 2 that we included a warning about liquid nitrogen within a message macro (starting with "!"). There are three types of messages: NOTE, WARN, and ERROR. When your protocol is formatted, these messages are printed in call-out boxes to draw attention to them.
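Extracting message macros for call-out formatting can be sketched the same way. extract_messages is hypothetical and not labmath's formatter. It assumes the "TYPE: text" layout used in step 2 and that the message contains no closing brace.

```r
# Hypothetical sketch (not labmath's formatter): find message macros such as
# "!{WARN: ...}" and split each into its type and its text.
extract_messages <- function(text) {
  found <- regmatches(text, gregexpr("!\\{(?:NOTE|WARN|ERROR): [^}]*\\}",
                                     text, perl = TRUE))[[1]]
  data.frame(
    type    = sub("^!\\{([A-Z]+): .*$", "\\1", found, perl = TRUE),
    message = sub("^!\\{[A-Z]+: (.*)\\}$", "\\1", found, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

step2 <- "Prepare ${liquid nitrogen or dry ice/ethanol bath}.
    !{WARN: Liquid nitrogen is dangerous. Use extreme caution when handling.}"
msgs <- extract_messages(step2)
```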

Finally, we use an escape macro (starting with "%") to insert a commonly used expression (°C) that would otherwise require some tedious steps to type the degree symbol. You'll see later that there are other predefined escape macros, and that you can define your own escape macros for expressions you use often.
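A lookup table is enough to sketch how escape macros expand. The table below is a hypothetical example and not labmath's built-in list. Its "F" entry stands in for a user-defined escape.

```r
# Hypothetical escape table (not labmath's built-ins); "F" stands in for a
# user-defined entry.
escapes <- c(C = "\u00b0C", F = "\u00b0F")

# Replace every "%{KEY}" with its expansion from the table.
expand_escapes <- function(text, table = escapes) {
  for (key in names(table)) {
    text <- gsub(paste0("%{", key, "}"), table[[key]], text, fixed = TRUE)
  }
  text
}

out <- expand_escapes("Cool to 4%{C}, then store at -80%{C}.")
```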

For greater detail on using all the different macros, see the documentation for parse.macro.

Protocol

Now that we have all the pieces, we can put them together into a protocol. There are two ways to create a protocol: implicitly and explicitly. You can create a protocol implicitly just by calling the functions described above. There is a single Protocol object defined in the global environment, and every time you call materials, solution, or steps, the corresponding object is added to this global protocol. The global protocol is stored in the .protocol variable, and you can also access it by calling the protocol function without any arguments:

chip.seq <- protocol()
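The implicit mode resembles a common R pattern: a package-level environment holds one mutable object, and constructors update it as a side effect. This is a hypothetical sketch of that pattern. .registry, register, make_solution and current_protocol are illustrative names, not labmath's internals.

```r
# Hypothetical sketch of an implicit global protocol (not labmath's internals).
.registry <- new.env()
.registry$protocol <- list(materials = list(), solutions = list(), steps = list())

register <- function(kind, name, value) {
  # environments are references, so this updates the shared protocol
  .registry$protocol[[kind]][[name]] <- value
  invisible(value)
}

make_solution <- function(name, recipe) {
  register("solutions", name, list(name = name, recipe = recipe))
}

current_protocol <- function() .registry$protocol

buffer <- make_solution("Lysis buffer", "10 mmol Tris")
names(current_protocol()$solutions)
```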

You can also create a protocol explicitly by calling the protocol function:

chip.seq <- protocol("ChIP-Seq",
    "The Smith Lab's standard ChIP-Seq protocol",
    materials=materials("
        name ; vendor        ; mw
        NaCl ; ABC Chemicals ; 58.44
        HCl  ; ChemX         ; 36.46
    "),
    solutions=list(
        solution("Lysis buffer", "
            10 mmol ${Tris ; ABC Chemical ; 121.14}
            10 mmol ${NaCl ; ChemX        ; 58.44}
            0.2 % ${Igepal ; Bob's Chemicals}
            1 unit protease inhibitor
        ", 
        "Buffer used during cell lysis")
    ),
    steps=list(
        steps("Cell lysis", "
            1.  Cool #{10 ml OF Lysis buffer IN Tube 1} to 4%{C}.
            2.  Prepare ${liquid nitrogen or dry ice/ethanol bath}.
                !{WARN: Liquid nitrogen is dangerous. Use extreme caution when handling.}
            3.  Wash cells with ${PBS}.
            4.  Resuspend cells in #{Lysis buffer FROM Tube 1} and incubate 
                at 4%{C} for @{10 min}.
            5.  Snap freeze cells and place immediately in -80%{C} freezer.
        ",
        "Use lysis buffer to lyse cells and then freeze them indefinitely.")
    )
)

The chip.seq variable now contains a Protocol object, which contains all the details of our protocol. If you want, you can save this to an R data file and load it later.

save(chip.seq, file="Chip-Seq-protocol.RData")

Working with Protocols

Now that we have a protocol, what can we do with it? Well, the obvious first thing is to print it out in a readable format. The labmath package comes with a default template for formatting a protocol. To use this template:

1. In RStudio, go to File > New File > R Markdown.
2. Select "From Template".
3. Choose either "Protocol (PDF)" or "Protocol (HTML)", depending on whether you want a PDF or HTML document.
4. Define your protocol in a code block (see the previous section), or load it from an R data file.
5. Print the protocol using the print function:

print(chip.seq)

If you don't care about saving your protocol or referring to it later, you could combine steps 4 and 5 just by not assigning the protocol to a variable (i.e. omit "chip.seq <- ").

In the future, there will be several other cool things you can do with protocols, such as printing them in different layouts, exporting them to open formats (such as XPDL), and making weekly schedules for your time at the bench. Stay tuned!

Variables

There is one more feature of protocols, not yet covered, that is important for writing general-purpose protocols. For example, let's say you want to develop a general ChIP-Seq protocol that can be used with any transcription factor. Rather than explicitly writing the name of the antibody in your protocol, you can use a variable instead:

10 mol ?antibody

When you print the protocol, you can then supply the actual value of this variable:

print(chip.seq, variables=list(antibody="CTCF"))

You'll notice that we've used a variant of the print function that takes a named list of values to fill in the variables in the protocol.

If you anticipate that the molarity requirement of each antibody will be different, you can use two variables:

?antibody.concentration ?antibody.name

print(chip.seq, variables=list(antibody.concentration="15 mol", antibody.name="CTCF"))

A third option is to just specify the entire macro as a variable. In this case, you use a variable macro:

?#{antibody}

print(chip.seq, variables=list(antibody="15 mol CTCF"))

You can use variable macros for any of the other macro types in the same way; e.g. "?@{time}" defines a timer variable macro.
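Variable substitution can be sketched as plain text replacement. fill_variables is hypothetical and not labmath's implementation. It replaces variable macros such as "?#{antibody}" with a "#{...}" reagent macro, and plain variables such as "?antibody" with their value. Longer names are substituted first so that "?antibody.name" is not clobbered by a variable called "antibody".

```r
# Hypothetical sketch (not labmath's implementation) of filling in variables.
fill_variables <- function(text, variables) {
  # substitute longer names first so "?antibody" cannot clobber "?antibody.name"
  for (name in names(variables)[order(-nchar(names(variables)))]) {
    value <- variables[[name]]
    # variable macro: "?#{antibody}" -> "#{15 mol CTCF}"
    text <- gsub(paste0("?#{", name, "}"), paste0("#{", value, "}"), text, fixed = TRUE)
    # plain variable: "?antibody" -> "CTCF"
    text <- gsub(paste0("?", name), value, text, fixed = TRUE)
  }
  text
}

fill_variables("10 mol ?antibody", list(antibody = "CTCF"))
fill_variables("?antibody.concentration ?antibody.name",
               list(antibody.concentration = "15 mol", antibody.name = "CTCF"))
fill_variables("Add ?#{antibody}", list(antibody = "15 mol CTCF"))
```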

Scaling

Let's say that you want to scale up your protocol for multiple samples. There are two ways to do this. First, you could use the scale parameter of the print function:

print(chip.seq, scale=5)

This would multiply all the reagent amounts by 5. The second way is to use variables with multiple values. For example, let's say we want to do ChIP-Seq on three samples using two different antibodies:

print(chip.seq, variables=list(sample=c("A","B","C"), antibody=c("CTCF","PDX1")))

Since you will be doing 6 total experiments (3 samples x 2 antibodies), the materials list and reagent amounts will be multiplied by 6.
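The count of 6 is the size of the cross product of the variable values. In this hypothetical sketch (not labmath code), expand.grid enumerates the runs. The 10 ml of lysis buffer from step 1 is then scaled by the number of runs.

```r
# Hypothetical sketch (not labmath code): one run per combination of values.
variables <- list(sample = c("A", "B", "C"), antibody = c("CTCF", "PDX1"))
runs <- expand.grid(variables, stringsAsFactors = FALSE)  # one row per run
n_runs <- nrow(runs)                                     # 3 samples x 2 antibodies = 6
lysis_buffer_ml <- 10 * n_runs                           # 10 ml per run -> 60 ml
```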

Experiments

An experiment is simply a collection of protocols that you want to perform. For example, we could break our ChIP-Seq experiment into crosslinking, lysis, and immunoprecipitation protocols, and then group them together in an experiment. There will be one materials list for the entire experiment that combines the materials required for all of the protocols. Experiments also take care of scaling.

Experiment("CTCF and PDX1 ChIP-Seq for samples A, B, and C",
    "Perform ChIP-Seq for two TFs on three samples",
    protocols=list(
        crosslinking,
        lysis,
        immunoprecipitation
    ),
    variables=list(
        samples=c("A","B","C"),
        antibodies=c("CTCF", "PDX1")
    )
)
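Merging materials across protocols amounts to stacking the per-protocol tables and summing amounts for the same material and unit. The two tables below are hypothetical, with illustrative amounts, and this is not labmath's experiment code.

```r
# Hypothetical sketch (not labmath's experiment code): combine per-protocol
# materials, summing amounts for the same material and unit.
crosslinking_materials <- data.frame(name = "PBS", amount = 50, unit = "ml",
                                     stringsAsFactors = FALSE)
lysis_materials <- data.frame(name = c("Lysis buffer", "PBS"), amount = c(10, 20),
                              unit = c("ml", "ml"), stringsAsFactors = FALSE)
all_materials <- rbind(crosslinking_materials, lysis_materials)
combined <- aggregate(amount ~ name + unit, data = all_materials, FUN = sum)
combined   # Lysis buffer 10 ml, PBS 70 ml
```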