Omixer: multivariate and reproducible randomization
In Omixer: Randomize Samples for -omics Profiling

knitr::opts_chunk$set(collapse=TRUE, comment="#>")
set.seed(123)

Introduction

Batch effects can have a major impact on the results of omics studies [@Leek2010]. Randomization is the first, and arguably most influential, step in handling them. However, its implementation suffers from a few key issues:

A single random draw can inadvertently result in high correlation between technical covariates and biological factors. Particularly in studies with large numbers of batches and outcomes of interest, minimizing these correlations is crucial.
Long, randomized sample lists are unintuitive and translate poorly into any wet lab that is not fully automated. This can result in errors and sample mixups.
The randomization process is inherently unclear in many publications, rarely described despite the varying efficacy of methods.
Randomized layouts are not always reproducible, resulting in inconsistent results.

To combat these problems, we developed Omixer - an R package for multivariate randomization and reproducible generation of intuitive sample layouts.

Dependencies

This document has the following dependencies.

library(Omixer)
library(tibble)
library(forcats)
library(stringr)
library(dplyr)
library(ggplot2)
library(magick)

Workflow

Omixer randomizes input sample lists multiple times (default: 1,000) and then combines these randomized lists with plate layouts, which can be selected from commonly used setups or custom-made. It can handle paired samples, keeping these adjacent but shuffling their order, and allows explicit masking of specific wells if, for example, plates are shared between different studies.

After performing robust tests of correlation between technical covariates and selected randomization factors, a layout is chosen using these criteria:

No test provided sufficient evidence to suggest correlation between the variables (all p-values over 0.05).
From the remaining layouts, return one where the absolute sum of correlations is minimized.

The optimal randomized list can then be processed by omixerSheet, returning intuitive sample layouts for the wet lab.

Creating Layouts

In order to establish correlations between technical covariates and biological factors, Omixer needs to know the plate layout that your samples will be randomized to. There are several options for automatically creating some common layouts. Alternatively, a data frame can be input to the layout option alongside specified techVars. Possibilities are discussed in more detail below.

Automated Layouts

Several options can be used to automatically generate common layouts:

wells specifies the number of wells on a plate, which can be 96, 48, or 24.
plateNum determines how many copies of the plate your samples will need.
div is optional, and subdivides the plate into batches. This can be used to specify chips within a plate, for example.
positional allows positions within div to also be treated as batches. This is useful for 450K experiments where positional batch effects have been identified [@Jiao2018].

Subdivisions

By default, div is set to "none", but it can be set to "col", "row", "col-block", or "row-block".

col treats each column in the plate as a batch
row treats each row in the plate as a batch
col-block will separate the plate into batches that are 2 columns wide
row-block separates the plate into 2 row wide batches

So, for wells=48, div="col", each column of a 48-well plate will be treated as a batch (different colours in the image below).

knitr::include_graphics('./omixer_48col.png')

If you instead specify div="row", the rows will be treated as batches.

knitr::include_graphics('./omixer_48row.png')

Similarly, you can set div="col-block" or div="row-block" for batches that are 2 columns or rows wide, respectively. The image below shows how a 48 well plate would be subdivided with the div="col-block" option.

knitr::include_graphics('./omixer_48colblock.png')

Combining the above will allow you to create a large number of layouts commonly used in omics experiments.

Masking

If your experiment requires certain wells to be left empty, then these can be specified with the mask option. By default, no wells are masked, but you can specify masking, with 1 representing a masked well and 0 signifying that a sample should be randomized to this position.

Wells are numbered along each row sequentially. In the images above, row A includes wells 1 through 8, row B is wells 9 to 16, and so on until well 48 at F8.

Custom Layouts

If none of the automated layouts represent your setup you can input your own plate layout as a data frame. The only requirement is that the number of unmasked wells is equal to the number of samples in your experiment, and that you input the names of technical covariate columns to the techVars option.

For example, if we wanted to create a 96-well plate to send for 450K DNA methylation profiling, we might submit the following layout and techVars.

layout <- tibble(plate=rep(1, 96), well=1:96, 
    row=factor(rep(1:8, each=12), labels=toupper(letters[1:8])),
    column=rep(1:12, 8), chip=as.integer(ceiling(column/2)),
    chipPos=ifelse(column %% 2 == 0, as.numeric(row)+8, row))

techVars <- c("chip", "chipPos")

layout

Number of Iterations

The number of iterations performed by the omixerRand function can be altered using the iterNum option. By default, this is set to 1,000 but in complex scenarios with multiple nested batches and biological variables of interest, this may be increased to optimize randomization further. This will increase the run time however, and it may be preferable to run omixerRand with a lower iterNum initially, and inspect the resulting correlations in order to assess if this was sufficient or not.

Simple example

We create toy data, representing 48 samples to be sent off for RNA sequencing. All samples will be on a single 48-well flowcell, with each of the 8-sample rows being pipetting onto a lane, resulting in 6 lanes. This setup can be represented using provided Omixer layouts, as is described below.

First, we build the sample list that will be provided to Omixer, with information on the age, sex, and smoking status of individuals alongside sample isolation dates. We want to optimize distribution of these randomization variables across lanes on the flowcell to minimize batch effects.

sampleList <- tibble(sampleId=str_pad(1:48, 4, pad="0"),
    sex=as_factor(sample(c("m", "f"), 48, replace=TRUE)), 
    age=round(rnorm(48, mean=30, sd=8), 0), 
    smoke=as_factor(sample(c("yes", "ex", "never"), 48, 
        replace=TRUE)),
    date=sample(seq(as.Date('2008/01/01'), as.Date('2016/01/01'), 
        by="day"), 48))

sampleList

Randomization Variables

Using the randVars option, we inform Omixer which columns in our data represent randomization variables. You can specify any number of variables, but with increasing numbers it will become more difficult to optimize their distribution across batches.

randVars <- c("sex", "age", "smoke", "date")

Running Omixer

To perform multivariate randomization use the omixerRand function. For our example, we have one 96-well flowcell wells=96, plateNum=1 and want to optimize sample distribution across lanes div="row".

Following randomization, omixerRand will display a visualization of correlations between randomization and technical variables. If the returning correlations are higher than you would like, you can increase the iterNum or decrease the number of randomization variables.

omixerLayout <- omixerRand(sampleList, sampleId="sampleId", 
    block="block", iterNum=100, wells=48, div="row", 
    plateNum=1, randVars=randVars)

Following omixerRand, an optimal randomized sample list is returned. This can be used as is or processed by omixerSheet to create lab-friendly sample sheets, which will be shown below.

head(omixerLayout[1:11])

Regenerating Layouts

Since multivariate randomization can take some time with large datasets and many randomization variables, we provide the omixerSpecific function to reproduce previously generated layouts. After running omixerRand, the seed of the optimal layout is saved to the working directory.

After loading .Random.seed, you can run omixerSpecific to regenerate the optimal layout.

load("randomSeed.Rdata")

omixerLayout <- omixerSpecific(sampleList, sampleId="sampleId", 
    block="block", wells=48, div="row", 
    plateNum=1, randVars=randVars)

Sample Sheets

Once the multivariate randomization is complete, the resulting data frame can be input into omixerSheet to produce lab-friendly sample layouts. These will be saved in your working directory as a PDF document.

It is possible to colour code these sheets by a specific factor using the group option, and this is demonstrated in the extended example. If the text labels of the sample sheets are too large, then these can additionally be changed using the group.text.size and sample.text.size options.

omixerSheet(omixerLayout)

knitr::include_graphics('./omixer_sample_sheets_rna.PNG')

Extended example

To demonstrate the full functionality of Omixer, we present an extended example.

Here, our toy data represents 616 samples ready to be sent off for EPIC DNA methylation profiling. These samples will be randomized to 7 96-well plates where each of the 12 columns are transferred to a 8-sample EPIC chip. The last chip on each plate needs to be kept empty for control samples, and we will communicate this to Omixer using the mask option.

Our samples are taken from 4 different tissues of 77 individuals, and we are interested in how DNA methylation changes over 2 timepoints. Given our primary research question, we would like to keep the timepoints adjacent on the array but randomize their order. We can do this in Omixer with the block option, as demonstrated below.

Creating Toy Data

As well as a sample ID, we need to tell Omixer which variables specify paired sample blocks using a blocking variable, which we name block.

sampleList<- tibble(sampleId=str_pad(1:616, 4, pad="0"), 
    block=rep(1:308, each=2), 
    time=rep(0:1, 308), 
    tissue=as_factor(rep(c("blood", "fat", "muscle", "saliva"), 
        each=2, 77)), 
    sex=as_factor(rep(sample(c("male", "female"), 77, replace=TRUE), 
        each=8)), 
    age=round(rep(rnorm(77, mean=60, sd=10), each=8), 0), 
    bmi=round(rep(rnorm(77, mean=25, sd=2), each=8) , 1), 
    date=rep(sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), 
        by="day"), 77), each=8))

sampleList
save(sampleList, file="sampleList.Rdata")

Setting up Variables

We set up our randomization variables to optimize distribution of our biological factors across chips and plates. Randomization variables in our example are tissue, sex, age, body mass index (BMI), and isolation date.

randVars <- c("tissue", "sex", "age", "bmi", "date")

Since the last chip on each plate needs to be reserved, we specify a mask so that Omixer knows not to assign samples to these wells.

In the mask, a 0 indicates that a sample will be assigned to that well, and a 1 indicates that it should be left empty.

mask <- rep(c(rep(0, 88), rep(1, 8)), 7)

Running Omixer

Now we are ready to perform multivariate randomization with the omixerRand function. We specify 7 96-well plates wells=96, plateNum=7 subdivided into 8-sample EPIC chips div="col".

omixerLayout <- omixerRand(sampleList, sampleId="sampleId", 
    block="block", iterNum=100, wells=96, div="col", plateNum=7, 
    randVars=randVars, mask=mask)

Regenerating layouts

As in the simple example, any Omixer layouts can be regenerated using the saved random environment in the omixerSpecific function.

After loading .Random.seed, you can run omixerSpecific to regenerate the optimal layout.

load("randomSeed.Rdata")

omixerLayout <- omixerSpecific(sampleList, sampleId="sampleId", 
    block="block", wells=96, div="col", plateNum=7, 
    randVars=randVars, mask=mask)

Simple Randomization

Looking at the above correlations, you may wonder how Omixer compares to simple randomization. Briefly, we will investigate this.

Simple randomization can be replicated using omixerRand with a iterNum=1. Here, only one randomized layout will be created. If this is not optimal, a warning will print but the layout will still be returned.

simpleLayout <- omixerRand(sampleList, sampleId="sampleId", 
    block="block",iterNum=1, wells=96, div="col", plateNum=7, 
    randVars=randVars, mask=mask)

Here, we see strong evidence of a correlation between:

Age and plate (r = -0.117, p = 0.004)
BMI and chip (r = -0.094, p = 0.02)
Date and plate (r = 0.127, p = 0.002)

These patterns threaten the validity of our future inferences, as the effects of biological factors are entangled with technical variations.

In comparison, there is insufficient evidence to suggest correlation between any biological factor and technical covariate in the Omixer produced layout, and the largest correlation coefficient returned is -0.052, which is considerably lower than many seen in the simple randomized layout.

Sample Sheets

The bridge between dry and wet labs can be precarious. Technicians are often faced with long, monotonous lists of samples, which they need to pipette accurately to minimize sample mixups. This is especially prevalent in more complicated setups as in this extended example.

The omixerSheet function smooths this transition by creating lab-friendly PDFs of sample layouts.

You can colour code wells by another variable. In our example, we want to highlight wells of each tissue, since samples from one tissue are likely to be stored together group="tissue". The first page of the resulting PDF is shown below.

omixerSheet(omixerLayout, group="tissue")

knitr::include_graphics('./omixer_sample_sheets_dna.PNG')

Installing Omixer

Some users recieve an error when trying to install Omixer, stating that it is not available for their version of R. In order to solve this error you must update Bioconductor with the following code:

BiocManager::install()

References

Session Info

sessionInfo()

Any scripts or data that you put into this service are public.

Omixer documentation built on Feb. 4, 2021, 2:01 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Omixer
Randomize Samples for -omics Profiling

Omixer: multivariate and reproducible randomization
In Omixer: Randomize Samples for -omics Profiling

Introduction

Dependencies

Workflow

Creating Layouts

Automated Layouts

Subdivisions

Masking

Custom Layouts

Number of Iterations

Simple example

Randomization Variables

Running Omixer

Regenerating Layouts

Sample Sheets

Extended example

Creating Toy Data

Setting up Variables

Running Omixer

Regenerating layouts

Simple Randomization

Sample Sheets

Installing Omixer

References

Session Info

Try the Omixer package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

Omixer Randomize Samples for -omics Profiling

Omixer: multivariate and reproducible randomization In Omixer: Randomize Samples for -omics Profiling

Introduction

Dependencies

Workflow

Creating Layouts

Automated Layouts

Subdivisions

Masking

Custom Layouts

Number of Iterations

Simple example

Randomization Variables

Running Omixer

Regenerating Layouts

Sample Sheets

Extended example

Creating Toy Data

Setting up Variables

Running Omixer

Regenerating layouts

Simple Randomization

Sample Sheets

Installing Omixer

References

Session Info

Try the Omixer package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

Omixer
Randomize Samples for -omics Profiling

Omixer: multivariate and reproducible randomization
In Omixer: Randomize Samples for -omics Profiling