knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(kfigr)
library(RDML)

Introduction to the RDML package

The RDML package was created to work with the Real-time PCR Data Markup Language (RDML) -- a structured and universal data standard for exchanging quantitative PCR (qPCR) data [@lefever_rdml_2009; @rdml-ninja_2015]. RDML belongs to the family of eXtensible Markup Languages (XML), and contains fluorescence data as well as information about the qPCR experiment. A description of the RDML schema and the RDML format are available at http://rdml.org.

The RDML package is published in Oxford Bioinformatics: Stefan Rödiger, Michał Burdukiewicz, Andrej-Nikolai Spiess, Konstantin Blagodatskikh; Enabling reproducible real-time quantitative PCR research: the RDML package, Bioinformatics, https://doi.org/10.1093/bioinformatics/btx528 (see also citation()).

The XML technology is commonly used in the R environment and there are several tools to manipulate XML files [@xml_book]. When working with XML data, the workhorse package is XML [@Lang_XML]. Therefore, investments in development, implementation and maintenance is moderate.

We use the R6 [@Chang_R6], assertthat [@Wickham_assertthat], plyr [@Wickham_plyr], dplyr [@Wickham_Francois_dplyr], tidyr [@Wickham_tidyr] and rlist [@Ren_rlist] packages.

Philosphy of the RDML package

RDML imports various data formats (CSV, XMLX) apart from the RDML format. Provided that the raw data have a defined structure (as described in the vignette), the import should be done by a few clicks. The example below shows the import of amplification curve data, which were stored in a CSV file. The function rdmlEdit() was used in the RKWard IDE/GUI [@roediger_rkward_2012] (tested with version 0.6.9z+0.7.0+devel1 on Kubuntu 17.04; NOTE: problems were reported on systems where not the webkit component was used for the rendering of the rdmlEdit() GUI) for further processing.

HTML5 Icon

Once imported, this enables rdmlEDIT() and other functions from the RDML package to conduct complex data visualization and processing in the R statistical computing environment.

HTML5 Icon

Major functionalities of the RDML package

The public methods of the main R6 “RDML” class can be used to access and process the internally stored RDML data. These methods include:

Vendors and software packages supporting RDML

The RDML format is supported by Bio-Rad (CFX 96 and CFX 384), Life Technologies (StepOne, ViiA7, QuantStudio) and Roche (LightCycler~96) thermocycler systems. In addition, several software packages (e.g., Primer3Plus, RDML-Ninja, qBase+) support RDML [@untergasser_2007; @hellemans_2007; @ruijter_amplification_2009; @pabinger_2014; @rdml-ninja_2015].

Structure of the RDML package

The structure of the RDML package mimics the RDML format and provides several R6 classes that correspond to RDML v1.2 format types. All major manipulations with RDML data can be performed by a class called RDML through its public methods:

Opening and observing RDML file

In this section, we will use the built-in RDML example file lc96_bACTXY.rdml. This file was obtained during the measurement of human DNA concentration by a LightCycler 96 (Roche) and the XY-Detect kit (Syntol, Russia).

To open the lc96_bACTXY.rdml file, we have to create a new RDML object with its class initializer -- $new() and the file name as parameter filename.

filename <- system.file("extdata/lc96_bACTXY.rdml", package = "RDML")
lc96 <- RDML$new(filename = filename)

Next we can check the structure of our new object -- lc96 by printing it.

lc96

As a result we can see field names, and after :

The field names for all RDML package classes correspond to field names of RDML types as described at http://rdml.org/files.php?v=1.2.

HTML5 Icon

Image source: http://rdml.org/.

For the base class RDML, these are:

These fields can be divided by two parts:

Experiment field

Contains one or more experiments with fluorescence data. Fluorescence data are stored at the data level of an experiment. For example, fluorescence data for reaction tube 45 and target bACT can be accessed with the following code:

fdata <- 
  lc96$
    experiment$`ca1eb225-ecea-4793-9804-87bfbb45f81d`$
    run$`65aeb1ec-b377-4ef6-b03f-92898d47488b`$
    react$`45`$
    data$bACT$
    adp$fpoints #'adp' means amplification data points (qPCR)
head(fdata)

The structure of experiments can be visualized by plotting dendrograms.

lc96$AsDendrogram()

In this dendrogram r figr("dendrogram") we can see that our file consists of one experiment and one run. Four targets, each with two sample types (std -- standard, unkn -- unknown), are part of the experiment. There is only qPCR data -- adp in this experiment. Ten reactions (tubes) for standard type (std) and six reaction for the unknown (unkn) type. The total number of reactions can be more than the number of reactions on the plate because one tube can contain more than one target (e.g., multiplexing).

Additional information fields

All fields other than experiment. This additional information can be referenced in other parts of the RDML file. For example, to access sample added to react 39 and get its quantity, we can use code like this:

ref <- lc96$
          experiment$`ca1eb225-ecea-4793-9804-87bfbb45f81d`$
          run$`65aeb1ec-b377-4ef6-b03f-92898d47488b`$
          react$`39`$
          sample$id
sample <- lc96$sample[[ref]]
sample$quantity$value

Copying RDML objects

R6 objects are environments, that's why simple copying results in creating references to existing objects. Then modifying the copy leads to modification of the original object. To create a real copy of the object, we have to use method $clone(deep = TRUE) provided by the R6 class (or its shortcut $copy() inside RDML package).

id1 <- idType$new("id_1")
id2 <- id1
id3 <- id1$copy()
id2$id <- "id_2"
id3$id <- "id_3"
cat(sprintf("Original object\t\t: %s ('id_1' became 'id_2')\nSimple copy\t\t\t: %s\n'Real' copy (clone)\t: %s\n",
            id1$id, id2$id, id3$id))

From the example above, we can see that the modification of id2 led to a modification of the original object id1, however modification of cloned object id3 did not.

Modifying RDML objects

To modify the content of RDML objects, we can use fields as setters. These setters provide type safe modifications by input validation. In addition, setting lists of objects generates names of list elements.

## Create 'real' copy of object
experiment <- lc96$experiment$`ca1eb225-ecea-4793-9804-87bfbb45f81d`$copy()
## Try to set 'id' with wrong input type.
## Correct type 'idType' can be seen at error message.
tryCatch(experiment$id <- "exp1",
         error = function(e) print(e))

## Set 'id' with correct input type - 'idType'
experiment$id <- idType$new("exp1")

## Similar operations for 'run'
run <- experiment$run$`65aeb1ec-b377-4ef6-b03f-92898d47488b`$copy()
run$id <- idType$new("run1")

## Replace original elements with modified
experiment$run <- list(run)
lc96$experiment <- list(experiment)
lc96$AsDendrogram()

Then, we can observe in r figr("dendrogram_modified") our modification with the $AsDendrogram() method.

AsTable() method

To obtain information about all fluorescence data in an RDML file (type of added sample, used target, starting quantity etc.) as a data.frame, we can use the $AsTable() method. By default, it provides such information as:

To add custom columns to the output data.frame, we should pass it as a named method argument by generating expressions. Values for default columns can be used by custom name patterns and new columns referring to their names. The next example shows how to use the $AsTable() method with a custom name pattern and additional columns.

tab <- lc96$AsTable(
  # Custom name pattern 'position~sample~sample.type~target~dye'
  name.pattern = paste(
             react$position,
             react$sample$id,
             private$.sample[[react$sample$id]]$type$value,
             data$tar$id,
             target[[data$tar$id]]$dyeId$id,
             sep = "~"),
  # Custom column 'quantity' - starting quantity of added sample 
  quantity = {
    value  <- sample[[react$sample$id]]$quantity$value
    if (is.null(value) || is.na(value)) NULL
    else value
  }
)
## Remove row names for compact printing
rownames(tab) <- NULL
head(tab)

Here, the generated data.frame is also used as a query in $GetFData() and $SetFData() methods (see further sections).

Getting fluorescence data

We can extract the fluorescence data in two ways:

The advantage of $GetFData() is that it can combine fluorescence data from many plates into one data.frame. The major argument of this function is request, which defines fluorescence data to be extracted. This request is the output from the $AsTable() method and can be easily filtered by the dplyr filter() function. Furthermore, cycle limits, the output data.frame format, as well as data types (fdata.type = 'adp' for qPCR, fdata.type = 'mdp' for melting data) can by specified (see examples below).

library(dplyr)
library(ggplot2)

## Prepare request to get only 'std' type samples
filtered.tab <- filter(tab,
                       sample.type == "std")

fdata <- lc96$GetFData(filtered.tab,
                       # long table format for usage with ggplot2
                       long.table = TRUE)
ggplot(fdata, aes(cyc, fluor)) +
    geom_line(aes(group = fdata.name,
                  color = target))

Our example curves are not background subtracted, as visible in the plot. To accomplish this, we use the CPP() function from the chipPCR package [@roediger2015chippcr] as described in @roediger2015r.

library(chipPCR)
tab <- lc96$AsTable(
  # Custom name pattern 'position~sample~sample.type~target~run.id'
  name.pattern = paste(
             react$position,
             react$sample$id,
             private$.sample[[react$sample$id]]$type$value,
             data$tar$id,
             run$id$id, # run id added to names
             sep = "~"))
## Get all fluorescence data
fdata <- as.data.frame(lc96$GetFData(tab,
                                     # We don't need long table format for CPP()
                                     long.table = FALSE))

fdata.cpp <- cbind(cyc = fdata[, 1],
                   apply(fdata[, -1], 2,
                         function(x) CPP(fdata[, 1],
                                         x)$y))

Now we have properly preprocessed data, which we will add to our object and use in the next section.

Setting fluorescence data

To define fluorescence data as an RDML object, we can use the $SetFData() method. It takes three arguments:

In the following we will define preprocessed fluorescence data as the new run -- run1_cpp. The subelements of RDML such as experiment, run, react and data that do not initially exist in an RDML object are created by SetFData automatically (read more at Creating RDML from table section).

Note that the colnames in fdata and fdata.name in request have to be the same!

tab$run.id <- "run.cpp"
## Set fluorescence data from previous section
lc96$SetFData(fdata.cpp,
              tab)

## View setted data
fdata <- lc96$GetFData(tab,
                       long.table = TRUE)
ggplot(fdata, aes(cyc, fluor)) +
    geom_line(aes(group = fdata.name,
                  color = target))

Merging RDML objects

Merging RDML objects can be conducted by the MergeRDMLs() function. It takes a list of RDML objects and returns one single RDML object.

## Load another built in RDML file
stepone <- RDML$new(paste0(path.package("RDML"),
                           "/extdata/", "stepone_std.rdml"))
## Merge it with our 'lc96' object
merged <- MergeRDMLs(list(lc96, stepone))
## View structure of new object
merged$AsDendrogram()

Saving RDML object as RDML file

To save RDML objects in RDML file v1.2 format, we can use the $AsXML() method where the file.name argument is the name of new RDML file. Without file.name given, the function returns an XML tree.

lc96$AsXML("lc96.rdml")

One can use RDML-ninja or the RDML validator from the RMDL consortium to validate RDML files created by the RDML package.

Creating custom functions

R6 classes allow to add methods to existing classes by using the $set() method. Suppose that we want to add a method to preprocess all fluorescence data and calculate Cq values:

RDML$set("public", "CalcCq",
         function() {
           library(chipPCR)
           fdata <- as.data.frame(self$GetFData(
             self$AsTable()))
           fdata <- cbind(cyc = fdata[, 1],
                          apply(fdata[, -1],
                                2,
                                function(x)
                                  # Data preprocessing
                                  CPP(fdata[, 1],
                                      x)$y)
                          )

           apply(fdata[, -1], 2,
                 function(x) {
                   tryCatch(
                     # Calculate Cq
                     th.cyc(fdata[, 1], x,
                            auto = TRUE)@.Data[1],
                     error = function(e) NA)
                 })
         }
)

## Create new object with our advanced class
stepone <- RDML$new(paste0(path.package("RDML"),
                           "/extdata/", "stepone_std.rdml"))

We can then apply our new method:

stepone$CalcCq()

Creating RDML from tables

RDML objects can not only be generated from files but also from user data contained in data.frames. For this, we have to create an empty RDML object, create a data.frame containing the data and set it by the $SetFData() method. Minimal required information (samples, targets, dyes) will be created from the data description.

#### Create simulated data with AmpSim() from chipPCR package
## Cq for data to be generated
Cqs <- c(15, 17, 19, 21)
## PCR si,ulation will be 35 cycles
fdata <- data.frame(cyc = 1:35)
for(Cq in Cqs) {
  fdata <- cbind(fdata,
                 AmpSim(cyc = 1:35, Cq = Cq)[, 2])
}
## Set names for fluorescence curves
colnames(fdata)[2:5] <- c("c1", "c2", "c3", "c4")

## Create minimal description
descr <- data.frame(
  fdata.name = c("c1", "c2", "c3", "c4"),
  exp.id = c("exp1", "exp1", "exp1", "exp1"),
  run.id = c("run1", "run1", "run1", "run1"),
  react.id = c(1, 1, 2, 2),
  sample = c("s1", "s1", "s2", "s2"),
  sample.type = c("unkn", "unkn", "unkn", "unkn"),
  target = c("gene1", "gene2", "gene1", "gene2"),
  target.dyeId = c("FAM", "ROX", "FAM", "ROX"),
  stringsAsFactors = FALSE
)

## Create empty RDML object
sim <- RDML$new()
## Add fluorescence data
sim$SetFData(fdata, descr)

## Observe object
sim$AsDendrogram()
fdata <- sim$GetFData(sim$AsTable(),
                      long.table = TRUE)
ggplot(fdata, aes(cyc, fluor)) +
  geom_line(aes(group = fdata.name,
                color = target,
                linetype = sample))

Creating RDML from ABI 7500 v.2 software

RDML objects can be created from .eds files generated by the ABI 7500 v.2 or StepOne software (here, all analyses have to be conducted within the device software!).

filename <- system.file("extdata/from_abi7500/sce.eds", package = "RDML")
abi <- RDML$new(filename = filename)
abi$AsDendrogram()

Creating RDML from .csv files

RDML objects can be generated from .csv files. The files need to contain a first column named cyc for qPCR data or tmp for melting data. The remaining columns must contain the raw fluorescence signals.

filename <- system.file("extdata/from_tables/fdata.csv", package = "RDML")
csv <- RDML$new(filename = filename)
csv$AsDendrogram()

Creating RDML from .xls or .xslx files

RDML objects can also be generated from .xls or .xslx files. The files need to contain a description data.frame at tab sheet description and qPCR data at tab sheet adp, and/or qPCR data at tab sheet mdp.

filename <- system.file("extdata/from_tables/table.xlsx", package = "RDML")
xslx <- RDML$new(filename = filename)
xslx$AsDendrogram()

Also .xlsx import can be used with BioRad Custom Export - xlsx. Only main sheet have to be renamed as description and adp/mdp sheets added (from Quantification Data tab, then select RFU at dropdown list). See extdata/from_tables/BioRad.xlsx for example.

Functional style

To provide a functional programming style that is more convenient in R, the RDML class methods have function wrappers:

Using

Summary

The RDML package provides classes and methods to work with RDML data generated by real-time quantitative PCR devices or enables creating RDML files from user generated data. Because classes of the RDML package are built with R6, they can be modified by adding custom methods and ensure type safe usage through input validation.


Benchmark

The RDML format may also store data from high-throughput technologies such as digital PCR or 384-well qPCR. To make the RDML package the most appropriate tool for these applications, we optimized the saving functionality of the package. Hence, the RDML package is able to save even large numbers of qPCR experiments in reasonable time.

We benchmarked the time needed for saving RDML object with different numbers of experiments on two desktop machines:

  1. Windows 7 x64 (build 7601) Service Pack 1 on Intel Core i5 (2,40 GHz) with 4,00 GB RAM. R version 3.2.1 (2015-06-18).

  2. Kubuntu 15.04 (Kernel 3.19.0-27-generic) Intel(R) Pentium(R) CPU 997 @ 1.60GHz with 4,00 GB RAM. R version 3.1.2 (2014-10-31).

To warrant fairest comparison and taking into account factors like garbage collection, we repeated the procedure 100 times.

bench_df <- read.csv("bench_df.csv")

ggplot(bench_df, aes(x = nr, y = mean, colour = os)) +
  geom_point(size = 3, alpha = 0.5) +
  scale_x_continuous("Number of experiments") +
  scale_y_continuous("Median time [s]") +
  scale_color_discrete("Operating\nsystem") +
  ggtitle("Results of benchmark")

knitr::kable(bench_df[, c("nr", "median", "os")], format = "pandoc",
             col.names = c("Number of experiments", "Median time [s]", "Operating System"),
             caption = "Results of benchmark.")

The results above indicate that, thanks to the RDML package, an average laboratory workstation is able to save large RDML files in reasonable time (around 4 minutes for 1000 experiments).

Validation of RDML files created by the RDML package

The correct structure of an RDML file is an important prerequisite to perform reproducible analysis. Unfortunately, the created files can not be properly tested to date because the official RDML validator rdml.org is not online from time to time. However, there is a solution to validate the files by the alternate editor - RDML-Ninja [@rdml-ninja_2015].

RDMLedit web-server

The http://shtest.evrogen.net/rdmlEdit/ is a web tool based on the functionalities of the RDML package. Aside from basic functionalities like data import, modification and export, RDMLedit also produces visualisations of qPCR data. In addition, RDMLedit can also merge multiple RDML files.

Local deployment of RDMLedit

As a https://shiny.rstudio.com/, RDMLedit may be also deployed locally. It requires an installed R and RDML package. To initialize, the user has to open R and run the following command: rdmlEdit().

References



kablag/RDML documentation built on Oct. 20, 2022, 5:47 a.m.