knitr::opts_chunk$set(collapse = TRUE, comment = "#>") library(kfigr) library(RDML)
The RDML package was created to work with the Real-time PCR Data Markup Language (RDML) -- a structured and universal data standard for exchanging quantitative PCR (qPCR) data [@lefever_rdml_2009; @rdml-ninja_2015]. RDML belongs to the family of eXtensible Markup Languages (XML), and contains fluorescence data as well as information about the qPCR experiment. A description of the RDML schema and the RDML format are available at http://rdml.org.
The RDML package is published in Oxford Bioinformatics:
Stefan Rödiger, Michał Burdukiewicz, Andrej-Nikolai Spiess, Konstantin Blagodatskikh; Enabling reproducible real-time quantitative PCR research: the RDML package, Bioinformatics, https://doi.org/10.1093/bioinformatics/btx528 (see also citation()
).
The XML technology is commonly used in the R environment and there are several tools to manipulate XML files [@xml_book]. When working with XML data, the workhorse package is XML [@Lang_XML]. Therefore, investments in development, implementation and maintenance is moderate.
We use the R6 [@Chang_R6], assertthat [@Wickham_assertthat], plyr [@Wickham_plyr], dplyr [@Wickham_Francois_dplyr], tidyr [@Wickham_tidyr] and rlist [@Ren_rlist] packages.
RDML
imports various data formats (CSV, XMLX) apart from the RDML format. Provided that the raw data
have a defined structure (as described in the vignette), the import should be
done by a few clicks. The example below shows the import of amplification curve
data, which were stored in a CSV file. The function rdmlEdit()
was used in the
RKWard IDE/GUI [@roediger_rkward_2012] (tested with version 0.6.9z+0.7.0+devel1 on Kubuntu 17.04; NOTE: problems were reported on systems where not the webkit component was used for the rendering of the rdmlEdit()
GUI) for further processing.
Once imported, this enables rdmlEDIT()
and other functions from the RDML
package to conduct complex
data visualization and processing in the R statistical computing environment.
The public methods of the main R6 “RDML” class can be used to access and process the internally stored RDML data. These methods include:
The RDML format is supported by Bio-Rad (CFX 96 and CFX 384), Life Technologies (StepOne, ViiA7, QuantStudio) and Roche (LightCycler~96) thermocycler systems. In addition, several software packages (e.g., Primer3Plus, RDML-Ninja, qBase+) support RDML [@untergasser_2007; @hellemans_2007; @ruijter_amplification_2009; @pabinger_2014; @rdml-ninja_2015].
The structure of the RDML package mimics the RDML format and provides several R6 classes that correspond to RDML v1.2 format types. All major manipulations with RDML data can be performed by a class called RDML through its public methods:
$new()
-- creates new RDML object (empty or from specified RDML file) $AsTable()
-- represents data contained in RDML object (except fluorescent data) as data.frame.$GetFData()
-- gets fluorescent data.$SetFData()
-- sets fluorescent data to RDML object.$AsDendrogram()
-- represents structure of RDML object as dendrogram.In this section, we will use the built-in RDML example file lc96_bACTXY.rdml
. This file was obtained during the measurement of human DNA concentration by a LightCycler 96 (Roche) and the XY-Detect kit (Syntol, Russia).
To open the lc96_bACTXY.rdml
file, we have to create a new RDML object with its class initializer -- $new()
and the file name as parameter filename
.
filename <- system.file("extdata/lc96_bACTXY.rdml", package = "RDML") lc96 <- RDML$new(filename = filename)
Next we can check the structure of our new object -- lc96
by printing it.
lc96
As a result we can see field names, and after :
R6
objects contained at this field after ~
,:
,[]
.The field names for all RDML package classes correspond to field names of RDML types as described at http://rdml.org/files.php?v=1.2.
Image source: http://rdml.org/.
For the base class RDML, these are:
dateMade
dateUpdated
id
-- publisher and id to the RDML file.experimenter
-- contact details of the experimenter.documentation
-- these elements should be used if the same description applies to many samples, targets, or experiments.dye
-- information about a dye.sample
-- defined template solutions.target
-- defined PCR reactions.thermalCyclingConditions
-- cycling programs for PCR or to amplify cDNA.experiment
These fields can be divided by two parts:
Contains one or more experiments with fluorescence data. Fluorescence data are stored at the data level of an experiment. For example, fluorescence data for reaction tube 45 and target bACT can be accessed with the following code:
fdata <- lc96$ experiment$`ca1eb225-ecea-4793-9804-87bfbb45f81d`$ run$`65aeb1ec-b377-4ef6-b03f-92898d47488b`$ react$`45`$ data$bACT$ adp$fpoints #'adp' means amplification data points (qPCR) head(fdata)
The structure of experiments can be visualized by plotting dendrograms.
lc96$AsDendrogram()
In this dendrogram r figr("dendrogram")
we can see that our file consists of one experiment and one run. Four targets, each with two sample types (std -- standard, unkn -- unknown), are part of the experiment. There is only qPCR data -- adp in this experiment. Ten reactions (tubes) for standard type (std) and six reaction for the unknown (unkn) type. The total number of reactions can be more than the number of reactions on the plate because one tube can contain more than one target (e.g., multiplexing).
All fields other than experiment. This additional information can be referenced in other parts of the RDML file. For example, to access sample added to react 39 and get its quantity, we can use code like this:
ref <- lc96$ experiment$`ca1eb225-ecea-4793-9804-87bfbb45f81d`$ run$`65aeb1ec-b377-4ef6-b03f-92898d47488b`$ react$`39`$ sample$id sample <- lc96$sample[[ref]] sample$quantity$value
R6 objects are environments, that's why simple copying results in creating references to existing objects. Then modifying the copy leads to modification of the original object. To create a real copy of the object, we have to use method $clone(deep = TRUE)
provided by the R6 class (or its shortcut $copy()
inside RDML package).
id1 <- idType$new("id_1") id2 <- id1 id3 <- id1$copy() id2$id <- "id_2" id3$id <- "id_3" cat(sprintf("Original object\t\t: %s ('id_1' became 'id_2')\nSimple copy\t\t\t: %s\n'Real' copy (clone)\t: %s\n", id1$id, id2$id, id3$id))
From the example above, we can see that the modification of id2
led to a modification of the original object id1
, however modification of cloned object id3
did not.
To modify the content of RDML objects, we can use fields as setters. These setters provide type safe modifications by input validation. In addition, setting lists of objects generates names of list elements.
## Create 'real' copy of object experiment <- lc96$experiment$`ca1eb225-ecea-4793-9804-87bfbb45f81d`$copy() ## Try to set 'id' with wrong input type. ## Correct type 'idType' can be seen at error message. tryCatch(experiment$id <- "exp1", error = function(e) print(e)) ## Set 'id' with correct input type - 'idType' experiment$id <- idType$new("exp1") ## Similar operations for 'run' run <- experiment$run$`65aeb1ec-b377-4ef6-b03f-92898d47488b`$copy() run$id <- idType$new("run1") ## Replace original elements with modified experiment$run <- list(run) lc96$experiment <- list(experiment)
lc96$AsDendrogram()
Then, we can observe in r figr("dendrogram_modified")
our modification with the $AsDendrogram()
method.
AsTable()
methodTo obtain information about all fluorescence data in an RDML file (type of added sample, used target, starting quantity etc.) as a data.frame, we can use the $AsTable()
method. By default, it provides such information as:
fdata.name
-- aggregated name for current fluorescence data. Default pattern is position_sample_sample.type_target
(e.g., D03_Sample 39_std_bACT). This pattern can be modified by name.pattern
argument.exp.id
-- experiment id (e.g., exp1).run.id
-- run id (e.g., run1).react.id
-- react (tube) id (e.g., 39).position
-- react (tube) position (e.g., D03).sample
-- name of the added sample (e.g., Sample 39).target
-- detection target (e.g., bACT).target.dyeId
-- detection dye (e.g., FAM).sample,type
-- type of the added sample (e.g., std).adp
-- TRUE
if contains qPCR data.mdp
-- TRUE
if contains melting data.To add custom columns to the output data.frame, we should pass it as a named method argument by generating expressions. Values for default columns can be used by custom name patterns and new columns referring to their names. The next example shows how to use the $AsTable()
method with a custom name pattern and additional columns.
tab <- lc96$AsTable( # Custom name pattern 'position~sample~sample.type~target~dye' name.pattern = paste( react$position, react$sample$id, private$.sample[[react$sample$id]]$type$value, data$tar$id, target[[data$tar$id]]$dyeId$id, sep = "~"), # Custom column 'quantity' - starting quantity of added sample quantity = { value <- sample[[react$sample$id]]$quantity$value if (is.null(value) || is.na(value)) NULL else value } ) ## Remove row names for compact printing rownames(tab) <- NULL head(tab)
Here, the generated data.frame is also used as a query in $GetFData()
and $SetFData()
methods (see further sections).
We can extract the fluorescence data in two ways:
$GetFData()
The advantage of $GetFData()
is that it can combine fluorescence data from many plates into one data.frame. The major argument of this function is request
, which defines fluorescence data to be extracted. This request is the output from the $AsTable()
method and can be easily filtered by the dplyr filter()
function. Furthermore, cycle limits, the output data.frame
format, as well as data types (fdata.type = 'adp'
for qPCR, fdata.type = 'mdp'
for melting data) can by specified (see examples below).
library(dplyr) library(ggplot2) ## Prepare request to get only 'std' type samples filtered.tab <- filter(tab, sample.type == "std") fdata <- lc96$GetFData(filtered.tab, # long table format for usage with ggplot2 long.table = TRUE) ggplot(fdata, aes(cyc, fluor)) + geom_line(aes(group = fdata.name, color = target))
Our example curves are not background subtracted, as visible in the plot. To accomplish this, we use the CPP()
function from the chipPCR package [@roediger2015chippcr] as described in @roediger2015r.
library(chipPCR) tab <- lc96$AsTable( # Custom name pattern 'position~sample~sample.type~target~run.id' name.pattern = paste( react$position, react$sample$id, private$.sample[[react$sample$id]]$type$value, data$tar$id, run$id$id, # run id added to names sep = "~")) ## Get all fluorescence data fdata <- as.data.frame(lc96$GetFData(tab, # We don't need long table format for CPP() long.table = FALSE)) fdata.cpp <- cbind(cyc = fdata[, 1], apply(fdata[, -1], 2, function(x) CPP(fdata[, 1], x)$y))
Now we have properly preprocessed data, which we will add to our object and use in the next section.
To define fluorescence data as an RDML object, we can use the $SetFData()
method. It takes three arguments:
fdata
-- fluorescence data in long.table = FALSE
format request
-- output from the AsTable()
function, which is used to set data;fdata.type
-- fdata.type = 'adp'
for qPCR, fdata.type = 'mdp'
for melting data.In the following we will define preprocessed fluorescence data as the new run -- run1_cpp. The subelements of RDML such as experiment, run, react and data that do not initially exist in an RDML object are created by SetFData automatically (read more at Creating RDML from table section).
Note that the colnames in
fdata
andfdata.name
inrequest
have to be the same!
tab$run.id <- "run.cpp" ## Set fluorescence data from previous section lc96$SetFData(fdata.cpp, tab) ## View setted data fdata <- lc96$GetFData(tab, long.table = TRUE) ggplot(fdata, aes(cyc, fluor)) + geom_line(aes(group = fdata.name, color = target))
Merging RDML objects can be conducted by the MergeRDMLs()
function. It takes a list of RDML objects and returns one single RDML object.
## Load another built in RDML file stepone <- RDML$new(paste0(path.package("RDML"), "/extdata/", "stepone_std.rdml")) ## Merge it with our 'lc96' object merged <- MergeRDMLs(list(lc96, stepone)) ## View structure of new object merged$AsDendrogram()
To save RDML objects in RDML file v1.2 format, we can use the $AsXML()
method where the file.name
argument is the name of new RDML file. Without file.name
given, the function returns an XML tree.
lc96$AsXML("lc96.rdml")
One can use RDML-ninja or the RDML validator from the RMDL consortium to validate RDML files created by the RDML package.
R6 classes allow to add methods to existing classes by using the $set()
method. Suppose that we want to add a method to preprocess all fluorescence data and calculate Cq values:
RDML$set("public", "CalcCq", function() { library(chipPCR) fdata <- as.data.frame(self$GetFData( self$AsTable())) fdata <- cbind(cyc = fdata[, 1], apply(fdata[, -1], 2, function(x) # Data preprocessing CPP(fdata[, 1], x)$y) ) apply(fdata[, -1], 2, function(x) { tryCatch( # Calculate Cq th.cyc(fdata[, 1], x, auto = TRUE)@.Data[1], error = function(e) NA) }) } ) ## Create new object with our advanced class stepone <- RDML$new(paste0(path.package("RDML"), "/extdata/", "stepone_std.rdml"))
We can then apply our new method:
stepone$CalcCq()
RDML objects can not only be generated from files but also from user data contained in data.frames. For this, we have to create an empty RDML object, create a data.frame containing the data and set it by the $SetFData()
method. Minimal required information (samples, targets, dyes) will be created from the data description.
#### Create simulated data with AmpSim() from chipPCR package ## Cq for data to be generated Cqs <- c(15, 17, 19, 21) ## PCR si,ulation will be 35 cycles fdata <- data.frame(cyc = 1:35) for(Cq in Cqs) { fdata <- cbind(fdata, AmpSim(cyc = 1:35, Cq = Cq)[, 2]) } ## Set names for fluorescence curves colnames(fdata)[2:5] <- c("c1", "c2", "c3", "c4") ## Create minimal description descr <- data.frame( fdata.name = c("c1", "c2", "c3", "c4"), exp.id = c("exp1", "exp1", "exp1", "exp1"), run.id = c("run1", "run1", "run1", "run1"), react.id = c(1, 1, 2, 2), sample = c("s1", "s1", "s2", "s2"), sample.type = c("unkn", "unkn", "unkn", "unkn"), target = c("gene1", "gene2", "gene1", "gene2"), target.dyeId = c("FAM", "ROX", "FAM", "ROX"), stringsAsFactors = FALSE ) ## Create empty RDML object sim <- RDML$new() ## Add fluorescence data sim$SetFData(fdata, descr) ## Observe object sim$AsDendrogram() fdata <- sim$GetFData(sim$AsTable(), long.table = TRUE) ggplot(fdata, aes(cyc, fluor)) + geom_line(aes(group = fdata.name, color = target, linetype = sample))
RDML objects can be created from .eds files generated by the ABI 7500 v.2 or StepOne software (here, all analyses have to be conducted within the device software!).
filename <- system.file("extdata/from_abi7500/sce.eds", package = "RDML") abi <- RDML$new(filename = filename) abi$AsDendrogram()
RDML objects can be generated from .csv files. The files need to contain a first column named cyc for qPCR data or tmp for melting data. The remaining columns must contain the raw fluorescence signals.
filename <- system.file("extdata/from_tables/fdata.csv", package = "RDML") csv <- RDML$new(filename = filename) csv$AsDendrogram()
RDML objects can also be generated from .xls or .xslx files. The files need to contain a description data.frame at tab sheet description and qPCR data at tab sheet adp, and/or qPCR data at tab sheet mdp.
filename <- system.file("extdata/from_tables/table.xlsx", package = "RDML") xslx <- RDML$new(filename = filename) xslx$AsDendrogram()
Also .xlsx import can be used with BioRad Custom Export - xlsx. Only main sheet have to be renamed as description and adp/mdp sheets added (from Quantification Data tab, then select RFU at dropdown list). See extdata/from_tables/BioRad.xlsx for example.
To provide a functional programming style that is more convenient in R, the RDML class methods have function wrappers:
obj$AsTable(...)
-- AsTable(obj, ...)
obj$SetFData(...)
-- SetFData(obj, ...)
obj$GetFData(...)
-- GetFData(obj, ...)
obj$AsDendrogram(...)
-- AsDendrogram(obj, ...)
The RDML package provides classes and methods to work with RDML data generated by real-time quantitative PCR devices or enables creating RDML files from user generated data. Because classes of the RDML package are built with R6, they can be modified by adding custom methods and ensure type safe usage through input validation.
The RDML format may also store data from high-throughput technologies such as digital PCR or 384-well qPCR. To make the RDML package the most appropriate tool for these applications, we optimized the saving functionality of the package. Hence, the RDML package is able to save even large numbers of qPCR experiments in reasonable time.
We benchmarked the time needed for saving RDML object with different numbers of experiments on two desktop machines:
Windows 7 x64 (build 7601) Service Pack 1 on Intel Core i5 (2,40 GHz) with 4,00 GB RAM. R version 3.2.1 (2015-06-18).
Kubuntu 15.04 (Kernel 3.19.0-27-generic) Intel(R) Pentium(R) CPU 997 @ 1.60GHz with 4,00 GB RAM. R version 3.1.2 (2014-10-31).
To warrant fairest comparison and taking into account factors like garbage collection, we repeated the procedure 100 times.
bench_df <- read.csv("bench_df.csv") ggplot(bench_df, aes(x = nr, y = mean, colour = os)) + geom_point(size = 3, alpha = 0.5) + scale_x_continuous("Number of experiments") + scale_y_continuous("Median time [s]") + scale_color_discrete("Operating\nsystem") + ggtitle("Results of benchmark") knitr::kable(bench_df[, c("nr", "median", "os")], format = "pandoc", col.names = c("Number of experiments", "Median time [s]", "Operating System"), caption = "Results of benchmark.")
The results above indicate that, thanks to the RDML package, an average laboratory workstation is able to save large RDML files in reasonable time (around 4 minutes for 1000 experiments).
The correct structure of an RDML file is an important prerequisite to perform reproducible analysis. Unfortunately, the created files can not be properly tested to date because the official RDML validator rdml.org is not online from time to time. However, there is a solution to validate the files by the alternate editor - RDML-Ninja [@rdml-ninja_2015].
The http://shtest.evrogen.net/rdmlEdit/ is a web tool based on the functionalities of the RDML package. Aside from basic functionalities like data import, modification and export, RDMLedit also produces visualisations of qPCR data. In addition, RDMLedit can also merge multiple RDML files.
As a https://shiny.rstudio.com/, RDMLedit may be also deployed locally. It requires an installed R and RDML package. To initialize, the user has to open R and run the following command: rdmlEdit().
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.