
blotIt - a framework for alignment of biological replicate data

This package is based on blotIt2 by Daniel Kaschek. The toolbox scales biological replicate data to a common scale, which makes quantitative data from different experiments comparable.

System preparation

blotIt requires the R packages utils, MASS, data.table, ggplot2, rootSolve and trust. Additionally, the package devtools is needed to install blotIt from GitHub. If they are not yet installed, the required packages can be installed by executing

install.packages(c("utils", "MASS", "data.table", "ggplot2", "rootSolve", "trust", "devtools"))
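Calling install.packages() on the full list reinstalls packages that are already present. A minimal sketch that installs only what is missing could look as follows; the helper missingPackages() is hypothetical and not part of blotIt, and utils is left out of the list because it ships with R.

```r
# Packages listed above; 'utils' is part of every R installation.
required <- c("MASS", "data.table", "ggplot2", "rootSolve", "trust", "devtools")

# Hypothetical helper (not part of blotIt): names of packages not yet installed.
missingPackages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

toInstall <- missingPackages(required)

# Only install from an interactive session, so that scripts do not
# trigger downloads unexpectedly.
if (interactive() && length(toInstall) > 0) {
  install.packages(toInstall)
}
```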

blotIt is then installed via devtools:

devtools::install_github("JetiLab/blotIt")

Usage

Data import

First, the package is loaded:

library(blotIt)

Measurements are typically assumed to be available as human-readable, wide-formatted .csv files. The path to an example data set shipped with the package is obtained with

exampleDataPath <- system.file(
    "extdata", "simDataWide.csv",
    package = "blotIt"
)

The file has the structure

| time | condition | ID | pAKT | pEPOR | pJAK2 | ... |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0Uml Epo | 1.1 | 116.838271399017 | 295.836863524109 | | ... |
| 5 | 0Uml Epo | 1.1 | 138.808500374087 | 245.229971713582 | | ... |
| ... | ... | ... | ... | ... | ... | ... |
| 0 | 0Uml Epo | 2 | 94.4670174938645 | | 293.604761934545 | ... |
| 5 | 0Uml Epo | 2 | | | 398.958892340432 | ... |
| ... | ... | ... | ... | ... | ... | ... |

The first three columns contain description data: time points, measurement conditions and IDs (e.g. the IDs of the different experiments). All following columns contain the measurements of different targets: the first row holds the target names, and each following row holds the values measured for the time, condition and ID stated in the first three columns.

The information about which columns contain descriptions has to be passed to readWide():

importedData <- readWide(
    file = exampleDataPath, # path to the example file
    description = seq(1, 3), # Indices of columns containing the information
    sep = ",",
    dec = "."
)

The result is a long table of the form

| | time | condition | ID | name | value |
| --- | --- | --- | --- | --- | --- |
| pAKT1 | 0 | 0Uml Epo | 1 | pAKT | 116.83827 |
| pAKT2 | 5 | 0Uml Epo | 1 | pAKT | 138.80850 |
| pAKT3 | 10 | 0Uml Epo | 1 | pAKT | 99.09068 |
| pAKT4 | 20 | 0Uml Epo | 1 | pAKT | 106.68584 |
| pAKT5 | 30 | 0Uml Epo | 1 | pAKT | 115.02805 |
| pAKT6 | 60 | 0Uml Epo | 1 | pAKT | 111.91323 |
| pAKT7 | 240 | 0Uml Epo | 1 | pAKT | 132.56618 |
| ... | ... | ... | ... | ... | ... |

The wide table is melted into the long format: the column name contains the names of all columns not declared as description in readWide() (i.e. the different targets), and the column value contains the respective measurements.
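The melting step can be illustrated with a small base-R sketch. This is not the implementation of readWide(); the toy values are made up, and dropping missing measurements is an assumption about how empty cells are handled.

```r
# Toy wide table mirroring the layout described above (values are made up).
wide <- data.frame(
  time = c(0, 5),
  condition = c("0Uml Epo", "0Uml Epo"),
  ID = c(1, 1),
  pAKT = c(116.8, 138.8),
  pEPOR = c(295.8, NA),
  stringsAsFactors = FALSE
)

# Columns 1 to 3 are descriptions, all remaining columns are targets.
descriptionCols <- 1:3
targetCols <- setdiff(seq_along(wide), descriptionCols)

# Stack one block of rows per target column.
long <- do.call(rbind, lapply(targetCols, function(j) {
  data.frame(
    wide[, descriptionCols, drop = FALSE],
    name = names(wide)[j],
    value = wide[[j]],
    stringsAsFactors = FALSE
  )
}))

# Assumption: empty cells of the wide table carry no measurement and are dropped.
long <- long[!is.na(long$value), ]
rownames(long) <- NULL
print(long)
```

Each description row is repeated once per target, so a wide table with many targets becomes a tall table with one measurement per row.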

Data in this long format can then be passed to alignReplicates() for scaling.

Scale data

Scaling biological replicates to one common scale has the advantage that the values, although on an arbitrary scale, become comparable. There are multiple techniques to choose a common scale; the procedure used here is explained in more depth in TODO: LINK TO CHAPTER

out <- alignReplicates(
  data = importedData,
  model = "yi / sj",
  errorModel = "value * sigmaR",
  biological = yi ~ name + time + condition,
  scaling = sj ~ name + ID,
  error = sigmaR ~ name + 1,
  normalize = TRUE,
  averageTechRep = FALSE,
  verbose = FALSE,
  normalizeInput = TRUE
)

We will now go through the parameters individually:

- data: A long table, usually the output of readWide().
- model: A formula-like string describing the model used for aligning. The present one, yi / sj, means that the measured values Y_i are the real values yi scaled by scaling factors sj. The model therefore is the real value divided by the corresponding scaling factor.
- errorModel: A description of which errors affect the data. Here, only a relative error is present, where the parameter sigmaR is scaled by the respective value.
- biological: Defines which parameter (left-hand side of the tilde) is represented by which columns (right-hand side of the tilde) containing "biological effects". In the present example, the model states that the real value is represented by yi, which is therefore the left-hand side of the biological entry. The right-hand side is name, time and condition. In short: we state that the columns name, time and condition contain real, biological differences.
- scaling: Same as above, but defines which columns contain identifiers of different scalings. Here these are name and ID, meaning that measurements which differ in these effects (but have the same biological effects) are scaled onto one another.
- error: Describes how the error affects the values individually. The present formulation means that the error parameter is not adjusted individually.
- averageTechRep: A logical parameter that indicates whether technical replicates should be averaged before the scaling.
- verbose: If set to TRUE, additional information is printed to the console.
- normalizeInput: If set to TRUE, the raw input is scaled to a common order of magnitude before the scaling parameters are calculated. This is only a computational aid that eliminates a rare failure of convergence when the values differ by many orders of magnitude.
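To make the roles of model, errorModel, biological and scaling more concrete, here is a small base-R sketch with made-up numbers. It does not reproduce how alignReplicates() estimates its parameters: the names yiTrue, sjTrue and sjHat and the geometric-mean estimate are assumptions chosen for illustration. It only shows what the model yi / sj and the error model value * sigmaR express.

```r
# Made-up true values yi of one target at four time points.
yiTrue <- c(100, 140, 180, 120)
# Made-up scaling factors sj for two experiments (IDs "1" and "2").
sjTrue <- c("1" = 1, "2" = 2.5)

# Model "yi / sj": both experiments see the same biology on different scales.
measured <- data.frame(
  time  = rep(c(0, 5, 10, 20), times = 2),
  ID    = rep(names(sjTrue), each = 4),
  value = c(yiTrue / sjTrue[["1"]], yiTrue / sjTrue[["2"]]),
  stringsAsFactors = FALSE
)

# Error model "value * sigmaR": a purely relative error (10 % assumed here).
sigmaR <- 0.1
measured$sigma <- measured$value * sigmaR

# Naive stand-in for the fit: scale of each ID relative to ID "1",
# estimated from the geometric means of the measured values.
logMean <- vapply(split(log(measured$value), measured$ID), mean, numeric(1))
sjHat <- exp(logMean[["1"]] - logMean)

# Multiplying by the estimated scaling factor brings all IDs to a common scale.
measured$scaled <- unname(measured$value * sjHat[measured$ID])
print(sjHat)
print(measured)
```

In this noise-free toy case the scaled values of both experiments coincide, which is what "scaled onto one another" means for measurements that share name, time and condition but differ in ID.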

The result of alignReplicates() is a list with the entries:

- aligned: A data.frame with the columns containing the biological effects, as well as the column value containing the "estimated true values" and the column sigma containing the uncertainty of the fits. Both are on the common scale.
- scaled: The original data, but with the values scaled to the common scale and the errors from the evaluation of the error model, also scaled to the common scale (obeying Gaussian error propagation).
- prediction: The scales and sigma are from the evaluation of the respective models (on the original scale).
- original: Just the original parameters.
- original_with_parameters: As above, but with additional columns for the estimated parameters.
- biological: Names of the columns defined to contain the biological effects.
- scaling: Names of the columns defined to contain the scaling effects.

Plot Data

blotIt provides one plotting function, plot_align_me(). Which data set is plotted can be specified via its parameters:

plot_align_me(
    out_list = out,
    plot_points = "aligned",
    plot_line = "aligned",
    spline = FALSE,
    scales = "free",
    align_zeros = TRUE,
    plot_caption = TRUE,
    ncol = NULL,
    my_colors = NULL,
    duplicate_zero_points = FALSE,
    my_order = NULL
)

The parameters again are:

- out_list: The result of the alignment (alignReplicates() above).
- plot_points: It can be specified separately which data sets are plotted as dots and as a line. Here the data set for the dots is defined. It can be one of original, scaled, prediction or aligned.
- plot_line: Same as above, but for the line.
- spline: Logical parameter; if set to TRUE, the plotted line is not a set of straight lines connecting the points but a smooth spline.
- scales: String passed as the scales argument to facet_wrap.
- align_zeros: Logical parameter; if set to TRUE, the zero ticks are aligned throughout all subplots, although the axes can have different scales.
- plot_caption: Logical parameter indicating whether a caption describing which data is plotted should be added to the plot.
- ncol: Number passed as the ncol argument to facet_wrap.
- my_colors: List of custom color values as taken by the values argument of the scale_color_manual method for ggplot objects. If not set, the default ggplot color scheme is used.
- duplicate_zero_points: Logical; if set to TRUE, all zero time points are assumed to belong to the first condition, e.g. when the different conditions consist of treatments added at time zero. Default is FALSE.
- my_order: Optional list of target names in the custom order that will be used for faceting.
- ...: Logical expression used for subsetting the data frames, e.g. name == "pAKT" & time < 60.
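The effect of such a subsetting expression can be previewed with base R's subset() on a long table. The sketch below uses made-up values and only illustrates which rows an expression like name == "pAKT" & time < 60 selects; how the plotting function evaluates the expression internally is not shown here.

```r
# Toy long table (made-up values) with some of the columns described above.
long <- data.frame(
  time  = c(0, 30, 60, 0, 30),
  name  = c("pAKT", "pAKT", "pAKT", "pEPOR", "pEPOR"),
  value = c(116.8, 115.0, 111.9, 295.8, 250.1),
  stringsAsFactors = FALSE
)

# Keep only pAKT measurements taken before 60 time units.
selected <- subset(long, name == "pAKT" & time < 60)
print(selected)
```

Only the pAKT rows at time 0 and 30 remain; the pAKT row at time 60 and all pEPOR rows are excluded.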