knitr::opts_chunk$set(collapse = TRUE, comment = "#>", width = 68)
biocopy_colors <- c("#958BB2", "#C61E19", "#99CFE9", "#A2C510", "#FBB800")
library(ggplot2)
library(dplyr)
if (!rlang::is_installed("htmltools")) install.packages("htmltools")
anabel aims to simplify binding-curve fitting for scientists of different backgrounds, while minimizing user influence [@anabel1; @anabel2]. With the function run_anabel, which supports three different modes, estimating kinetic constants is a straightforward task. The user can select the mode that is most appropriate for their experimental setup. Please note that this vignette assumes a basic understanding of real-time label-free biomolecular interactions. For more information and an introduction to the theoretical background, please refer to the online version.
Installing anabel within R works as for any other R package, using either install.packages or devtools::install. Whichever you choose, make sure to set dependencies = TRUE. anabel builds on packages commonly used for everyday data analysis, such as ggplot2, dplyr, purrr, and reshape2.
Once the installation has succeeded, you can start using anabel as follows:
library(anabel)
packageVersion("anabel")
anabel accepts sensogram input in the form of an Excel or CSV file, or as a data frame. If providing a file, the full path must be specified, or anabel will attempt to read from the working directory.
The input data must be in numeric table format with a column dedicated to time. This column can have any name and use any R-approved symbols, as long as it contains the keyword 'time' (see exemplary datasets).
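The requirements above can be checked before any fitting. The following is a minimal sketch in base R on a made-up table; the column names are hypothetical, and matching the keyword case-insensitively is an assumption rather than documented anabel behaviour.

```r
# Toy sensogram table; column names and values are made up for illustration.
sensogram <- data.frame(
  Time_s   = seq(0, 4, by = 1),
  Sample.A = c(0, 0.5, 1.2, 1.1, 0.9),
  Sample.B = c(0, 0.3, 0.8, 0.7, 0.6)
)

# Assumption: the keyword 'time' is matched regardless of case.
is_time <- grepl("time", names(sensogram), ignore.case = TRUE)

# Exactly one time column, and every column must be numeric.
has_one_time_col <- sum(is_time) == 1
all_numeric <- all(vapply(sensogram, is.numeric, logical(1)))

time_col <- names(sensogram)[is_time]
time_col
```

If either check fails, it is usually easier to fix the table before passing it on than to interpret a fitting error afterwards.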
To specify the spots/sample names for the final results (tables + plots), you can provide an additional table with an 'ID' column containing the exact column names from the sensogram tables (except for the time-column), and a 'Name' column for mapping. Please note that 'ID' and 'Name' are reserved column names, and anabel will ignore the file if they are not present.
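As an illustration, such a mapping table could be built and checked as sketched below. The sample names are hypothetical, and the check itself is only a suggestion, not something anabel is known to run.

```r
# Toy sensogram table with two samples (values are made up).
sensogram <- data.frame(
  Time     = 0:2,
  Sample.A = c(0, 1, 2),
  Sample.B = c(0, 2, 4)
)

# Mapping table: 'ID' holds the exact column names, 'Name' the display names.
names_table <- data.frame(
  ID   = c("Sample.A", "Sample.B"),
  Name = c("Antibody 1", "Antibody 2")  # hypothetical display names
)

# Both reserved column names must be present.
has_reserved_cols <- all(c("ID", "Name") %in% names(names_table))

# Every ID should correspond to a non-time column of the sensogram table.
sample_cols <- setdiff(names(sensogram), "Time")
unmatched <- setdiff(names_table$ID, sample_cols)
unmatched
```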
To run this tutorial, we will use simulated data that mimics typical 1:1 kinetics interactions. This data is available through anabel:
data("SCA_dataset")
data("MCK_dataset")
data("SCK_dataset")
To view the help pages for anabel and the datasets, use the following commands:
help(package = "anabel")
?SCA_dataset
?MCK_dataset
?SCK_dataset
All datasets used in this tutorial were generated using the Biacore™ Simul8 – SPR sensorgram simulation tool (Simul8) [@simul8].
anabel currently offers two main functions, each with a help page that includes code examples:
?convert_toMolar() # show help page
?run_anabel() # show help page
The main function of anabel is run_anabel, which analyzes sensograms of 1:1 biomolecular interactions using three different modes: Single-curve analysis (SCA), Multi-cycle kinetics (MCK), and Single-cycle kinetics (SCK). Additionally, the convert_toMolar function converts the analyte concentration unit into molar, supporting units such as millimolar (mM), micromolar (µM), nanomolar (nM), and picomolar (pM). This function is case-insensitive and accepts variations such as nM, NM, nanomolar, and Nanomolar. In the following section (Analyte concentration), we explain how to use this function.
The first step is to convert the value of analyte-concentration into molar:
# one value in case of SCA method
ac <- convert_toMolar(val = 50, unit = "nM")
# vector in case of SCK and MCK methods
ac_mck <- convert_toMolar(val = c(50, 16.7, 5.56, 1.85, 6.17e-1), unit = "nM")
ac_sck <- convert_toMolar(val = c(6.17e-1, 1.85, 5.56, 16.7, 50), unit = "nM")
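Arithmetically, converting to molar is a multiplication by a power of ten. The sketch below shows this with a hypothetical helper, to_molar_sketch; it is not anabel's implementation, and it only understands the abbreviations mM, uM, nM, and pM (case-insensitive).

```r
# Hypothetical helper for illustration only, NOT anabel's convert_toMolar.
to_molar_sketch <- function(val, unit) {
  # Multiplication factors from each unit to molar.
  factors <- c(mm = 1e-3, um = 1e-6, nm = 1e-9, pm = 1e-12)
  key <- tolower(unit)  # makes the lookup case-insensitive
  if (!key %in% names(factors)) stop("unsupported unit: ", unit)
  val * factors[[key]]
}

# A single value, as used for SCA.
single <- to_molar_sketch(50, "nM")

# A vector of values, as used for MCK and SCK.
series <- to_molar_sketch(c(50, 16.7, 5.56, 1.85, 6.17e-1), "NM")
```

With these inputs, 50 nM corresponds to 5e-8 M and 0.617 nM to 6.17e-10 M, up to floating-point rounding.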
htmltools::img(
  src = knitr::image_uri("vignettes/strategies.png"),
  alt = "models",
  style = "padding:10px;width:100%; border:0"
)
The parameters of SCA_dataset are as follows:
myTable <- data.frame(
  Curve = paste("Sample.", LETTERS[1:3]),
  Ka = c(1e+6, 1e+6, 1e+6),
  Kd = c(1e-2, 5e-2, 1e-3),
  Conc = rep("50nM", 3),
  tass = rep(50, 3),
  tdiss = rep(200, 3)
)
myTable$Expected_KD <- myTable$Kd / myTable$Ka
kableExtra::kable(myTable) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE)
For example, Sample.A looks as follows:
ggplot(SCA_dataset, aes(x = Time)) +
  geom_point(aes(y = Sample.A), col = "#A2C510") +
  geom_vline(xintercept = 50, linetype = 2) +
  geom_vline(xintercept = 200, linetype = 2) +
  theme_minimal()
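To connect the parameter table to the shape of such a curve, the sketch below simulates one sensogram with the standard 1:1 (Langmuir) binding model, using the Sample.A parameters listed above. The function name simulate_1to1 and the maximal response rmax = 100 are assumptions made for illustration; this is not the code that generated SCA_dataset.

```r
# Standard 1:1 binding model; 'rmax' is a made-up maximal response.
simulate_1to1 <- function(time, ka, kd, conc, tass, tdiss, rmax = 100) {
  kobs <- ka * conc + kd           # observed rate during association
  req  <- rmax * ka * conc / kobs  # equilibrium response at this concentration
  assoc <- function(t) req * (1 - exp(-kobs * (t - tass)))
  r_end <- assoc(tdiss)            # response at the start of dissociation
  ifelse(time < tass, 0,
    ifelse(time <= tdiss, assoc(time),
      r_end * exp(-kd * (time - tdiss))
    )
  )
}

time <- seq(0, 400, by = 1)
resp <- simulate_1to1(time,
  ka = 1e6, kd = 1e-2, conc = 50e-9,  # 50 nM expressed in molar
  tass = 50, tdiss = 200
)

# Expected equilibrium dissociation constant for these parameters.
KD <- 1e-2 / 1e6
```

The response stays at zero until tass, rises towards its equilibrium value during association, and decays exponentially after tdiss, which is the pattern marked by the two dashed lines in the plot above.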
By default, anabel runs in SCA mode. Before using the function, make sure that the input data meet the following requirements:
The start and end times of the experiment are always single values. In contrast, the analyte concentration and the association/dissociation times depend on the model and may be single values or vectors.
Missing start and/or end times of the experiment (tstart and tend, respectively) are allowed; the values are then taken from the provided data.
Check ?run_anabel for a full description of each parameter.
sca_rslt <- run_anabel(SCA_dataset, tass = 50, tdiss = 200, conc = ac)
By default, the command returns a list of two data frames, kinetics and fit_data.
The kinetics table for this method contains the following information:
knitr::kable(sca_rslt$kinetics) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) %>%
  kableExtra::scroll_box(width = "100%")
One way to visualize the results:
ggplot(sca_rslt$fit_data, aes(x = Time)) +
  geom_point(aes(y = Response), col = "#A2C510") +
  geom_path(aes(y = fit)) +
  facet_wrap(~Name, ncol = 2, scales = "free") +
  theme_light()
The MCK method is the most common method used for analyzing biomolecular interactions, and it involves injecting different analyte concentrations in independent cycles. We can use the simulated data provided in the MCK_dataset to demonstrate how to analyze similar data with anabel. The data was created using the following parameters:
myTable <- data.frame(
  "tass" = 45,
  "tdiss" = 145,
  "Kass" = "1e+7nM",
  "Kdiss" = "1e-2",
  "KD" = 1e-2 / 1e+7,
  "Conc" = "50, 16.7, 5.56, 1.85, 6.17e-1"
)
kableExtra::kable(myTable) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE)
temp <- MCK_dataset %>%
  tidyr::pivot_longer(!Time, names_to = "conc", values_to = "Response")
temp$analyte <- gsub("Conc\\.+", "", temp$conc)
ggplot(temp, aes(x = Time)) +
  geom_point(aes(y = Response, col = analyte)) +
  geom_vline(xintercept = 50, linetype = 2) +
  geom_vline(xintercept = 150, linetype = 2) +
  theme_light() +
  scale_color_manual(values = biocopy_colors) +
  theme(legend.position = "bottom")
The MCK method assumes that each column in the input table represents one cycle with a different analyte concentration. Ideally, the values of the concentration should be different, but anabel will not throw an error if the same value is given to multiple cycles. However, it is the user's responsibility to check the validity of the input at this point.
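Since this validation is left to the user, a few lines of base R can catch the most common mistakes. The sketch below uses a made-up three-cycle table with hypothetical column names; it is a suggestion for a pre-check, not a description of what anabel does internally.

```r
# Toy MCK table: one column per cycle (names and values are made up).
mck_toy <- data.frame(
  Time      = 0:2,
  Conc.50   = c(0, 3, 5),
  Conc.16.7 = c(0, 2, 3),
  Conc.5.56 = c(0, 1, 2)
)

# Analyte concentrations in molar, in the same order as the columns.
conc <- c(50, 16.7, 5.56) * 1e-9

# All columns except the time column are treated as cycles.
cycle_cols <- names(mck_toy)[!grepl("time", names(mck_toy), ignore.case = TRUE)]

# One concentration per cycle, and no value used twice.
same_length <- length(conc) == length(cycle_cols)
has_duplicates <- anyDuplicated(conc) > 0
```

A duplicated concentration is not necessarily wrong, but it is worth a second look before fitting.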
As with SCA, make sure that the following conditions hold:
MCK_dataset requires 5 of each).
mck_rslt <- run_anabel(MCK_dataset, tass = 45, tdiss = 145, conc = ac_mck, method = "MCK")
The order of the given analyte concentrations should match the order of the columns in the sensogram table. In the case of MCK_dataset, the analyte concentration decreases from column to column, so the input starts at 50 nM and goes down to 0.617 nM.
The estimated kinetic constants in the kinetics table are named according to the parameter that was used in the fitting plus the cycle number (e.g. tass_1).
The fitting was successful, as no boundaries were violated (columns ParamsQualitySummary and FittingQ).
knitr::kable(mck_rslt$kinetics) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) %>%
  kableExtra::scroll_box(width = "100%")
You can visualize the fitting results using the fit_data table.
ggplot(mck_rslt$fit_data, aes(x = Time, group = Name)) +
  geom_point(aes(y = Response), col = "#A2C510") +
  geom_path(aes(y = fit)) +
  theme_light()
Compared to the SCA method, the MCK method generates a slightly different output: it does not generate a report.
SCK is a fitting mode for experimental setups in which increasing analyte concentrations are injected one after another, with only a short regeneration step, or none at all, in between. The simulated data SCK_dataset was generated with the following parameters:
myTable <- data.frame(
  Param = c("Conc", "tass", "tdiss"),
  Step1 = c(6.17e-1, 35, 145),
  Step2 = c(1.85, 205, 315),
  Step3 = c(5.56, 375, 485),
  Step4 = c(16.7, 545, 655),
  Step5 = c(50, 715, 825)
)
kableExtra::kable(myTable) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE)
Overall, Kass = 1e+6nM and Kdiss = 1e-2nM; the expected value is therefore KD = Kdiss / Kass = 1e-08.
To analyze a dataset with the SCK method, the input should include the following:
ggplot(SCK_dataset, aes(x = Time)) +
  geom_point(aes(y = Sample.A), size = 1, col = "#3373A1") +
  geom_vline(xintercept = c(35, 375, 715), linetype = 2, linewidth = 1, col = "#F08000") + # ta
  geom_vline(xintercept = c(145, 485, 825), linetype = 2, linewidth = 1, col = "#F08000") + # td
  geom_vline(xintercept = c(205, 545), linetype = 2, linewidth = 1, col = "#A2C510") + # ta
  geom_vline(xintercept = c(315, 655), linetype = 2, linewidth = 1, col = "#A2C510") + # td
  theme_minimal() +
  scale_x_continuous(breaks = seq(0, max(SCK_dataset$Time), 150))
To analyse this dataset with anabel, use the following:
sck_rslt <- run_anabel(SCK_dataset,
  tass = c(35, 205, 375, 545, 715),
  tdiss = c(145, 315, 485, 655, 825),
  conc = ac_sck,
  method = "SCK"
)
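Because SCK takes one association time, one dissociation time, and one concentration per titration step, it is easy to mix up the vectors. The following base R sketch runs a few sanity checks on the values used above; these are checks a user might run beforehand, not checks anabel is known to perform.

```r
# Step-wise parameters of SCK_dataset as listed in the table above.
tass    <- c(35, 205, 375, 545, 715)
tdiss   <- c(145, 315, 485, 655, 825)
conc_nM <- c(6.17e-1, 1.85, 5.56, 16.7, 50)

# One value of each per titration step.
same_length <- length(tass) == length(tdiss) &&
  length(tass) == length(conc_nM)

# Each association phase starts before its dissociation phase.
assoc_before_dissoc <- all(tass < tdiss)

# Interleave the two vectors (tass1, tdiss1, tass2, ...) and check that the
# resulting timeline is strictly increasing, i.e. the steps do not overlap.
timeline <- as.vector(rbind(tass, tdiss))
ordered_steps <- !is.unsorted(timeline, strictly = TRUE)

# Concentrations are titrated upwards.
increasing_conc <- !is.unsorted(conc_nM, strictly = TRUE)
```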
The resulting kinetics table:
knitr::kable(sck_rslt$kinetics) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) %>%
  kableExtra::scroll_box(width = "100%")
To visualize the outcome:
ggplot(sck_rslt$fit_data, aes(x = Time)) +
  geom_point(aes(y = Response), col = "#A2C510") +
  geom_path(aes(y = fit)) +
  facet_wrap(~Name, ncol = 2) +
  theme_light()
Baseline drift and surface decay are common experimental issues that can affect the estimation of kinetics from sensograms. anabel includes features to correct for these problems. In the following sections, we will demonstrate how to handle these cases using three datasets that suffer from either surface decay or drift. The datasets are named according to the type of problem and the method used for correction.
data("MCK_dataset_drift") # multi cycle kinetics experiment with baseline drift
data("SCA_dataset_drift") # single curve analysis with baseline drift
data("SCK_dataset_decay") # single cycle kinetics with exponential decay
First, let's look at the data:
df <- tidyr::pivot_longer(SCA_dataset_drift, cols = contains("Sample"))
ggplot(df, aes(Time, value)) +
  geom_point(aes(col = name), size = 0.5) +
  geom_vline(xintercept = c(50, 200), linetype = 2, linewidth = 0.5) +
  theme_light() +
  labs(y = "Response") +
  theme(legend.position = "bottom") +
  scale_x_continuous(breaks = seq(0, 1000, 100)) +
  scale_color_manual(values = biocopy_colors) +
  facet_wrap(~name, ncol = 2, scales = "free_y") +
  theme(legend.position = "none") +
  ggtitle("Five SCA sensograms with linear drift = -0.019")
To analyse this data, apply the drift correction when calling run_anabel. If you did not let anabel generate the output, you can visualize the results yourself:
sca_rslt_drift <- run_anabel(SCA_dataset_drift, tass = 50, tdiss = 200, conc = ac, drift = TRUE)
ggplot(sca_rslt_drift$fit_data, aes(x = Time)) +
  geom_point(aes(y = Response), col = "#A2C510") +
  geom_path(aes(y = fit)) +
  facet_wrap(~Name, ncol = 2) +
  theme_light()
To analyse the MCK data with linear drift, apply the drift correction when calling run_anabel:
mck_rslt_drift <- run_anabel(MCK_dataset_drift, tass = 45, tdiss = 145, conc = ac_mck, drift = TRUE, method = "MCK")
ggplot(mck_rslt_drift$fit_data, aes(x = Time, group = Name)) +
  geom_point(aes(y = Response), col = "#A2C510") +
  geom_path(aes(y = fit)) +
  theme_light() +
  ggtitle("MCK five sensogram with linear drift = -0.01")
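To make the notion of linear baseline drift concrete, the sketch below adds a linear term to a noise-free toy binding curve and then recovers the slope from the pre-injection baseline with lm. All numbers except the slope of -0.019 (taken from the SCA plot title above) are made up, and this is only an illustration of the effect; the source does not describe how anabel corrects drift internally.

```r
time <- seq(0, 400, by = 1)

# Noise-free toy binding curve (association from t = 50, dissociation from t = 200).
r_end <- 80 * (1 - exp(-0.06 * 150))
binding <- ifelse(time < 50, 0,
  ifelse(time <= 200, 80 * (1 - exp(-0.06 * (time - 50))),
    r_end * exp(-0.01 * (time - 200))
  )
)

# Linear drift superimposed on the signal.
drift_slope <- -0.019
drifting <- binding + drift_slope * time

# Estimate the slope from the baseline before the injection (t < 50),
# where the binding signal is zero and only the drift is visible.
pre <- time < 50
baseline_fit <- lm(signal ~ time,
  data = data.frame(time = time[pre], signal = drifting[pre])
)
est_slope <- coef(baseline_fit)[["time"]]

# Subtract the estimated drift from the whole curve.
corrected <- drifting - est_slope * time
```

In real data the baseline is noisy and short, so a slope estimated this way is far less reliable than in this idealized example.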
The simulated SCK_dataset_decay, which includes an exponential decay component, looks as follows:
df <- tidyr::pivot_longer(SCK_dataset_decay, cols = contains("Sample"))
ggplot(df, aes(Time, value)) +
  geom_point(size = 0.2, col = "#3373A1") +
  geom_vline(xintercept = c(50, 390, 730), linetype = 2, linewidth = 1, col = "#F08000") + # ta
  geom_vline(xintercept = c(150, 490, 830), linetype = 2, linewidth = 1, col = "#F08000") + # td
  geom_vline(xintercept = c(220, 560), linetype = 2, linewidth = 1, col = "#A2C510") + # ta
  geom_vline(xintercept = c(320, 660), linetype = 2, linewidth = 1, col = "#A2C510") + # td
  theme(legend.position = "none") +
  facet_wrap(~name, ncol = 2) +
  theme_light() +
  ggtitle("Five SCK sensograms with exponential decay")
sck_rslt_decay <- run_anabel(SCK_dataset_decay,
  tass = c(35, 205, 375, 545, 715),
  tdiss = c(145, 315, 485, 655, 825),
  conc = ac_sck,
  method = "SCK",
  decay = TRUE
)
ggplot(sck_rslt_decay$fit_data, aes(x = Time)) +
  geom_point(aes(y = Response), col = "#A2C510") +
  geom_path(aes(y = fit)) +
  facet_wrap(~Name, ncol = 2) +
  theme_light()
This mode is useful for users with a background in model optimization who want to understand the fitting model used by anabel. To enable debug mode, set debug_mode = TRUE when running the run_anabel() function.
When the debug_mode parameter is set to TRUE, anabel generates an additional data frame that provides more information on the fitting process:
init_df: contains the initial values of the fitting parameters for each binding curve.
# call anabel in debug mode with sca data set
my_data <- run_anabel(SCA_dataset, tass = 50, tdiss = 200, conc = ac, debug_mode = TRUE)
init_df <- my_data$init_df
# extract information of the first curve (Sample.A)
response <- init_df$Response[1] %>%
  strsplit(",") %>%
  unlist() %>%
  as.numeric()
# create a temp data frame containing both original value 'Value' and the estimated one 'Response'
sampleA_df <- data.frame(
  Time = SCA_dataset$Time,
  Value = SCA_dataset$Sample.A,
  Response = response
)
# Generate the plot associated with this curve
ggplot(sampleA_df, aes(x = Time)) +
  geom_point(aes(y = Value), col = "#A2C510", size = 0.5) +
  geom_line(aes(y = Response)) +
  theme_light()
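The extraction step above relies on the Response entry being a single comma-separated string. The following stand-alone sketch shows the same parsing in base R on a made-up string, so it can be tried without running anabel; the values are purely illustrative.

```r
# Made-up example of a comma-separated response string.
response_string <- "0,0.52,1.31,2.08"

# Split at the commas, flatten the resulting list, and convert to numeric.
response <- as.numeric(unlist(strsplit(response_string, ",")))

# The parsed vector should have one value per time point of the curve.
n_points <- length(response)
```

Comparing n_points with the number of rows of the sensogram table is a quick way to confirm that the parsed vector lines up with the Time column before plotting.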
You can save anabel's fitting results by setting the option generate_output = "all" and specifying the output directory outdir.
The following outcome will be saved in the specified directory:
?run_anabel)
If you only want specific output, you can set any of the associated options generate_Plots, generate_Tables, or generate_Report to TRUE.
If any of these options is TRUE, you must set the generate_output option to "customized".
generate_output overrides all other flags. Its default value is "none", i.e. nothing is generated. Therefore, changing the other options without also changing generate_output has no effect.
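The precedence described above can be summarized as a small decision function. The sketch below is a toy model of that logic, with a hypothetical function name; it is not anabel's code, and the assumption that "all" enables plots, tables, and the report is a reading of the text rather than a documented guarantee.

```r
# Toy model of the flag precedence, NOT anabel's implementation.
resolve_outputs_sketch <- function(generate_output = "none",
                                   generate_Plots = FALSE,
                                   generate_Tables = FALSE,
                                   generate_Report = FALSE) {
  switch(generate_output,
    # Default: the individual flags are ignored and nothing is written.
    none = c(plots = FALSE, tables = FALSE, report = FALSE),
    # Assumption: "all" turns on every kind of output.
    all = c(plots = TRUE, tables = TRUE, report = TRUE),
    # Only here do the individual flags take effect.
    customized = c(
      plots = generate_Plots,
      tables = generate_Tables,
      report = generate_Report
    ),
    stop("unknown generate_output: ", generate_output)
  )
}

ignored <- resolve_outputs_sketch("none", generate_Plots = TRUE)
custom  <- resolve_outputs_sketch("customized", generate_Plots = TRUE)
```

Here ignored contains only FALSE values even though generate_Plots was set, whereas custom enables the plots alone.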
The main goal of anabel is to support the scientific community for free and to establish unified standards for kinetics analysis. It is continuously updated to ensure its usefulness for a variety of instruments. If you encounter an issue or bug, report it on the GitHub page: Anabel GitHub Repo
Both the anabel package and the online tool are supported by BioCopy GmbH.