Summary

Contact:

David Quigley
david.quigley@ucsf.edu
University of California, San Francisco
Helen Diller Family Comprehensive Cancer Center
Department of Epidemiology and Biostatistics

The HTDoseResponseCurve library makes it straightforward to analyze time course drug exposure experiments performed by high-throughput analysis of cells in multiwell plates.

Key Features

HTDoseResponseCurve does not perform image analysis.

HTDoseResponseCurve fits curves using the drc library, available on CRAN.


Quick Start: Dose Response Curve from data in a text file

The minimal information required for a Dose Response Curve is a set of observations with labels for sample_type, treatment, concentration, and value. A file called sample_data_1.txt, with the appropriate format is included with this package. The example data have control (vehicle) observations labeled "DMSO".

The following commands are explained in detail in the next section, "Loading data without plate information".

library(knitr, quietly = TRUE)
library(HTDoseResponseCurve)
library(drc, quietly = TRUE)

# 1) Load from file with columns sample_type, treatment, concentration, value
# 2) Fit dose response curve, estimate upper/lower asymptote, IC50, and slope
# 3) Plot dose response curve

fn = system.file("extdata", "sample_data_1.txt", package="HTDoseResponseCurve")
ds = read_dataset( fn, negative_control = "DMSO" )
fit_1 = fit_DRC( ds, 
                 sample_types=c("line1", "line2"), 
                 treatments="drug", 
                 fct=drc::LL.4() )
plot( fit_1, 
      xlab="nM drug", ylab="surviving fraction", 
      xlim=c(0, 1e5), ylim=c(0, 1.1) )

A brief Dose Response Curve primer

A dose response curve models the response of an organism to varying concentrations of a treatment. A typical example is the survival of cells in a dish after being dosed with increasing concentrations of a toxic drug. The response of living things to drugs is typically non-linear, and frequently takes a sigmoidal (S-shaped) shape. This curve can be modeled by fitting a non-linear model that has parameters for the upper and lower asymptotes, the slope of the middle of the curve, and the point at which the curve reaches the mid-way point between the upper and lower asymptotes. This midway point has various names with slightly different meanings, depending on the specific application, including the $IC_{50}$, $EC_{50}$, or $LD_{50}$.

The dose response curve for a drug effect E can be defined by the equation:

$E = \frac{ E_{max} ( \frac{D}{ IC_{50} })^m } {1 + (\frac{D}{IC_{50}})^m}$

where D is a particular drug dose, m is the slope of the curve, $E_{max}$ is the maximum possible drug effect, $IC_{50}$ is the value of D that reduces E to 50% of the distance between $E_{max}$ and $E_{min}$.

In the equation above, $E_{min}$ is assumed to be zero. If $E_{min}$ is not zero, we can model it explicity. For comparison, the four-parameter DRC equation used by the drc library is:

$E = c + \frac{d-c}{1+\exp(b(\log(x)-\log(e)))}$

where b is the slope (m), c is $E_{min}$, d is $E_{max}$, and e is the $IC_{50}$.

Fitting Dose Response Curves

HTDoseResponseCurve can be used with multiple timepoints

A primary motivation for development of this package was to make it easier to process high-throughput experiments run in multiwell plates, where the same plate would be observed repeatedly over time. Every observation in HTDoseResponseCurve therefore has a timepoint value associated with it; this value is conventionally called "hour" but could be days, minutes, or any other unit of time starting at 0. If you don't specify the hour, this time point is set automatically to zero. The examples in this section don't specify an hour.

Specifying how to normalize data

Data are first loaded and then optionally normalized against a negative control (also called a vehicle control, named after the vehicle substance in which a drug is dissolved to put it into solution). Many times it is desirable to reduce plate-specific or cell-line-specific effects by normalization of data against a control well that contains no perturbation other than the treatment vehicle. Normalization transforms each value into a percentage of the negative control. Normalized values can be greater than one. You can specify the negative control in four ways:

  1. No negative control wells exist.
    Use case example: data are already normalized.
  2. One or more wells are the shared negative control for all other treatments with that sample type on the plate.
    Use case example: All drugs on the plate were dissolved in DMSO, and DMSO-alone wells were run on the plate as a control.
  3. For each treatment, there is one or more well with a (presumably zero) concentration of that treatment that should be considered a negative control. This case is appropriate if each treatment has its own no treatment wells, labeled with the name of that treatment.
    Use case example: The plate carries Olaparib and Rucaparib treatments, and each has a DMSO control, labeled as "Olaparib" and "Rucaparib" respectively, with drug concentration labeled as 0 for control wells.
  4. There are two or more distinct sets of shared negative control wells, with some treatments mapped to each.
    Use case example: the plate carries three drugs with a shared DMSO control, and one with an ethanol control.

If you specify the existence of negative controls, HTDoseResponseCurve will expect to find them for all of the sample types and treatments on the plate.

Loading Data directly from R

If your data are already in R, you can create a dataset without loading files by the
create_dataset() function. The following example experiment tested the effect of four concentrations of a drug (200, 500, 1000, and 2000 nM) on the viability of two cell lines, normalized against measurements labeled "DMSO".

sample_types = rep( c(rep("line1",3), rep("line2",3)), 5)
treatments = c(rep("DMSO",6), rep("drug",24))
concentrations = c( rep(0,6),rep(200,6), rep(500,6),rep(1000,6),rep(5000,6))
values=c(100,99,100,90,91,92,99,97,99,89,87,88,
         86,89,88,56,59,58,66,65,67,25,23,24,42,43,46,4,5,9)

ds = create_dataset( sample_types = sample_types, 
                     treatments = treatments, 
                     concentrations = concentrations, 
                     values = values,
                     negative_control = "DMSO")

Loading Data from a text file

If your experiment data are in a text file, HTDoseResponseCurve can read directly from that file with the read_dataset() function. Each row should contain a single observation, e.g. a single well in a multi-well plate. The text file must have the following column headers in the first row:
sample_type
treatment
concentration
value

These headers may be in any order, and additional columns may be present. If your text file has the optional column hour, data from this column will be loaded into the dataset.

The following example experiment tested the effect of four concentrations of a drug (200, 500, 1000, and 2000 nM) on the viabilitity of two cell lines, normalized against measurements labeled "DMSO".

fn = system.file("extdata", "sample_data_1.txt", package="HTDoseResponseCurve")
ds = read_dataset( fn, 
                   negative_control = "DMSO", 
                   sep = "\t" )

The original measurements are stored unchanged in a column called value. Normalized measurements are stored in a column called value_normalized.

head(ds)

Summarizing a Dose Response Curve

The fit_DRC() function attempts to fit a dose response curve. The fit_DRC() function returns a HT_fit object. You can view essential information about the fit that was obtained using the R core function summary():

fit_1 = fit_DRC( ds, 
                 sample_types=c("line1", "line2"), 
                 treatments="drug", 
                 fct=drc::LL.4() )
summary( fit_1 )

The summary() function returns the coefficients estimated by the fit. If there is more than one unique condition in the fit, summary() reports the result of an ANOVA test comparing a model lacking parameters for treatment or sample type to a model with treatment/sample type as a covariate. The F-statistic and p-value values reported in the summary of fit_1 are derived from the ANOVA. Extremely low P values may be rounded to zero by the summary() function.

The summary also reports an estimated Area Under the Curve (AUC), an estimated $EC_{50}$ (Effective Concentration, an estimate of the concentration of treatment that reduces the measured value to halfway between the highest and lowest amounts), an estimated $SF_{50}$ (Surviving Fraction, an estimate of the concentration of treatment that reduces the measured value to 50%), and the fitted values at the lowest and highest concentrations of drug.

If the data are not at all sigmoidal, the fit might not converge. This will be reported in the summary.

Plotting a Dose Response Curve

To plot a dose response curve, pass the HT_fit object generated by the fit_DRC() function to the plot() function.

fit_1 = fit_DRC( ds, fct=drc::LL.4() )
plot( fit_1 )

Note that in this summary for line 1 in the previous section, the SF50 value for line 1 is 1919.97 while the EC50 value is 855.24. This curve, where line 1 is plotted in black, should make clear the distinction between the SF50 and EC50: the EC50 is the concentration required to lower the fitted value to the 50th percentile between the upper and lower asymptote, while the SF50 is the concentration required to lower the fitted value to 0.5, or 50%. Since the lowest value on the black curve is only slightly lower than 50%, its SF50 is much larger than its EC50.

The plot() function understands the parameters for R's built-in plot() function, so it is straightforward to alter its appearance:

plot(fit_1,
     xlim=c(0, 1e5),
     main="Drug dose response curve", 
     xlab="concentration nM",
     ylab="surviving fraction", 
     lwd=3, 
     col=c("black", "cornflowerblue") )

legend( 2, 0.4, c("Cell Line 1", "Cell Line 2"), 
        fill=c("black", "cornflowerblue"), bty="n")

Specifying which model coefficients to fit

The fit_DRC() function fits and plots curves using non-linear curve fitting methods exposed by functions in the drc package. It is important to understand that this model has several coefficients, and it is possible to set some of these coefficients to fixed values rather than estimating them from the data. Estimating values is neither always correct nor always wrong; it requires judgement to decide how to fit the curve. Christian Ritz, Florent Baty, Jens C. Streibig, and Daniel Gerhard have published a very useful guide that provides a gentle introduction to this topic: Dose-Response Analysis Using R Ritz PLoS ONE 2015. Reading that well-written tutorial is strongly recommended for anyone who will use either the drc or the HTDoseResponseCurve package.

As a reminder, the four parameter model is:

$E = c + \frac{d-c}{1+\exp(b(\log(x)-\log(e)))}$

where b is the slope (m), c is $E_{min}$, d is $E_{max}$, and e is the $IC_{50}$.

To estimate the dose response slope, upper asymptote, lower asymptote, and $EC_{50}$, pass drc::LL.4() to the fct parameter of fit_DRC(). To fix the lower asymptote to zero, pass drc::LL.3() into fit_DRC(). To fix the lower and upper asymptotes to zero and 1 respectively, pass drc::LL.2().

If you wish to display the $EC_{50}$ estimates on the curve with dotted vertical lines, pass show_EC50=TRUE. The $EC_{50}$ concentrations may be different from the 50% surviving Fraction concentration ($SF_{50}$) if the lower asymptote is not zero.

# estimate EC50, slope, upper asymptote, lower asymptote
fit_1 = fit_DRC(ds, 
                sample_types = c("line1", "line2"), 
                treatments = "drug",
                fct=LL.4())

# estimate EC50, slope, upper asymptote. Lower asymptote set to 0
fit_1_lower0 = fit_DRC(ds, 
                sample_types = c("line1", "line2"), 
                treatments = "drug",
                fct=LL.3())

# estimate EC50, slope. Upper, lower asymptotes set to 1, 0.
fit_1_both = fit_DRC(ds, 
                sample_types = c("line1", "line2"), 
                treatments = "drug",
                fct=LL.2())

The fit_DRC() function will sometimes report warnings or errors generated by the drc::drm() function during curve fitting. Errors indicate that the model was not able to converge to a fit. This can occur if the data do not have a sufficiently sigmoidal shape.

We can plot curves fitted with the drc functions LL.4(), LL.3(), and LL.2() to see the results of different choices for which parameters to estimate:

layout(matrix(1:3,1,3))
par(mar=c(3,3,3,1))
plot( fit_1, main="LL.4()", xlim=c(0, 1e6), show_EC50=TRUE )
plot( fit_1_lower0, main="LL.3(), lower=0", xlim=c(0, 1e6), show_EC50=TRUE )
plot( fit_1_both, main="LL.2(), upper=1, lower=0", xlim=c(0, 1e6), show_EC50=TRUE )

Note that in the LL.3() and LL.2() calls, the curves asymptotically approach zero because we specified that the lower asympotote should be zero. Theose two plots have higher $EC_{50}$ estimates than LL.4() on the left, because the $EC_{50}$ is measured relative to the upper and lower asymptotes for the curve, and the curves have different lower asymptotes.

We can emphasize the difference between the EC50 and SF50 for this curve fitting by plotting both values for the same curve:

layout(matrix(1:3,1,3))
par(mar=c(3,3,3,1))
plot( fit_1, main="LL.4(), SF50", xlim=c(0, 1e6), show_SF50=TRUE )
abline(0.5, 0)
plot( fit_1, main="LL.4(), EC50", xlim=c(0, 1e6), show_EC50=TRUE )
abline(0.5, 0)

Note that drug one (black line) reaches its EC50 at a lower concentration than its SF50.

You can fit and plot more than one treatment or sample type if there are multiple values in your data set by passing them to the treatments or sample_type parameters as a vector.


Loading data from multi-well plates

A plate map describes the experiment

HTDoseResponseCurve can also read data from 6-, 12-, 24-, 96-, and 384-well plates described using a Microsoft Excel spreadsheet. The contents of each well and the treatments applied to those wells are described in a plate map. The minimal elements required to describe the cells in a well are sample type, treatment, and concentration. The sample type indicates what kind of cell is in the well (e.g., MCF7, PC3, WT). The treatment indicates the perturbation on those cells, often a drug (e.g. Olaparib, DMSO, si-TP53, Vehicle). The concentration indicates the concentration of the treatment applied to the well.

Although you may build a plate map manually by combining individual matrixes into a list object, HTDoseResponseCurve contains functions to load an externally defined plate map either from an XML file or a Microsoft Excel file. HTDoseResponseCurve can natively consume the XML-formatted files platemap file exported by the Incucyte Cell Analysis System. This instrument resides in an incubator and captures microscopic images of either white light or fluorescent light.

Create plate data from within R

It is usually easier to write out a plate map in a text file or Excel spreadsheet, but you can also create and manipulate plate maps directly in R. The plate map is stored as a list of data frames. This example will demonstrate a 96 well plate.

First, we'll create an empty plate map with create_empty_plate_map():

plate_map = create_empty_plate_map( number_of_wells = 96 )

The object returned by this call is a list of five 96-element matrixes named treatment, concentration, sample_type, density, and passage. The density and passage matrixes are optional; the treatment, concentration, and sample_type values must be populated with relevant information.

Next, we'll populate the plate map with information about the treatment, drug concentration, and cell lines in each well where there was a measurement. These commands simulate two cell lines tested with a single drug at four concentrations (200, 500, 1000, and 2000 nM), including a negative vehicle control of DMSO. Data will be placed into wells C through G, columns 2 through 7.

plate_map$treatment[3,2:7] =   "DMSO"
plate_map$treatment[4:7,2:7] = "drug"
plate_map$concentration[3,2:7] = 0
plate_map$concentration[4,2:7] = 200
plate_map$concentration[5,2:7] = 500
plate_map$concentration[6,2:7] = 1000
plate_map$concentration[7,2:7] = 2000
plate_map$sample_type[3:7, 2:4] = "line1"
plate_map$sample_type[3:7, 5:7] = "line2"

Create observed plate data from within R

Now that we have specified a plate map, we'll specify the values we observed in those wells. We'll use the create_empty_plate() function to create an empty data plate object and manually populate that object with simulated measurements collected in triplicate:

plate_1 = create_empty_plate( number_of_wells=96, 
                            plate_id="test" )

plate_1[3,2:7] = c(100,99,100, 90, 91, 92)
plate_1[4,2:7] = c(99, 97, 99, 89, 87, 88)
plate_1[5,2:7] = c(86, 89, 88, 56, 59, 58)
plate_1[6,2:7] = c(66, 65, 67, 25, 23, 24)
plate_1[7,2:7] = c(42, 43, 46,  4,  5,  9)

The object returned by this plate is a data frame. We can confirm the plate looks the way we expect it to by plotting the raw data on a plate with the plot_values_by_plate() function. The darker red the well, the higher the value in that well.

plot_values_by_plate(plate_1)

Combine the plate map and the raw data

It is common to re-use the same plate map information for data from many timepoints. To analyze the data we will combine the measurements (in the plate variable) and metadata (in the plate_map variable) into a single dataset.

When combining the measurements and metadata, you must specify which wells (if any) correspond to negative controls, also known as vehicle-treated wells. Negative controls are described above in the section Specifying how to normalize data; please review that section if you skipped it.

In this example, there are r sum(plate_map$treatment=="DMSO") negative control wells with the treatment label "DMSO".

ds = combine_data_and_map( plate_1, plate_map, negative_control = "DMSO" )

The data frame returned by combine_data_and_map() is a dataset combining the well metadata and the measured values.

We can extract sample types or treatments from the dataset object using get_sample_types() and get_treatments()

get_sample_types(ds)
get_treatments(ds)

The dataset object returned by combine_data_and_map is the same regardless of whether it was created by the combine_data_and_map() function or by a call to create_dataset() as described in the previous section.

Quick Start: Loading data generated by the Incucyte system

Load raw data and plate map from Incucyte

If you are using the Incucyte system to generate data, you can export plate readings to Microsoft Excel and then read them in using the function read_plates_from_Incucyte_export().

If you have exported a plate map from the Incucyte software, it can be read into HTDoseResponseCurve using the read_platemap_from_Incucyte_XML() function. You can also load a plate map from an Excel file using the read_platemap_from_excel() function, or you can construct a plate map manually in R as was shown above.

As described above, the steps are

  1. Read in plate map
  2. Read in raw data
  3. Combine plate map and raw data, specifying a normalization strategy.
pkg = "HTDoseResponseCurve"

fn_map_XML = system.file("extdata", "sample_data_384_platemap.txt",package=pkg)
fn_data_Excel = system.file("extdata", "sample_data_384.xlsx", package = pkg)

plate_map = read_platemap_from_Incucyte_XML( path_to_file = fn_map_XML )
plate_data = read_plates_from_Incucyte_export( path_to_file = fn_data_Excel, 
                                               plate_id = "p1", 
                                               number_of_wells=384)
plate_data$hours = round(plate_data$hours)
ds = combine_data_and_map( raw_plate = plate_data,
                           plate_map = plate_map, 
                           negative_control = "DMSO" )

Example: sample data from an Incycyte experiment

This experiment tested the effect of r length(get_treatments(ds[!ds$is_negative_control,])) different drugs on two cell lines, imaged every 12 hours over five days.

In this example, there are r sum(!is.na(plate_map$treatment) & plate_map$treatment=="DMSO") negative control (vehicle) wells on each plate with the treatment label "DMSO".

The expected concentration unit for these plots will be nanomoles of drug; since the experimental data for this example were recorded in micromoles, we will multiply the concentration by 1000 to get nanomoles.

To illustrate how the plate readings changed over time, the plot_values_by_plate() function is useful:

layout(matrix(1:4,2,2,byrow=TRUE))
par(mar=c(3,3,2,1))
plot_values_by_plate(plate = plate_data, hour = 48, main="48 hours")
plot_values_by_plate(plate = plate_data, hour = 96, main="96 hours")
plot_values_by_plate(plate = plate_data, hour = 192, main="192 hours")
plot_values_by_plate(plate = plate_data, hour = 228, main="228 hours")

Analysis at more than one time point

The data set we loaded in the previous section contains observations from more than one time point. We therefore have to specify the hour to plot when fitting and plotting a dose response curve:

fit_inc = fit_DRC( ds, 
                   sample_types=c("line_1","line_2"), 
                   treatments = "drug13", 
                   hour=168, 
                   fct=drc::LL.4() )
plot( fit_inc, 
      xlim=c(0, 1e5),
      lwd=3, 
      main=paste("Drug 13: 168 hours"),
      ylab="surviving fraction", 
      xlab="nM")

If you forget to specify a particular hour in a dataset with multiple time points, the fit_DRC() function will generate a warning:

fit_inc = fit_DRC( ds, 
                   sample_types=c("line_1","line_2"), 
                   treatments = "drug13", 
                   fct=drc::LL.4() )
plot( fit_inc, 
      xlim=c(0, 1e5),
      lwd=3, 
      main=paste("Drug 13: all hours (don't do this)"),
      ylab="surviving fraction", 
      xlab="nM")

Summarize growth across many time points

To show the growth of untreated cells over time, we can plot the raw value at these timepoints.

par(mar=c(5,7,3,2))
x=plot_timecourse( 
    ds, 
    sample_types=c("line_1","line_2"),
    treatments="DMSO",
    concentrations=0,
    plot_raw = TRUE,
    main=paste("DMSO (vehicle)"), 
    ylab="Confluence",
    ylim=c(0,100) )

legend(0, 100, c("L1 DMSO", "L2 DMSO"), pch=c(19,22), bty="n" )

The dose response curve for Line 1 is drawn with closed points, while line 2 is drawn with open squares. Line 1 (L1) grows faster than L2; by 180 hours, the DMSO-treated wells for that line have become confluent.

To examine un-normalized values for wells treated with a particular drug across the two cell lines, we can look at the raw confluence score at these timepoints.

x=plot_timecourse( 
    ds, 
    sample_types=c("line_1","line_2"),
    treatments=c("DMSO","drug13"), 
    concentrations=c(0,5000), 
    plot_raw=TRUE,
    main=paste("Drug 13"), 
    ylab="Confluence",
    ylim=c(0,100) )

legend(0, 100, c("L1 DMSO","L2 DMSO",  "L1 5000 nM", "L2 5000 nM"),
       pch=c(19,19,22,22), 
       col=c("black", "red","black",  "red"), bty="n" )

Here we can directly examine the un-normalized vehicle treated samples (black) and 5000 nM samples (red). Drug 13 only modestly affects cell line L1 but has a dramatic impact on cell line L2.

Note that the raw data for this plot and the color and "pch" value indicating what shape was plotted for each condition is returned in a data frame by plot_timecourse(). To isolate these values one can inspect the data frame manually or use the unique() function:

legend_vals = unique(x[,c("sample_type", "concentration", "color", "pch")] )
legend_vals

Dose response curves at many timepoints at once

If we are curious about particular timepoints, we can fit and plot dose response curves individually. Here we plot dose response curves for drug 13 at four different time points:

layout(matrix(1:4,2,2,byrow=TRUE))
par(mar=c(4,4,3,1))
hours_to_plot=c(48, 96, 120, 168)
for(h in 1:4){
    plot( fit_DRC( ds, 
                   sample_types=c("line_1","line_2"), 
                   treatments = "drug13", 
                   hour=hours_to_plot[h], fct=drc::LL.3() ), 
          xlim=c(0, 1e6),
          lwd=3, 
          main=paste("Drug 13:", hours_to_plot[h], "hours"), 
          ylab="surviving fraction", 
          xlab="nM")
}

It would be convenient to calculate fits for all of the treatments at all timepoints. The fit_statistics() and plot_fit_statistic() functions are useful for this purpose. The fit_statistics() function attempts to fit dose response curves at all time points for all treatments and then calculates summary values of the AUC and EC50. These values are returned as a data frame with one row for each unique condition. The plot_fit_statistic() function plots either EC50 or AUC across all of the timepoints in the fit statistics.

First, we'll use fit_statistics() calculate fits for four drugs. We could calculate fits for all drugs just as easily but it would take more processing time.

# select four drugs to test, to save time in curve fitting
ds_test = ds[ds$treatment %in% c("DMSO", "drug01", "drug13", "drug20", "drug21"),]

# calculate fit statistics at all time points
fits = fit_statistics(ds_test, fct = drc::LL.4() )

We can summarize the results for a given drug in a table:

library(knitr)
LINE1 = fits$treatment=="drug13" & fits$sample_type=="line_1"
LINE2 = fits$treatment=="drug13" & fits$sample_type=="line_2"
kable( 
    data.frame(
        hour=fits$hour[LINE1],
        AUC_line1 = round(fits$AUC[LINE1], 2),
        AUC_line2 = round(fits$AUC[LINE2], 2),
        EC50_line1 = fits$coef_EC50[LINE1],
        EC50_line2 = fits$coef_EC50[LINE2],
        obs_min_line1 = round(fits$obs_min[LINE1], 2),
        obs_min_line2 = round(fits$obs_min[LINE1], 2),
        F_stat = round( fits$ANOVA_F_test[LINE1], 2),
        P_value = round( fits$ANOVA_P_value[LINE1], 4) )
)

We can summarize these results for a single treatment (drug 13) across all time points at once using the plot_treatment_summary() function. This function will plot dose response curves from up to four time points.

#pdf("/notebook/HTDoseResponseCurve_paper/reproduce/treatment_summary.pdf",
#    height=8, width=8)
plot_treatment_summary(ds_test, 
                       fits,
                       treatment = "drug13", 
                       times_DRC = c(48, 96, 120, 168))
#dev.off()

If we want a more detailed plot for AUC across all time points, we can use the function plot_fit_statistic(). Observations where the ANOVA_P_value exceeds an alpha-level cut-off set by the user are drawn as open circles and not connected by lines. Here the alpha-level cut-off is 0.0025, chosen to correct an alpha = 0.05 for 20 tests by simple Bonferroni adjustment. The choice of whether to set a P value threshold, and how that value might be selected, is entirely up to you.

ps = plot_fit_statistic( fits[fits$treatment=="drug13",], 
                    statistic = "AUC", 
                    alpha = 0.0025,
                    ylim=c(0, 1.5),
                    main="Area Under the Dose Response Curve, drug 13")
legend(x=0, y=0.3, 
       legend=unique(ps$sample_type), 
       col=unique(ps$color), pch=19, bty="n" )
legend(x=36, y=0.3, 
       legend=c("P>alpha", "P<=alpha"), 
       pch=c(13,19),  bty="n", col="darkgrey" )

This plot shows the AUC for line 1 exceeds that of line 2 at later time points, and that the difference between the dose response curves is significant at the P < 0.0025 level at and after 132 hours. In these plots, the open vs. closed circles reflects the $P_{ANOVA}$ value for the test for a difference in dose response curves between the two lines. Here a filled-in circle means $P_{ANOVA}$ <= 0.0025.

It is somewhat surprising that drug 13 also apparently shows a $P_{ANOVA}$ <= 0.0025 at 12 hours, so we can take a look at the dose response curve for that time point for more information:

plot( fit_DRC( ds, 
               sample_types=c("line_1","line_2"), 
               treatments = "drug13", 
               hour=12, fct=drc::LL.3() ), 
     xlim=c(0, 1e5),
     lwd=3, 
     main=paste("Drug 13: 12 hours"), 
     ylab="surviving fraction", 
     xlab="nM",
     ylim=c(0, 1.5))
legend(x=2, y=0.4, 
       legend=c("line 1", "line 2"), 
       fill=c("black", "red"), bty="n" )

The fit converged here, and the values were significantly different from each other in the ANOVA test, but it's clear that there is not a sensible dose response effect and neither drug acheived a $SF_{50}$. We can screen out these values by passing the obs_min parameter value:

ps = plot_fit_statistic( 
            fits[fits$treatment=="drug13",], 
            statistic = "AUC", 
            alpha = 0.0025,
            obs_min = 0.5,
            ylim=c(0,1.5),
            main="Area Under the Dose Response Curve, drug 13",
            sub="Requiring SF50 for at least one sample type")
legend(x=0, y=0.3, 
       legend=unique(ps$sample_type), fill=unique(ps$color), bty="n" )
legend(x=36, y=0.3, 
       legend=c("P>alpha", "P<=alpha, min>obs_min", "P<=alpha, min<=min_obs"), 
       pch=c(13,18,19),  bty="n", col="darkgrey" )

Relative AUC grids

When there are many drugs in a screen, it may be convenient to generate a summary value such as $\large{ log2( \frac{ AUC_{line1} }{ AUC_{line2} } ) }$ and plot a heat map view across time points. For this example, I will color in the heat map if P < 0.0008 for the ANOVA testing for a cell line effect.

I'll plot the values using plot_color_grid(), a general function for plotting any matrix of numbers:

# convert long data frame to individual matrixes
fit_mat = fit_statistics_matrixes( fits )

# mark as NA values that exceed a P value threshold
rel_AUC = convert_to_foldchange( fit_mat$line_1$AUC, fit_mat$line_2$AUC )
ALPHA_LEVEL = 0.008
invalid_alpha = is.na( fit_mat$line_1$ANOVA_P_value ) | 
                fit_mat$line_1$ANOVA_P_value > ALPHA_LEVEL

rel_AUC[ invalid_alpha  ] = NA

M=plot_color_grid( rel_AUC, color_bounds = c(-3,3) )

We customize the plot using parameters of plot_color_grid():

M=plot_color_grid( rel_AUC, 
                   color_bounds = c(-3,3), 
                   block.height = 5, 
                   block.width=4, 
                   space.X = 2, 
                   space.Y=2,
                   cex.y = 1,
                   cex.x = 0.75,
                   color_palatte = c("#91cf60","#ffffbf","#fc8d59") )

Here we can see that drug 13 and drug 21 both had strong effects, with opposite direction, that became strongest after 108 or 144 hours respectively.

Reporting the same values as a table is straightforward:

library(knitr)
kable( t(round( rel_AUC, 2)) )

Area under the Timecourse

The standard Dose Response Curve is calculated at a single time point. To compare the relative Area under the Curve for single drug concentration across multiple timepoints, you can call the timecourse_AUC_ratio() function. The AUC for a single sample type/treatment condition across the whole timecourse is calculated using raw data normalized by the vehicle control specified by the user when the dataset was created. AUC is calculated by connecting the normalized data points and then using the trapezoids method for AUC implemented in the trapz() function of the pracma library.

For each treatment, in each concentration, the timecourse_AUC_ratio() function will calculate:

$AUC_{sample1} = \frac{ AUC( sample1_{raw} ) }{ AUC_{vehicle} }$

$AUC_{sample2} = \frac{ AUC( sample2_{raw} ) }{ AUC_{vehicle} }$

$AUC_{ratio} = \frac{ AUC_{sample1} } {AUC_{sample2} }$

pkg = "HTDoseResponseCurve"

fn_map_XML = system.file("extdata", "sample_data_384_platemap.txt",package=pkg)
fn_data_Excel = system.file("extdata", "sample_data_384.xlsx", package = pkg)
plate_map = read_platemap_from_Incucyte_XML( fn_map_XML )
plate_data = read_plates_from_Incucyte_export( fn_data_Excel, "p1", 
                                               number_of_wells=384)
plate_data$hours = round(plate_data$hours)
ds = combine_data_and_map( plate_data, plate_map, negative_control = "DMSO" )
ds$hours=round(ds$hours)

tc=timecourse_AUC_ratio(ds, 
                        treatments=c("drug13"), 
                        sample_types=c("line_1", "line_2"), 
                        concentrations=c(200, 1000, 5000), 
                        summary_method="mean")

 # create a data frame from individual list components
kable(
 data.frame( sample_type = tc$sample_types,
             concentration = tc$concentrations,
             round( t(tc$AUC),2), 
             row.names=1:length(tc$sample_types),
             stringsAsFactors=FALSE), row.names = FALSE)

The timecourse Area under the curve for drug 13 at 5000 nM in line_1 is more than twice the value for line_2, while the timecourse AUC for the same comparison at 200 nM is nearly identical in both lines. This suggests drug_13 is more selective at larger doses.

The timecourse AUC can be calculated quickly for a large number of treatments; here it is calculated for all 20 drugs in this screen, with the results plotted using boxplot_label_outliers(), a wrapper around boxplot() that draws labels on outlier values:

tc=timecourse_AUC_ratio(ds, 
                        treatments=unique(ds$treatment),
                        sample_types=c("line_1", "line_2"), 
                        concentrations=c(200, 1000, 5000), 
                        summary_method="mean")

tc_200=tc$AUC[,tc$concentrations==200 & tc$sample_types=="ratio"]
tc_1000=tc$AUC[,tc$concentrations==1000 & tc$sample_types=="ratio"]
tc_5000=tc$AUC[,tc$concentrations==5000 & tc$sample_types=="ratio"]

layout(matrix(1:3,1,3))
b=boxplot_label_outliers(tc_200, ylim=c(0, 2.5), las=1, main="timecourse AUC, 200 nM", cex=2, pch=19)
b=boxplot_label_outliers(tc_1000, ylim=c(0, 2.5), las=1, main="timecourse AUC, 1000 nM", cex=2, pch=19)
b=boxplot_label_outliers(tc_5000, ylim=c(0, 2.5), las=1, main="timecourse AUC, 5000 nM", cex=2, pch=19)

Synergy analysis

It is often important to determine whether the combined activity of two different drugs exceeds that which would be expected from their individual application. Such "extra" activity is commonly called "synergy".

A very brief synergy primer

The null hypothesis for a synergy analysis is that the effect of combining two or more drugs is additive. Loewe additivity is a commonly used definition of additivity; it essentially says that the effect of two drugs is additive if their combined effect is equivalent to either of the drugs being combined with itself at the same final dosage.

To describe the effect and doses in each condition we can write:

$f_a$, $f_u$: fraction affected, unaffected (by drug)
$(D)1, (D)_2, (D){1,2}$: Dose of drug 1, drug 2, or mixture of drugs 1 and 2.
$(D_m)1, (D_m)_2, (D_m){1,2}$: $IC_{50}$ dose of drug 1, drug 2, mixture of drugs 1, 2.

$D_m$ (the $IC_{50}$) is related to relationship between the drug effect and dose by the equation:

$\frac{f_a}{f_u} = [ \frac{D}{D_m}]^m$

This equation says that, at a given dose D, the relationship between the dose and $D_m$ is proportional to the log of the relationship between the proportion of cells killed and the proportion not yet killed. This relationship can be plotted as a median effect plot, which is useful for quality control. The median effect plot shows the relationship between dose and relative effect on a log-log scale; it restates the dose response curve. The X axis is log( dose ) in uM. The Y axis is the relative proportion of affected samples:

$y = log(\frac{f_a}{f_u})$
$x = log(D)$

The slope m of the median effect plot is determined by the shape of the graph; slope = 1 means a hyperbolic DRC, while m > 1 is sigmoid (“higher order”). A strong linear fit of the median effect plot is associated with a well-behaved sineusoidal DRC.

The line fitting a median effect plot intersects the X axis where $y = log(\frac{f_a}{f_u}) = 0$; since the sum of $f_a$ and $f_u$ is 1, that means the line intersects the X axis when $f_a = f_u = 0.5$; that's in princple the $IC_{50}$. You can therefore-- in princple-- get the $IC_{50}$ for the dose response curve by taking the antilog of D when Y = 0.

The Combination Index/Interaction index method

The Combination Index (CI) is a measure of synergy. To calculate CI, one needs the $IC_{50}$ values for the individual drugs and a series of two-drug exposures that

  1. maintain the same relative proportion of the two drugs
  2. pass through both drug’s $IC_{50}$ values.

The equation for CI is:

$CI = \frac{(D)_1}{(D_m)_1} + \frac{(D)_2}{(D_m)_2}$

Where $D_1$ and $D_2$ are the doses of drugs 1 and 2 when combined that together produce effect m, and $D_{m,1}$ and $D_{m,2}$ are the doses that produce effect m when given one at a time.

Looking for synergy, we can calculate the Combination Index (CI) at various levels of lethality. A CI value <1, =1, and >1 indicate synergism, additive effect, and antagonism.

Values less than one are evidence for synergy; values greater than one are evidence for antagonism.

Confidence intervals for CI (Lee and Kong)

J. Jack Lee and Maiying Kong published a method to calculate calculate confidence intervals for the interaction index (Lee and Kong Statistics in Biopharmaceutical Research 2012). Lee and Kong published R code that will perform this calculation, and I have incorporated this code into HTDoseResponseCurve; all credit goes to Lee and Kong.

Controversies in synergy

The best approach to assessing drug synergy is not clear. The combination index has been subject to criticism from statisticians who advocate non-linear model fitting as superior to the CI method. For an introduction to the terminology and issues, the following references may be helpful:

Articles explaining CI methods and advocating their use:

Articles critical of CI methods

Calculating the Combination Index

Experimental design

HTDoseResponseCurve can calculate and plot synergy values if you have followed a "ray" experimental design, the standard approach recommended for combination index experiments. In this design, one exposes the sample to

  1. to a range of doses for each drug in individually, sufficient to establish the $IC_{50}$ from each drug on its own
  2. to both drugs combined in the same concentration ratio at a range of doses sufficient to establish an $IC_{50}$ for the combination treatment.

Importantly, the two drugs do not have to be given in the same concentration.

You can load the data from an experiment of this design into HTDoseResponseCurve by creating a dataset that includes two vectors of treatments and two vectors of concentrations. To calculate the Combination Index, you must specify the ratio at which each treatment is combined. The concentration value for the combination treatment will equal the the sum of each drug's individual concentrations.

Worked example

Data published in Lee and Kong Statistics in Biopharmaceutical Research 2012 demonstrate the calculation of the interaction between two drugs called SCH66336 and 4-HPR. Their data are reproduced here, along with code that reproduces their Figure 4b.

This figure below plots the interaction index at varying effect strengths for the drug combination (solid dots were observed), with upper and lower 95% confidence intervals plotted with thin dots.

From these plots, Lee and Kong conclude that combinations that produce an effect less than about 0.5 are synergistic at a 95% confidence interval, while there is not evidence that combination concentrations that produce less of an effect are not additive. Note that the figure is plotted with the Y axis on the log scale.

dose_SCH=c(0.1, 0.5, 1, 2, 4)
eff_SCH = c(0.6701, 0.6289, 0.5577, 0.4550, 0.3755)
dose_4HPR = c(0.1, 0.5, 1, 2)
eff_4HPR = c(0.7666, 0.5833, 0.5706, 0.4934)
eff_comb = c(0.6539, 0.4919, 0.3551, 0.2341)
syn = data.frame( 
    treatment_1 = rep("SCH66336", 13),
    conc_1 = c( dose_SCH, rep(0, 4), dose_SCH[1:4]),
    treatment_2 = rep("4-HPR", 13),
    conc_2 = c( rep(0, 5), dose_4HPR, dose_4HPR ),
    values = c(eff_SCH, eff_4HPR, eff_comb ), stringsAsFactors=FALSE )

Raw data

kable(syn)
# Create dataset using create_synergy_dataset
ds_lk = create_synergy_dataset(
                       sample_types = rep("sample_1", 13), 
                       treatments_1 = syn$treatment_1,
                       treatments_2 = syn$treatment_2,
                       concentrations_1 = syn$conc_1,
                       concentrations_2 = syn$conc_2,
                       values = syn$values, negative_control = NA)
layout(matrix(1:2,1,2))

# Calculate median effect
CS_lk = chou_synergy( ds_lk, sample_type = "sample_1", hour=0,
                      treatment_1 = "SCH66336", treatment_2="4-HPR",
                      fct=LL.2(), summary_method="mean" )

me=plot_synergy_median_effect( CS_lk, main="Median Effect" )
legend( -3, 2, c( "SCH66336", "4-HPR", "combined"), 
        col=c("black", "red", "cornflowerblue"), 
        pch=19, cex=0.75)

# Calculate interaction index
ii=plot_synergy_interaction_index(ds=ds_lk, sample_type = "sample_1", 
                               treatment_1 = "SCH66336", treatment_2="4-HPR", 
                               hour=0, 
                               xlab="Effect", ylab="Interaction Index" ,
                               main="Interaction Index")
abline(log(1), 0)
legend( 0, 10, c("interaction", "95% C.I."), lty=c(1, 2), 
        col=c("black", "cornflowerblue"), cex=0.75)

If you prefer, you can plot the same values on the linear scale:

layout(matrix(1:2,1,2))
ii=plot_synergy_interaction_index(ds=ds_lk, sample_type = "sample_1", 
                               treatment_1 = "SCH66336", treatment_2="4-HPR", 
                               hour=0, log = "",
                               xlab="Effect", ylab="Interaction Index",
                               ylim=c(0,3),
                               main="Interaction Index")
abline(1, 0)
legend( 0, 3, c("interaction", "95% C.I."), lty=c(1, 2), 
        col=c("black", "cornflowerblue"), cex=0.75)

Excess over Bliss Synergy

Bliss synergy is calculated using a different null model, one that is based on probabilistic independence. Bliss synergy is reported as deviation (difference) of the observed values for drug combinations from the values predicted by Bliss independence, which takes the form:

${Effect}_{null} = {Effect}_A + {Effect}_B - ({Effect}_A \times {Effect}_B)$

This result is sometimes called the "excess over Bliss independence".

In this example, two drugs are tested at four concentrations (including 0 nM, the vehicle condition). In one scenario, the dataset (DS_add), they have merely additive effects when combined. In an alternate scenario, using with different measured values (DS_syn), there is evidence for synergistic interactions when combined. The Excess over Bliss score is plotted as a heat map using the plot_color_grid() function.

samples = rep("cell_line_1", 16)
t1 = rep( c("DMSO", "drug1", "drug1", "drug1"), 4)
t2 = c( rep( "DMSO", 4), rep("drug2", 12) )
c1 = rep( c(0, 50, 100, 200), 4)
c2 = c(0,0,0,0, 50,50,50,50, 100,100,100,100, 200,200,200,200)
value_ind=c(1,0.8,0.7,0.6,0.8,0.7,0.6,0.5,0.7,0.6,0.5,0.4,0.6,0.5,0.4,0.3)
value_syn=c(1,0.8,0.7,0.6,0.8,0.8,0.5,0.3,0.7,0.3,0.2,0.1,0.6,0.3,0.1,0.05)

DS_add=create_synergy_dataset( sample_types=samples, treatments_1=t1,
                         treatments_2=t2, concentrations=c1, 
                         concentrations_2=c2, values=value_ind, 
                         negative_control="DMSO")
DS_syn=create_synergy_dataset( sample_types=samples, treatments_1=t1,
                         treatments_2=t2, concentrations=c1, 
                         concentrations_2=c2, values=value_syn, 
                         negative_control="DMSO")

b_add = synergy_bliss(DS_add, "drug1", "drug2", sample_type="cell_line_1")
b_syn = synergy_bliss(DS_syn, "drug1", "drug2", sample_type="cell_line_1")
layout(matrix(1:2,1,2))
par(mar=c(4,4,3,1))
gi = plot_color_grid(b_add$excess*100, color_bounds = c(-100,100), 
                main="additive", xlab="nM drug 1", ylab="nM drug 2", 
                plot_values = TRUE)
gs = plot_color_grid(b_syn$excess*100, color_bounds = c(-100,100), 
                main="synergistic", xlab="nM drug 1", ylab="nM drug 2", 
                plot_values = TRUE)

To view excess over bliss synergy more clearly, it may be helpful to hold one drug constant and plot the effect of increasing the concentration of the second drug. Here we hold drug "drug2" constant at either 50, 100, or 200 nM and show individual dose responses as "drug1" is varied. The evidence for synergy is represented by the distance between the combined effect (solid circle) vs. the expected effects of the individual drugs by themselves.

layout(matrix(1:3,3,1))
par(mar=c(4,4,3,1))
concs=c(50, 100, 200)
for( i in 1:3 ){
    plot_additive_vs_synergy_effect( DS_syn, 
            treatment_for_DRC="drug1", concs_for_DRC=c(50, 100, 200), 
            treatment_subgroup="drug2", conc_subgroup=concs[i], 
            hour=0, sample_type="cell_line_1", 
            cex=2, xlab=paste("drug1 effect when drug2=",concs[i],"nM", sep=""))
    legend( 1, 0.95, c("drug1", "drug2", "both"), pch=c(1, 2, 19), bty="n", cex=1.5 )
    box()
}


DavidQuigley/HTDoseResponseCurve documentation built on Jan. 23, 2021, 5:10 a.m.