fit_drc_4p | R Documentation |
Fits four-parameter dose-response curves for each group (precursor, peptide or protein). In addition, it can annotate data based on completeness, the completeness distribution and statistical testing using ANOVA. If filtering is selected, the function filters data based on completeness only.
fit_drc_4p(
data,
sample,
grouping,
response,
dose,
filter = "post",
replicate_completeness = 0.7,
condition_completeness = 0.5,
n_replicate_completeness = NULL,
n_condition_completeness = NULL,
complete_doses = NULL,
anova_cutoff = 0.05,
correlation_cutoff = 0.8,
log_logarithmic = TRUE,
include_models = FALSE,
retain_columns = NULL
)
If data filtering options are selected, data is annotated based on multiple criteria. If "post" is selected, the data is annotated based on completeness, the completeness distribution, the adjusted ANOVA p-value cutoff and a correlation cutoff. Completeness of features is determined based on the n_replicate_completeness and n_condition_completeness arguments.
The completeness distribution determines whether data is missing not at random along the dose. For this, it is checked whether half of a feature's values (+/-1 value) pass the replicate completeness criterion and half do not. To fall into this category, the values that fulfill the completeness cutoff and the ones that do not need to be consecutive, meaning located next to each other based on their concentration values. Furthermore, the values that do not pass the completeness cutoff need to be lower in intensity. Lastly, the difference between the two groups is tested for statistical significance using a Welch's t-test and a cutoff of p <= 0.1 (the aim is mainly to discard curves that falsely fit the other criteria but have clearly non-significant differences in mean). This allows curves to be considered that have missing values in half of their observations due to a decrease in intensity. These can be thought of as conditions that are missing not at random (MNAR). Such entities often do not have a significant p-value since half of their conditions are not considered due to data missingness.
The ANOVA test is performed on the features by concentration. If it is significant, it is likely that there is some response. However, this test would also be significant if there is just one outlier concentration, so it should only be used in combination with other cutoffs to determine if a feature is significant. The passed_filter column is TRUE for all features that pass the above-mentioned criteria and that have a correlation greater than the cutoff (default is 0.8) and an adjusted ANOVA p-value below the cutoff (default is 0.05).
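The four completeness distribution criteria described above can be sketched in base R for a single feature. This is a simplified illustration, not the function's actual implementation: the data frame, its column names, the intensity values and the variable min_replicates are all hypothetical.

```r
# Hypothetical per-dose summary for one feature, ordered by increasing dose.
# Column names and values are made up for illustration.
per_dose <- data.frame(
  dose = c(0, 1, 10, 50, 100, 500, 1000, 5000),
  n_observed = c(3, 3, 3, 3, 1, 1, 1, 0)
)
min_replicates <- 2 # plays the role of n_replicate_completeness

# Observed intensities, split by whether their dose passed the cutoff
passing_intensities <- c(20.1, 19.8, 20.3, 20.0, 19.9, 20.2,
                         19.7, 20.1, 19.6, 19.9, 19.5, 19.8)
failing_intensities <- c(16.2, 15.8, 16.0)

passes <- per_dose$n_observed >= min_replicates

# 1. About half of the doses pass and half do not (+/- 1 value)
half_split <- abs(sum(passes) - length(passes) / 2) <= 1

# 2. Passing and failing doses are consecutive: exactly one switch along the dose
consecutive <- sum(diff(as.integer(passes)) != 0) == 1

# 3. The doses that fail the cutoff are lower in intensity
lower_intensity <- mean(failing_intensities) < mean(passing_intensities)

# 4. Welch's t-test between the two groups (t.test() defaults to var.equal = FALSE)
welch_p <- t.test(passing_intensities, failing_intensities)$p.value

dose_mnar <- half_split && consecutive && lower_intensity && welch_p <= 0.1
```

In this made-up case the four highest doses fail the completeness cutoff, form one consecutive block and are clearly lower in intensity, so the feature would be flagged as MNAR along the dose.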
The final list is ranked based on a score calculated on entities that pass the filter. The score is the sum of the negative log10 of the adjusted ANOVA p-value, scaled between 0 and 1, and the correlation, scaled between 0 and 1, divided by 2. Thus, the highest score an entity can have is 1, which requires both the highest correlation and the most significant adjusted p-value. The rank corresponds to this score. Please note that entities with MNAR conditions might have a lower score due to the missing or non-significant ANOVA p-value. If no score could be calculated in the usual way, these cases receive a score of 0. You should have a look at curves that are TRUE for dose_MNAR in more detail.
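As a minimal sketch of the score described above, the following base R code computes it for three hypothetical features that passed the filter. The input values are made up, and min-max scaling is an assumption about how "scaled between 0 and 1" is implemented.

```r
# Hypothetical adjusted ANOVA p-values and correlations of three features
anova_adj_pval <- c(1e-6, 1e-3, 0.04)
correlation <- c(0.99, 0.90, 0.85)

# Min-max scaling to the range [0, 1] (assumed scaling method)
scale_01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Mean of the two scaled components
score <- (scale_01(-log10(anova_adj_pval)) + scale_01(correlation)) / 2

# Rank 1 is the feature with the highest score
score_rank <- rank(-score, ties.method = "min")
```

The first feature has both the smallest adjusted p-value and the highest correlation, so it receives the maximum score of 1 and rank 1.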
If the "pre" option is selected for the filter argument, the data is filtered for completeness prior to curve fitting and the ANOVA test. Otherwise, annotation is performed exactly as described above. We recommend the "pre" option because it leaves you not only with the likely hits of your treatment, but also with rather high-confidence true negative results. This is because the filtered data has a high degree of completeness, making it unlikely that a real dose-response curve is missed due to data missingness.
Please note that, in general, curves are only fitted if there are at least 5 conditions with data points present, to ensure that there is potential for a good curve fit. This is done independently of the selected filtering option.
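The requirement of at least 5 conditions with data can be illustrated with a small base R sketch. The data frame and its column names are hypothetical, and the check is only an approximation of what the function does internally.

```r
# Hypothetical long-format data for one feature; NA marks a missing measurement
feature <- data.frame(
  concentration = rep(c(0, 1, 10, 50, 100, 500), each = 2),
  intensity = c(20, 21, 19, NA, NA, NA, 18, 17, 16, NA, 15, 15)
)

# Count the concentrations that have at least one observed value
observed <- !is.na(feature$intensity)
n_conditions_with_data <- length(unique(feature$concentration[observed]))

# A curve would only be fitted with at least 5 such conditions
fit_possible <- n_conditions_with_data >= 5
```

Here the concentration 10 has no observed values, which leaves exactly 5 conditions with data, so a fit would still be attempted.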
If include_models = FALSE, a data frame is returned that contains correlations of predicted to measured values as a measure of the goodness of the curve fit, an associated p-value and the four parameters of the model for each group. Furthermore, input data for plots is returned in the columns plot_curve (curve and confidence interval) and plot_points (measured points). If include_models = TRUE, a list is returned that contains:
fit_objects: The fit objects of type drc for each group.
correlations: The correlation data frame described above.
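To give an intuition for the four model parameters, the sketch below evaluates a four-parameter log-logistic function in base R. It assumes the parameterisation of the LL.4 model from the drc package; the text above only states that the fit objects are of type drc, so the exact model form is an assumption. The function name ll4 and its argument names are hypothetical.

```r
# Four-parameter log-logistic function, written in the form used by drc::LL.4():
# f(x) = lower + (upper - lower) / (1 + exp(slope * (log(x) - log(ed50))))
ll4 <- function(dose, slope, lower, upper, ed50) {
  lower + (upper - lower) / (1 + exp(slope * (log(dose) - log(ed50))))
}

doses <- c(0, 1, 10, 50, 100, 500, 1000, 5000)

# Hypothetical parameter values for a response that decreases with dose
predicted <- ll4(doses, slope = 1, lower = 2, upper = 10, ed50 = 100)
```

With a positive slope, the response starts at the upper asymptote at dose 0, reaches the midpoint between both asymptotes at ed50 and approaches the lower asymptote at high doses.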
# Load libraries
library(dplyr)
library(protti) # provides create_synthetic_data() and fit_drc_4p()
set.seed(123) # Makes example reproducible
# Create example data
data <- create_synthetic_data(
n_proteins = 2,
frac_change = 1,
n_replicates = 3,
n_conditions = 8,
method = "dose_response",
concentrations = c(0, 1, 10, 50, 100, 500, 1000, 5000),
additional_metadata = FALSE
)
# Perform dose response curve fit
drc_fit <- fit_drc_4p(
data = data,
sample = sample,
grouping = peptide,
response = peptide_intensity_missing,
dose = concentration,
n_replicate_completeness = 2,
n_condition_completeness = 5,
retain_columns = c(protein, change_peptide)
)
glimpse(drc_fit)
head(drc_fit, n = 10)