suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
library(tidyr)
library(purrr)

theme_set(theme_bw() + theme(text = element_text(size = 15)))

library(impulse)
library(tensorflow)
auto_config_tf()

n_tcs <- 20
This vignette applies much of the functionality included in the impulse package to estimate meaningful parameters of timecourses, compare different kinetic models, and visualize the results. To have some data to work with, we begin by simulating kinetic parameters from sigmoid and impulse models and then generating timecourse data from these models. In practice we wouldn't know these parameters; instead we would want to estimate them from observed data. Most of the vignette focuses on how we can estimate these parameters and summarize a best-fit model for each timecourse. Since we know the parameters that were used to construct each timecourse, we can also see how accurately we recover these values.
To have data to work with, n_tcs = 20 timecourses are randomly generated, comprising a mixture of sigmoid (single response) and impulse (double sigmoid) curves with associated timing, asymptote, and rate (steepness) parameters.
set.seed(1)
timecourses <- simulate_timecourses(n = n_tcs)

# plot of timecourses + parameters
timecourses %>%
  unnest_legacy(measurements) %>%
  unite(label, true_model, tc_id) %>%
  ggplot(aes(x = time)) +
  geom_point(aes(y = abundance)) +
  geom_path(aes(y = sim_fit)) +
  facet_wrap(~ label)
Visualizing these timecourses, we see that sigmoidal curves always have exactly one response, while impulses allow for up to two transitions (some impulses will look rather sigmoidal due to limitations of the generative process).
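For reference, the two functional forms can be written down directly. The sketch below uses a generic Chechik & Koller-style parameterization: a single sigmoid defined by asymptotes, a rate, and an inflection time, and an impulse built as the product of a rising and a falling sigmoid scaled by the intermediate level. The parameter names here are illustrative and may not match the impulse package's internal parameterization exactly.

# Hedged sketch: generic sigmoid and impulse (double-sigmoid) forms.
# Parameter names (v_init, v_inter, v_final, rate, t_rise, t_fall) are
# illustrative and may differ from the impulse package's internals.
sigmoid_curve <- function(t, v_init, v_final, rate, t_rise) {
  v_init + (v_final - v_init) / (1 + exp(-rate * (t - t_rise)))
}

impulse_curve <- function(t, v_init, v_inter, v_final, rate, t_rise, t_fall) {
  rise <- v_init + (v_inter - v_init) / (1 + exp(-rate * (t - t_rise)))
  fall <- v_final + (v_inter - v_final) / (1 + exp(rate * (t - t_fall)))
  rise * fall / v_inter
}

time_grid <- seq(0, 100, by = 1)
plot(time_grid, impulse_curve(time_grid, 0, 2, 0.5, 0.25, 20, 60),
     type = "l", xlab = "time", ylab = "abundance")
lines(time_grid, sigmoid_curve(time_grid, 0, 1.5, 0.25, 30), lty = 2)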
We want to evaluate how well each timecourse fits each model (sigmoid and impulse) and, for each model, find the parameterization that minimizes the least-squares deviation between the kinetic response and the observed time series. This non-linear least squares problem is implemented in TensorFlow via estimate_timecourse_params_tf, which simplifies gradient calculations and allows multiple initializations to be updated in a single gradient step. Using TensorFlow also lets us go beyond a pure least-squares solution: by placing priors on the parameters we can constrain them to feasible/probable regimes and find the maximum a posteriori (MAP) estimate. The motivation for using priors and how their values can be chosen is discussed in the setting-priors vignette.
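To make the objective concrete, the sketch below (reusing sigmoid_curve() from above) minimizes a sum of squared residuals plus a quadratic penalty on log(rate) standing in for a Gaussian prior, using plain optim() on a single timecourse. This is only an illustration of the least-squares-plus-prior (MAP) idea; the actual TensorFlow implementation, parameterization, and priors used by estimate_timecourse_params_tf are described in the setting-priors vignette and will differ in detail.

# Hedged sketch of a MAP-style objective for one sigmoid timecourse.
# The quadratic penalty on log(rate) is an illustrative stand-in for a prior,
# not the prior actually used by estimate_timecourse_params_tf.
map_loss <- function(par, time, abundance, prior_sd = 1) {
  v_final  <- par[1]
  log_rate <- par[2]   # optimize log(rate) so the rate stays positive
  t_rise   <- par[3]
  pred <- sigmoid_curve(time, v_init = 0, v_final = v_final,
                        rate = exp(log_rate), t_rise = t_rise)
  # least-squares term + quadratic (Gaussian-prior-like) penalty on log(rate)
  sum((abundance - pred)^2) + (log_rate / prior_sd)^2 / 2
}

one_tc <- timecourses %>%
  unnest_legacy(measurements) %>%
  filter(tc_id == tc_id[1])

optim(par = c(1, 0, 25), fn = map_loss,
      time = one_tc$time, abundance = one_tc$abundance)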
timecourse_parameters <- timecourses %>%
  unnest_legacy(measurements) %>%
  # separate by true model
  nest_legacy(-true_model, .key = "measurements") %>%
  # fit all models to each timecourse
  crossing(tibble(model = c("sigmoid", "impulse"))) %>%
  mutate(timecourse_list = map2(measurements, model,
                                estimate_timecourse_params_tf,
                                n_initializations = 50,
                                fit_intercept = FALSE)) %>%
  # reduce each timecourse to best fitting model capturing uncertainty around the best minima
  mutate(best_timecourse_list = map(timecourse_list,
                                    reduce_best_timecourse_params,
                                    reduction_type = "loss-min")) %>%
  unnest_legacy(map(best_timecourse_list, ~ as_tibble(map(., list))))
As a quick sanity check, we can compare each timecourse's estimated parameters to the "true" parameters used to generate the data when the true model is the same as the fitted model (i.e., a sigmoid fit to a true sigmoid).
estimated_parameters <- timecourse_parameters %>%
  filter(true_model == model) %>%
  unnest_legacy(parameters) %>%
  select(tc_id, variable, estimated_value = value)

true_parameters <- timecourses %>%
  select(-measurements) %>%
  unnest_legacy(params) %>%
  gather(variable, true_value, -true_model, -tc_id) %>%
  filter(!is.na(true_value))

true_parameters %>%
  left_join(estimated_parameters, by = c("tc_id", "variable")) %>%
  ggplot(aes(x = true_value, y = estimated_value)) +
  geom_point() +
  facet_wrap(~ true_model + variable, scale = "free", ncol = 5) +
  geom_abline(intercept = 0, slope = 1, color = "blue")
Here, we can see that timing coefficients and asymptotes are very well estimated for sigmoidal models and generally well estimated for impulses. The occasional impulse that looks very sigmoidal will have unreliable coefficients, since it is de facto a sigmoid. Rate coefficients are difficult to estimate because subtle differences in noise can produce large differences in the estimated rate.
In a given dataset some timecourses will be well described by a sigmoidal model, while others can only be fit with an impulse. To determine which model best fits each timecourse, we first fit the sigmoid and impulse models to every timecourse and then compare them using impulse_sigmoid_comparison, which assesses whether the impulse model's improvement in fit over the sigmoid is justified by its two additional parameters.
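One standard way to formalize "is the improvement justified by the extra parameters?" is a nested-model F-test on the residual sums of squares, sketched below with assumed parameter counts (3 for the sigmoid, 5 for the impulse, matching fit_intercept = FALSE). This sketch only illustrates the logic of the comparison; it is not necessarily the test implemented by impulse_sigmoid_comparison.

# Hedged sketch: nested-model F-test on residual sums of squares.
# rss_sigmoid / rss_impulse are the residual sums of squares of the two fits,
# n_obs is the number of timepoints; parameter counts are assumptions.
compare_fits <- function(rss_sigmoid, rss_impulse, n_obs,
                         p_sigmoid = 3, p_impulse = 5) {
  f_stat <- ((rss_sigmoid - rss_impulse) / (p_impulse - p_sigmoid)) /
    (rss_impulse / (n_obs - p_impulse))
  # p-value for the null that the extra impulse parameters add nothing
  pf(f_stat, p_impulse - p_sigmoid, n_obs - p_impulse, lower.tail = FALSE)
}

# e.g. an impulse that reduces the RSS from 4.0 to 1.5 over 15 timepoints
compare_fits(rss_sigmoid = 4.0, rss_impulse = 1.5, n_obs = 15)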
model_comparison <- timecourse_parameters %>%
  unnest_legacy(loss) %>%
  impulse_sigmoid_comparison(fdr_cutoff = 0.001)

timecourses %>%
  select(tc_id, true_model) %>%
  left_join(model_comparison %>% select(tc_id, best_model), by = "tc_id") %>%
  count(true_model, best_model) %>%
  spread(true_model, n, fill = 0)
Finally, we can combine our observed timecourses with fitted curves based on the estimated kinetic parameters using kinetics_plotting.
fitted_kinetics <- timecourse_parameters %>%
  unnest_legacy(parameters) %>%
  select(tc_id, model, variable, value) %>%
  spread(variable, value) %>%
  nest_legacy(-tc_id, .key = "fitted_kinetics")

augmented_timecourses <- timecourses %>%
  left_join(fitted_kinetics, by = "tc_id") %>%
  left_join(model_comparison, by = "tc_id")

kinetics_plotting(augmented_timecourses,
                  saturation = 0.9,
                  max_time = 150,
                  fit_timepoints = 100)