suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
library(tidyr)
library(purrr)

theme_set(theme_bw() + theme(text = element_text(size = 15)))

library(impulse)
library(tensorflow)
auto_config_tf()

n_tcs <- 20
This vignette applies much of the functionality included in the impulse package to estimate meaningful parameters of timecourses, compare different kinetic models, and visualize the results. To have some data to work with, we begin by simulating kinetic parameters from sigmoid and impulse models and then generating timecourse data from these models. In practice we wouldn't know these parameters; instead we would want to estimate them from observed data. Most of the vignette focuses on how we can estimate these parameters and summarize a best-fit model for each timecourse. Since we know the parameters that were used to construct each timecourse, we can also see how accurately we recover these values.
To have data to work with, n_tcs = 20 timecourses are randomly generated, comprising a mixture of sigmoid (single response) and impulse (double sigmoid) curves with associated timing, asymptote, and rate (steepness) parameters.
set.seed(1)
timecourses <- simulate_timecourses(n = n_tcs)

# plot of timecourses + parameters
timecourses %>%
  unnest_legacy(measurements) %>%
  unite(label, true_model, tc_id) %>%
  ggplot(aes(x = time)) +
  geom_point(aes(y = abundance)) +
  geom_path(aes(y = sim_fit)) +
  facet_wrap(~ label)
Visualizing these timecourses, we see that sigmoidal curves always have exactly one response, while impulses allow for up to two transitions (some impulses will look rather sigmoidal due to limitations of the generative process).
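For reference, the two functional forms can be written down directly. The sketch below uses a generic Chechik & Koller-style parameterization: a single sigmoid defined by asymptotes, a rate, and an inflection time, and an impulse built as the product of a rising and a falling sigmoid scaled by the intermediate level. The parameter names here are illustrative and may not match the impulse package's internal parameterization exactly.

# Hedged sketch: generic sigmoid and impulse (double-sigmoid) forms.
# Parameter names (v_init, v_inter, v_final, rate, t_rise, t_fall) are
# illustrative and may differ from the impulse package's internals.
sigmoid_curve <- function(t, v_init, v_final, rate, t_rise) {
  v_init + (v_final - v_init) / (1 + exp(-rate * (t - t_rise)))
}

impulse_curve <- function(t, v_init, v_inter, v_final, rate, t_rise, t_fall) {
  rise <- v_init + (v_inter - v_init) / (1 + exp(-rate * (t - t_rise)))
  fall <- v_final + (v_inter - v_final) / (1 + exp(rate * (t - t_fall)))
  rise * fall / v_inter
}

time_grid <- seq(0, 100, by = 1)
plot(time_grid, impulse_curve(time_grid, 0, 2, 0.5, 0.25, 20, 60),
     type = "l", xlab = "time", ylab = "abundance")
lines(time_grid, sigmoid_curve(time_grid, 0, 1.5, 0.25, 30), lty = 2)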
We want to evaluate how well each timecourse fits each model (sigmoid and impulse) and, for each model, find the parameterization that minimizes the least-squares deviation between the kinetic response and the observed time series. This non-linear least squares problem is implemented in TensorFlow via estimate_timecourse_params_tf, which simplifies gradient calculations and allows multiple initializations to be updated in a single gradient step. Using TensorFlow also lets us go beyond a pure least-squares solution: by placing priors on the parameters we can constrain them to feasible/probable regimes and find the maximum a posteriori (MAP) estimate. The motivation for using priors and how their values can be chosen is discussed in the setting-priors vignette.
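To make the objective concrete, the sketch below (reusing sigmoid_curve() from above) minimizes a sum of squared residuals plus a quadratic penalty on log(rate) standing in for a Gaussian prior, using plain optim() on a single timecourse. This is only an illustration of the least-squares-plus-prior (MAP) idea; the actual TensorFlow implementation, parameterization, and priors used by estimate_timecourse_params_tf are described in the setting-priors vignette and will differ in detail.

# Hedged sketch of a MAP-style objective for one sigmoid timecourse.
# The quadratic penalty on log(rate) is an illustrative stand-in for a prior,
# not the prior actually used by estimate_timecourse_params_tf.
map_loss <- function(par, time, abundance, prior_sd = 1) {
  v_final  <- par[1]
  log_rate <- par[2]   # optimize log(rate) so the rate stays positive
  t_rise   <- par[3]
  pred <- sigmoid_curve(time, v_init = 0, v_final = v_final,
                        rate = exp(log_rate), t_rise = t_rise)
  # least-squares term + quadratic (Gaussian-prior-like) penalty on log(rate)
  sum((abundance - pred)^2) + (log_rate / prior_sd)^2 / 2
}

one_tc <- timecourses %>%
  unnest_legacy(measurements) %>%
  filter(tc_id == tc_id[1])

optim(par = c(1, 0, 25), fn = map_loss,
      time = one_tc$time, abundance = one_tc$abundance)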
timecourse_parameters <- timecourses %>%
  unnest_legacy(measurements) %>%
  # separate by true model
  nest_legacy(-true_model, .key = "measurements") %>%
  # fit all models to each timecourse
  crossing(tibble(model = c("sigmoid", "impulse"))) %>%
  mutate(timecourse_list = map2(measurements, model,
                                estimate_timecourse_params_tf,
                                n_initializations = 50,
                                fit_intercept = FALSE)) %>%
  # reduce each timecourse to best fitting model capturing uncertainty around the best minima
  mutate(best_timecourse_list = map(timecourse_list,
                                    reduce_best_timecourse_params,
                                    reduction_type = "loss-min")) %>%
  unnest_legacy(map(best_timecourse_list, ~ as_tibble(map(., list))))
As a quick sanity check, we can compare each timecourse's estimated parameters to the "true" parameters used to generate the data when the true model is the same as the fitted model (i.e., a sigmoid fit to a true sigmoid).
estimated_parameters <- timecourse_parameters %>%
  filter(true_model == model) %>%
  unnest_legacy(parameters) %>%
  select(tc_id, variable, estimated_value = value)

true_parameters <- timecourses %>%
  select(-measurements) %>%
  unnest_legacy(params) %>%
  gather(variable, true_value, -true_model, -tc_id) %>%
  filter(!is.na(true_value))

true_parameters %>%
  left_join(estimated_parameters, by = c("tc_id", "variable")) %>%
  ggplot(aes(x = true_value, y = estimated_value)) +
  geom_point() +
  facet_wrap(~ true_model + variable, scale = "free", ncol = 5) +
  geom_abline(intercept = 0, slope = 1, color = "blue")
Here, we can see that timing coefficients and asymptotes are very well estimated for sigmoidal models and generally well estimated for impulses. The occasional impulse that looks very sigmoidal will have unreliable coefficients, since it is de facto a sigmoid. Rate coefficients are difficult to estimate because subtle differences in noise can produce large differences in the estimated rate.
In a given dataset some timecourses will be well described by a sigmoidal model, while others can only be fit with an impulse. To determine which model best fits each timecourse, we first fit the sigmoid and impulse models to every timecourse and then compare them using impulse_sigmoid_comparison, which assesses whether the impulse model's improvement in fit over the sigmoid is justified by its two additional parameters.
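One standard way to formalize "is the improvement justified by the extra parameters?" is a nested-model F-test on the residual sums of squares, sketched below with assumed parameter counts (3 for the sigmoid, 5 for the impulse, matching fit_intercept = FALSE). This sketch only illustrates the logic of the comparison; it is not necessarily the test implemented by impulse_sigmoid_comparison.

# Hedged sketch: nested-model F-test on residual sums of squares.
# rss_sigmoid / rss_impulse are the residual sums of squares of the two fits,
# n_obs is the number of timepoints; parameter counts are assumptions.
compare_fits <- function(rss_sigmoid, rss_impulse, n_obs,
                         p_sigmoid = 3, p_impulse = 5) {
  f_stat <- ((rss_sigmoid - rss_impulse) / (p_impulse - p_sigmoid)) /
    (rss_impulse / (n_obs - p_impulse))
  # p-value for the null that the extra impulse parameters add nothing
  pf(f_stat, p_impulse - p_sigmoid, n_obs - p_impulse, lower.tail = FALSE)
}

# e.g. an impulse that reduces the RSS from 4.0 to 1.5 over 15 timepoints
compare_fits(rss_sigmoid = 4.0, rss_impulse = 1.5, n_obs = 15)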
model_comparison <- timecourse_parameters %>%
  unnest_legacy(loss) %>%
  impulse_sigmoid_comparison(fdr_cutoff = 0.001)

timecourses %>%
  select(tc_id, true_model) %>%
  left_join(model_comparison %>% select(tc_id, best_model), by = "tc_id") %>%
  count(true_model, best_model) %>%
  spread(true_model, n, fill = 0)
Finally, we can combine our observed timecourses with fitted curves based on the estimated kinetic parameters using kinetics_plotting.
fitted_kinetics <- timecourse_parameters %>%
  unnest_legacy(parameters) %>%
  select(tc_id, model, variable, value) %>%
  spread(variable, value) %>%
  nest_legacy(-tc_id, .key = "fitted_kinetics")

augmented_timecourses <- timecourses %>%
  left_join(fitted_kinetics, by = "tc_id") %>%
  left_join(model_comparison, by = "tc_id")

kinetics_plotting(augmented_timecourses,
                  saturation = 0.9,
                  max_time = 150,
                  fit_timepoints = 100)