knitr::opts_chunk$set( collapse = TRUE, comment = '#>', message = FALSE, fig.align = 'center', fig.retina = 2)
Here we show how to use limorhyde2
to quantify uncertainty in rhythmicity and differential rhythmicity. This step is not essential and can be computationally expensive, but can provide additional information. As limorhyde2
is a Bayesian approach focused on effect sizes rather than statistical significance, quantifying uncertainty relies on the concepts of posterior probability and credible intervals.
The data are based on liver samples from wild-type and Rev-erb$\alpha/\beta$ double-knockout mice (Cho et al. 2012 and GSE34018).
library('cowplot') library('data.table') library('ggplot2') library('limorhyde2') library('qs') # doParallel::registerDoParallel() # register a parallel backend to minimize runtime theme_set(theme_bw())
The expression data are in a matrix with one row per gene and one column per sample. The metadata are in a table with one row per sample. To save time and space, the expression data include only a subset of genes.
y = GSE34018$y y[1:5, 1:5] metadata = GSE34018$metadata metadata
Because the samples were acquired at relatively low temporal resolution (every 4 h), we use three knots instead of the default four, which reduces the flexibility of the spline curves. We specify condColname
so getModelFit()
knows to fit a differential rhythmicity model.
fit = getModelFit(y, metadata, nKnots = 3L, condColname = 'cond') fit = getPosteriorFit(fit)
The posterior fits consist of not just a single set of model coefficients (the posterior means), but distributions of model coefficients. Sampling from these distributions is the first step to quantifying uncertainty in the fits. Here we generate 50 posterior samples, although an actual analysis would require more to accurately estimate the credible intervals.
fit = getPosteriorSamples(fit, nPosteriorSamples = 50L)
We can use the posterior samples to quantify uncertainty in the expected measurements, i.e., the fitted curves, by specifying the fitType
argument. Here we focus on three genes.
genes = data.table( id = c('13170', '12686', '26897'), symbol = c('Dbp', 'Elovl3', 'Acot1')) times = seq(0, 24, 0.5) measFitSamps = getExpectedMeas( fit, times = times, fitType = 'posterior_samples', features = genes$id) measFitSamps[genes, symbol := i.symbol, on = .(feature = id)] print(measFitSamps, nrows = 10L)
Given the expected measurements from the posterior samples, we can compute the lower and upper bounds of the credible interval for each combination of feature, condition, and time-point. By default, getExpectedMeasIntervals()
calculates the 90% equal-tailed interval.
measFitInts = getExpectedMeasIntervals(measFitSamps) print(measFitInts, nrows = 10L)
It's always a good idea to also calculate the posterior mean fitted curves.
measFitMean = getExpectedMeas(fit, times = times, features = genes$id) measFitMean[genes, symbol := i.symbol, on = .(feature = id)]
Now we can plot the results. In the first row, each curve corresponds to a posterior sample. In the second row, the ribbons indicate the credible intervals.
timeBreaks = seq(0, 24, 4) pal = 'Dark2' p1 = ggplot(measFitSamps) + facet_wrap(vars(symbol), scales = 'fixed', nrow = 1) + geom_line(aes(x = time, y = value, color = cond, group = interaction(cond, posterior_sample)), alpha = 0.2) + labs(x = 'Circadian time (h)', y = 'Expression (norm.)', color = 'Condition') + scale_x_continuous(breaks = timeBreaks) + scale_color_brewer(palette = pal) + theme(legend.position = 'none') p2 = ggplot() + facet_wrap(vars(symbol), scales = 'fixed', nrow = 1) + geom_ribbon(aes(x = time, ymin = lower, ymax = upper, fill = cond), alpha = 0.3, data = measFitInts) + geom_line(aes(x = time, y = value, color = cond), data = measFitMean) + labs(x = 'Circadian time (h)', y = 'Expression (norm.)', color = 'Condition', fill = 'Condition') + scale_x_continuous(breaks = timeBreaks) + scale_fill_brewer(palette = pal) + scale_color_brewer(palette = pal) + theme(legend.position = 'bottom') plot_grid(p1, p2, ncol = 1, rel_heights = c(1, 1.25))
We can also use the posterior samples to quantify uncertainty in the statistics, again by specifying the fitType
argument.
rhyStatsSamps = getRhythmStats(fit, features = genes$id, fitType = 'posterior_samples') diffRhyStatsSamps = getDiffRhythmStats(fit, rhyStatsSamps) diffRhyStatsSamps[genes, symbol := i.symbol, on = .(feature = id)] print(diffRhyStatsSamps, nrows = 10L)
In the plots below, each point represents a posterior sample.
p1 = ggplot(diffRhyStatsSamps) + facet_wrap(vars(symbol), nrow = 1) + geom_point(aes(x = diff_peak_trough_amp, y = diff_mesor), alpha = 0.2) + labs(x = bquote(Delta * 'amplitude (norm.)'), y = bquote(Delta * 'mesor (norm.)')) p2 = ggplot(diffRhyStatsSamps) + facet_wrap(vars(symbol), nrow = 1) + geom_point(aes(x = diff_peak_trough_amp, y = diff_peak_phase), alpha = 0.2) + labs(x = bquote(Delta * 'amplitude (norm.)'), y = bquote(Delta * 'phase (h)')) plot_grid(p1, p2, ncol = 1)
Finally, we can compute credible intervals for the rhythm and differential rhythm statistics. Again, by default these are 90% equal-tailed intervals. Currently getStatsIntervals()
does not calculate intervals for phase-based statistics, since phase and phase difference are circular quantities.
rhyStatsInts = getStatsIntervals(rhyStatsSamps) print(rhyStatsInts, nrows = 10L) diffRhyStatsInts = getStatsIntervals(diffRhyStatsSamps) print(diffRhyStatsInts, nrows = 10L)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.