Quantifying rhythmicity in one condition

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = '#>',
  message = FALSE,
  fig.align = 'center',
  fig.retina = 2)

Introduction

Here we show how to use limorhyde2 to quantify rhythmicity in data from one condition. The data are based on mouse liver samples from the circadian gene expression atlas in mammals (GSE54650).

Load packages

library('data.table')
library('ggplot2')
library('limorhyde2')
library('qs')

# doParallel::registerDoParallel() # register a parallel backend to minimize runtime
theme_set(theme_bw())

Load the data

The expression data are in a matrix with one row per gene and one column per sample. The metadata are in a table with one row per sample. To save time and space, the expression data include only a subset of genes.

y = GSE54650$y
y[1:5, 1:5]

metadata = GSE54650$metadata
metadata

Fit linear models

The first step is to fit a series of linear models based on periodic splines for each genomic feature, in this case each gene, using limma. getModelFit() takes several arguments besides the expression data and metadata, but here we just use the defaults. For example, the data are from one condition, so we leave condColname as NULL. Also, the expression data are from microarrays and already log-transformed, so we leave method as 'trend'.

fit = getModelFit(y, metadata)

Get posterior fits

The next step is obtain posterior estimates of the model coefficients using multivariate adaptive shrinkage (mashr), which learns patterns in the data and accounts for noise in the original fits.

fit = getPosteriorFit(fit)

Get rhythm statistics

We can now use the posterior fits to compute rhythm statistics, i.e. properties of the curve, for each gene.

rhyStats = getRhythmStats(fit)

We can examine the distributions of the statistics in various ways, such as ranking genes by peak-to-trough amplitude (no p-values necessary) or plotting peak-to-trough amplitude vs. peak phase.

print(rhyStats[order(-peak_trough_amp)], nrows = 10L)

ggplot(rhyStats) +
  geom_point(aes(x = peak_phase, y = peak_trough_amp), alpha = 0.2) +
  xlab('Peak phase (h)') +
  ylab('Peak-to-trough amplitude (norm.)') +
  scale_x_continuous(breaks = seq(0, 24, 4))

Get observed and fitted time-courses

We can also compute the expected measurements for one or more genes at one or more time-points, which correspond to the fitted curves. Here we plot the posterior fits and observed expression for three of the top rhythmic genes (converting from gene id to gene symbol).

genes = data.table(
  id = c('13088', '13170', '13869'),
  symbol = c('Cyp2b10', 'Dbp', 'Erbb4'))

measFit = getExpectedMeas(fit, times = seq(0, 24, 0.5), features = genes$id)
measFit[genes, symbol := i.symbol, on = .(feature = id)]
print(measFit, nrows = 10L)

Next we combine the observed expression data and metadata. The curves show how limorhyde2 is able to fit non-sinusoidal rhythms.

measObs = mergeMeasMeta(y, metadata, features = genes$id)
measObs[genes, symbol := i.symbol, on = .(feature = id)]
print(measObs, nrows = 10L)

ggplot() +
  facet_wrap(vars(symbol), scales = 'free_y', nrow = 1) +
  geom_line(aes(x = time, y = value), data = measFit) +
  geom_point(aes(x = time %% 24, y = meas), shape = 21, size = 1.5,
             data = measObs) +
  labs(x = 'Circadian time (h)', y = 'Expression (norm.)') +
  scale_x_continuous(breaks = seq(0, 24, 4))


Try the limorhyde2 package in your browser

Any scripts or data that you put into this service are public.

limorhyde2 documentation built on Feb. 16, 2023, 7:43 p.m.