\providecommand{\e}{\varepsilon} \providecommand{\nv}{{}^{-1}} \providecommand{\ov}[1]{\overline{#1}} \providecommand{\q}{$\quad$ \newline} \providecommand{\rt}{\rightarrow} \providecommand{\vc}[1]{\boldsymbol{#1}} \providecommand{\wh}[1]{\widehat{#1}}

Introduction

The fbseq package fits a hierarchical model to RNA-sequencing (RNA-seq) count data in fully Bayesian fashion. Publication of the method is pending. For now, here is the model.

The hierarchical model

Let $y_{n, g}$ be the fully preprocessed RNA-sequencing read count for replicate (dat column) $n$ ($n = 1, \ldots, N)$ and gene (data row) $g$ ($g = 1, \ldots, G)$. Let $X$ be the $N \times L$ design matrix for gene-specific model coefficient parameters $\boldsymbol{\beta}g = (\beta{1, g}, \ \ldots, \ \beta_{L, g})$. Let $\boldsymbol{X}n$ be the $n$'th row of $X$. Conditioned on the parameters $\boldsymbol{\beta}{ g}$ and $\e_{n, g}$, the $y_{n, g}$'s are treated as independent and Poisson-distributed in the likelihood.

$$ \begin{align} y_{n,g} \ | \ \boldsymbol{\beta}g, \ \e{n, g} \stackrel{\text{ind}}{\sim} \text{Poisson} \left (\exp \left (h_n + \e_{n, g} + \boldsymbol{X}_n \boldsymbol{\beta}_g \right ) \right ) \end{align} $$

Signal

The parameters of interest are the $\beta_{\ell, g}$'s and their hyperparameters. Conditional on hyperparameters $\theta_\ell$, $\sigma_\ell^2$, and $\xi_{\ell, g}$, the $\beta_{\ell, g}$'s are independent with normal distributions.

$$ \begin{align} \beta_{\ell, g} \ | \ \theta_\ell, \ \sigma_\ell^2, \ \xi_{\ell, g} \stackrel{\text{ind}}{\sim} \text{Normal}(\theta_\ell, \ \sigma_\ell^2 \xi_{\ell, g}) \end{align} $$

The design matrix $X$ should be chosen to substantiate the conditional independence assumptions for the $\beta_{\ell, g}$'s in the model.

The hierarchical means $\theta_\ell$ of $\beta_{\ell, 1}, \beta_{\ell, G}$ are given normal priors. Conditional on the initialization constants $c_\ell^2$, the $\theta_\ell$'s are assumed to be independent.

$$ \begin{align} \theta_\ell \ | \ c_\ell^2 \stackrel{\text{ind}}{\sim} \text{Normal}(0, \ c_\ell^2) \end{align} $$

The $c_\ell$'s are constants and should be large so that the priors on the $\theta_\ell$'s are diffuse and thus less informative than otherwise.

The $\sigma_\ell$ parameters are assumed to be independent conditional on initialization constants $s_\ell$.

$$ \begin{align} \sigma_\ell \ | \ s_\ell \stackrel{\text{ind}}{\sim} \text{Uniform}(0, \ s_\ell^2) \end{align} $$

This prior is equivalent to a $\sigma_\ell^{-1} \text{I}(\sigma_\ell < s_\ell)$ prior on $\sigma_\ell^2$. The $s_\ell$ constants should be chosen to be large.

The $\xi_{\ell, g}$ parameters are auxiliary variables used to assign different marginal hierarchical distributions to the $\beta_{\ell, g}$'s. Let the the $\xi_{\ell, g}$'s be conditionally independent given constants $k_\ell$, $q_\ell$, and $r_\ell$, and let

$$ \begin{align} \xi_{\ell, g} \stackrel{\text{ind}}{\sim} p(\xi_{\ell, g} \ | \ k_\ell, \ q_\ell, \ \ r_\ell) \end{align} $$

If $p(\xi_{\ell, g} \ | \ k_\ell, \ q_\ell, \ r_\ell) = I(\xi_{\ell, g} = 1)$, then $\beta_{\ell, g}$ have conditionally independent normal distributions (default setting of fbseq ). If $p(\xi_{\ell, g} \ | \ k_\ell, \ q_\ell, \ r_\ell) = Exp(\text{rate} = k_\ell)$, then the $\beta_{\ell, g}$'s are independent with Laplace distributions given $\theta_\ell$ and $\sigma_\ell^2$ (default: mean $\theta_\ell$ and variance $\sigma_\ell^2$). If $p(\xi_{\ell, g} \ | \ k_\ell, \ q_\ell, \ r_\ell) = \text{Inverse-Gamma}(q_\ell, \ r_\ell)$, then the $\beta_{\ell, g}$'s are independent with Student-$t$ distributions given $\theta_\ell$ and $\sigma_\ell^2$ (default: mean $\theta_\ell$ and variance $\sigma_\ell^2$). If $p(\xi_{\ell, g} \ | \ k_\ell, \ q_\ell, \ r_\ell) = \text{Half-Cauchy}(0, 1)$, then the $\beta_{\ell, g}$'s are independent with horseshoe distributions given $\theta_\ell$ and $\sigma_\ell^2$. For each $\ell$ separately, the user can choose among normal, Laplace, Student-$t$, and horseshoe priors on the $\beta_{\ell, g}$'s.

Noise

The $h_n$ terms are constants estimated from the data before the MCMC. These play the role of customary log-scale RNA-seq normalization factors, accounting for replicate-specific nuisance effects such as different sequencing depths.

The $\e_{n, g}$ terms are noise that the Poisson distribution does not account for. Conditional on their variances $\gamma_g$, the $\e_{n, g}$ parameters are independent with normal distributions.

$$ \begin{align} \e_{n,g} \ | \ \gamma_g \stackrel{\text{ind}}{\sim} \text{Normal}(0, \ \gamma_g) \end{align} $$

The $\gamma_g$'s are analogous to the negative binomial dispersions in more traditional RNA-seq models from packages like edgeR. Conditioned on parameters $\nu$ and $\tau$, they have independent inverse-gamma distributions.

$$ \begin{align} \gamma_g \ | \ \nu, \ \tau \stackrel{\text{ind}}{\sim} \text{Inverse-Gamma} \left ( \frac{\nu}{2}, \ \frac{\nu \tau}{2} \right ) \end{align} $$

We can interpret $\nu$ as the degree to which the $\gamma_g$'s "shrink" towards $\tau$. Given the initialization constant $d$, $\nu$ has a uniform prior distribution

$$ \begin{align} \nu \sim \text{Uniform}(0, \ d) \end{align} $$

And $\tau$ is the prior center of the $\gamma_g$'s (between the prior mean and prior mode). Given initialization constants $a$ and $b$, $\tau$ has a gamma prior.

$$ \begin{align} \tau \sim \text{Gamma}(a, \ \text{rate} = b) \end{align} $$

Model summary

$$ \begin{align} &y_{n,g} \ | \ \boldsymbol{\beta}g, \ \e{n, g} &&\stackrel{\text{ind}}{\sim} \text{Poisson} \left (\exp \left (h_n + \e_{n, g} + \boldsymbol{X}n \boldsymbol{\beta}_g \right ) \right ) \ &\qquad \beta{\ell, g} \ | \ \theta_\ell, \ \sigma_\ell^2, \ \xi_{\ell, g} &&\stackrel{\text{ind}}{\sim} \text{Normal}(\theta_\ell, \ \sigma_\ell^2 \xi_{\ell, g}) \ &\qquad \qquad \theta_\ell \ | \ c_\ell^2 &&\stackrel{\text{ind}}{\sim} \text{Normal}(0, \ c_\ell^2) \ &\qquad \qquad \sigma_\ell \ | \ s_\ell &&\stackrel{\text{ind}}{\sim} \text{Uniform}(0, \ s_\ell^2) \ &\qquad \qquad \xi_{\ell, g} &&\stackrel{\text{ind}}{\sim} p(\xi_{\ell, g} \ | \ k_\ell, \ q_\ell, \ r_\ell) \ &\qquad \e_{n,g} \ | \ \gamma_g &&\stackrel{\text{ind}}{\sim} \text{Normal}(0, \ \gamma_g) \ &\qquad \qquad \gamma_g \ | \ \nu, \ \tau &&\stackrel{\text{ind}}{\sim} \text{Inverse-Gamma} \left ( \frac{\nu}{2}, \ \frac{\nu \tau}{2} \right ) \ &\qquad \qquad \qquad \nu &&\sim \text{Uniform}(0, \ d) \ &\qquad \qquad \qquad \tau &&\sim \text{Gamma}(a, \ \text{rate} = b) \end{align} $$

Inference

The fbseq package estimates posterior probabilities that depend on logical conjunctions of linear combinations of the $\beta_{\ell,g}$ parameters. Possible examples are below.

$$ \begin{align} &P\left( \left . \beta_{1, g} > 0 \ \right | \ \text{data} \right) \ &P\left( \left . \beta_{2, g} - \beta_{3, g} > 1 \ \right | \ \text{data} \right) \ &P\left( \left . \beta_{1, g} - \beta_{3, g} > 0 \text{ and } \beta_{2, g} - \beta_{3, g} > \sqrt{2} \text{ and } \beta_{4, g} > -\log(\pi) \ \right | \ \text{data} \right) \ \end{align} $$



wlandau/fbseq documentation built on May 4, 2019, 8:43 a.m.