The Model

Let ${ y_i }_{i=1}^n$ denote the coverage at position $i$ for $n$ loci across the chromosome/genome.

Let ${ s_i }_{i=1}^n$ denote the segment number for each of those locations where $s_i \in { 1, \dots, S }$ and $S$ is the total number of segments.

\begin{align} p(y_i | s_i = j, w, \mu, \sigma^2 ) = (1-\rho_{j}) \underbrace{\sum_{c=1}^{C} w_{0,c} f( y_i; \mu_{0,c}, \sigma_{0,c}^2 )}{\text{Common}} + \rho{j} \underbrace{ \sum_{k=1}^{K} w_{j,k} f( y_i; \mu_{j,k}, \sigma_{j,k}^2 ) }{\text{Segment-specific}} \end{align} where $f$ is the density function for the Normal distribution, $\sum{k=1}^K w_{j,k} = 1$ and $\sum_{c=1}^{C} w_{0,c} = 1$.

EM Updates

E updates

For each data point $y_i$, $\psi_{i}$ gives the probability it is in a common (rather than segment specific) component.

\begin{align} \psi_{i} = \frac{ (1-\rho_{j}) \sum_{c=1}^{C} w_{0,c} f( y_i; \mu_{0,c}, \sigma_{0,c}^2 ) } { (1-\rho_{j}) \sum_{c=1}^{C} w_{0,c} f( y_i; \mu_{0,c}, \sigma_{0,c}^2 ) + \rho_{j} \sum_{k=1}^{K} w_{j,k} f( y_i; \mu_{j,k}, \sigma_{j,k}^2 ) } \end{align}

For each data point $y_i$, $\phi_{i,c}$ gives the probability it is in component $c$ given it's in a common component. $\nu_{i,k}$ gives the equivalent for segment specific components.

$$ \phi_{i,c} = \frac{ w_{0,c} f(y_i;\mu_{0, c}, \sigma_{0,c}^2 ) } { \sum_{c=1}^{C} w_{0,c} f(y_i;\mu_{0,c}, \sigma_{0,c}^2 ) } , $$

$$ \nu_{i,k} = \frac{ w_{s_i,k} f(y_i;\mu_{s_i,k}, \sigma_{s_i,k}^2 ) } { \sum_{k=1}^{K} w_{s_i,k} f(y_i;\mu_{s_i,k}, \sigma_{s_i,k}^2 ) }
$$

M updates

M update for $\rho$, a global common vs segment specific weighting:

$$ \rho_{j} = 1 - \frac{ \sum_{i : s_i = j} \psi_i }{ \sum_{i : s_i = j} 1 } $$

M updates for common component parameters:

$$ w_{0,c} = \frac{ \sum_{i=1}^n \psi_i \phi_{i,c} } { \sum_{i=1}^n \psi_i } $$

$$ \mu_{0,c} = \frac{ \sum_{i=1}^n \psi_i \phi_{i,c} y_i } { \sum_{i=1}^n \psi_i \phi_{i,c} } $$

$$ \sigma_{0,c}^2 = \frac{ \sum_{i=1}^n \psi_i \phi_{i,c} ( y_i - \mu_{0,c} )^2 } { \sum_{i=1}^n \psi_i \phi_{i,c} } $$

M updates for segment specific component parameters:

$$ w_{j,k} = \frac{ \sum_{i : s_i = j } (1-\psi_i) \nu_{i,k} } { \sum_{i : s_i = j} (1-\psi_i) } $$

$$ \mu_{j,k} = \frac{ \sum_{i : s_i = j } (1-\psi_i) \nu_{i,k} y_i } { \sum_{i : s_i = j} (1-\psi_i) \nu_{i,k} } $$

$$ \sigma_{j,k}^2 = \frac{ \sum_{i : s_i = j } (1-\psi_i) \nu_{i,k} ( y_i - \mu_{j,k})^2 } { \sum_{i : s_i = j} (1-\psi_i) \nu_{i,k} } $$



daniel-wells/comixr documentation built on May 14, 2019, 3:39 p.m.