# The EM GLM algorithm - Proof of predicted values

```{r}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 8 * 0.62
)
```

The EM algorithm treats the observed data as arising from one of several competing random processes. Each process has some underpinning probability distribution:

$$P(y_i | x_i, B_j)$$

and the likelihood of all observations is:

$$ L(Y | X, B_1 ... B_k) = \prod_{i=1}^N \prod_{j=1}^k P(y_i | x_i, B_j )^{I(Z_i = j)} $$

where each observation has a hidden variable, $Z_i$, indicating which of the competing random processes gives rise to the observation, and the function $I(Z_i = j)$ is an indicator function (i.e. it equals 1 when true and 0 otherwise). If $Z_i$ were known, the observations could be grouped and $k$ models fit so as to maximize the log-likelihood:

$$ l(Y | X, B_1 ... B_k) = \sum_{i=1}^N \sum_{j=1}^k I(Z_i = j) \ln(P(y_i | x_i, B_j )) $$
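
With known labels, this complete-data log-likelihood reduces to summing each observation's log-density under its own model, since only the $j = Z_i$ term survives. A minimal R sketch for a two-component binomial mixture (all data and parameter values here are made up for illustration; nothing comes from emax.glm):

```{r}
# Complete-data log-likelihood for a two-component binomial mixture,
# assuming the hidden labels Z are known (hypothetical example data).
set.seed(1)
n <- 20                                # binomial size
Z <- sample(1:2, 50, replace = TRUE)   # hidden labels, pretended known here
p <- c(0.2, 0.7)                       # per-component success probabilities
y <- rbinom(50, size = n, prob = p[Z]) # observed counts

# l = sum_i sum_j I(Z_i = j) * ln P(y_i | B_j): only the j = Z_i term survives
loglik <- sum(dbinom(y, size = n, prob = p[Z], log = TRUE))
loglik
```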

Given that $Z_i$ is not known, we instead replace $I(Z_i = j)$ with a probability $T_{j}$ (dropping the observation index on $T$ for brevity). $T_{j}$ is the probability of a given observation arising from the $j^{th}$ of the $k$ competing models (assuming the proposed parameters are true), normalized across all models. Making this substitution, the likelihood of the $i^{th}$ observation, $l_i$, is:

$$ l_i(y | x, B_1 ... B_k, T_1 ... T_k) = \sum_{j=1}^k T_{j} \ln(P(y | x, B_j)) $$

with $\sum_{j=1}^k T_{j} = 1$. For proposed parameters $B_1 ... B_k$, $T_j$ has values:

$$ T_{j} = \frac{P(y | x, B_j)}{\sum_{m=1}^k P(y | x, B_m) } $$
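
In EM terms, this normalization is the E-step. A minimal R sketch of the computation for a single observation, again with hypothetical proposed parameters (emax.glm performs this internally):

```{r}
# E-step sketch: responsibilities T_j for a single observation are the
# component likelihoods normalized across the k proposed models.
y1 <- 8; n <- 20
p   <- c(0.2, 0.7)                     # proposed component probabilities
lik <- dbinom(y1, size = n, prob = p)  # P(y | B_j) for each component
T_j <- lik / sum(lik)                  # normalized, so sum(T_j) == 1
T_j
```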

Using this replacement of $I(Z_i = j)$ with $T_{j}$, each observation follows a mixed distribution:

$$P(y | x, B_1 ... B_k, T_1 ... T_k) = \alpha \prod_{j=1}^k P(y | x, B_j)^{T_{j}}$$

with $\alpha$ being an additional normalization term, chosen so that the distribution sums to one.

## Binomial example

For the Binomial distribution the mixed probability mass function is:

$$ P(y | p_1 ... p_k, T_1 ... T_k) = f_{norm.}(n, p_1 ... p_k, T_1 ... T_k) \prod_{i=1}^k \left({n \choose y} p_i^y (1-p_i)^{n-y}\right)^{T_i} $$

where $f_{norm.}$ is a normalization coefficient to be found.

Re-arranging the powers (and using $\sum_{i=1}^k T_i = 1$ to collapse ${n \choose y}^{\sum_i T_i}$ to ${n \choose y}$), we can re-write the equation as:

$$ P(y | p_1 ... p_k, T_1 ... T_k) = f_{norm.}(n, p_1 ... p_k, T_1 ... T_k) {n \choose y} \left( \prod_{i=1}^k p_i^{T_i} \right)^y \left(\prod_{i=1}^k (1-p_i)^{T_i}\right)^{n - y} $$

This now resembles the standard binomial process. Using the binomial theorem we know that:

$$\sum_{y=0}^n {n \choose y} \left( \prod_{i=1}^k p_i^{T_i} \right)^y \left(\prod_{i=1}^k (1-p_i)^{T_i}\right)^{n - y} = (\prod_{i=1}^k p_i^{T_i} + \prod_{i=1}^k (1-p_i)^{T_i})^n $$

and hence we have the normalization term:

$$f_{norm.}(n, p_1 ... p_k, T_1 ... T_k) = \left(\prod_{i=1}^k p_i^{T_i} + \prod_{i=1}^k (1-p_i)^{T_i}\right)^{-n}$$

And our original expression for $P(y)$ can be rewritten as:

$$ P(y | p_1 ... p_k, T_1 ..T_k) = {n \choose y} \frac{\left( \prod_{i=1}^k p_i^{T_i} \right)^y \left(\prod_{i=1}^k (1-p_i)^{T_i}\right)^{n - y}}{(\prod_{i=1}^k p_i^{T_i} + \prod_{i=1}^k (1-p_i)^{T_i})^{n}}$$

From here we can re-write in terms of $p_{pooled}$ where:

$$p_{pooled} = \frac{\prod_{i=1}^k p_i^{T_i} }{\prod_{i=1}^k p_i^{T_i} + \prod_{i=1}^k (1-p_i)^{T_i}} $$

$$(1 - p_{pooled}) = \frac{\prod_{i=1}^k (1 - p_i)^{T_i} }{\prod_{i=1}^k p_i^{T_i} + \prod_{i=1}^k (1-p_i)^{T_i}}$$

$$ P(y| p_{pooled}) = {n \choose y} p_{pooled}^y (1- p_{pooled})^{n - y} $$
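
This identity is easy to check numerically. The sketch below (with hypothetical $p_i$ and $T_i$ values) confirms that the normalized mixed pmf from the previous expression matches `dbinom` evaluated at $p_{pooled}$:

```{r}
# Numerical check: the normalized mixed pmf equals a standard binomial
# pmf evaluated at p_pooled (hypothetical p_i and T_i values).
n <- 20; y <- 0:n
p <- c(0.2, 0.7); T_i <- c(0.3, 0.7)

num_p <- prod(p^T_i)          # prod_i p_i^{T_i}
num_q <- prod((1 - p)^T_i)    # prod_i (1 - p_i)^{T_i}
p_pooled <- num_p / (num_p + num_q)

mixed <- choose(n, y) * num_p^y * num_q^(n - y) / (num_p + num_q)^n
all.equal(mixed, dbinom(y, size = n, prob = p_pooled))
```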

Now, if we use the canonical link function (the logit, such that $\theta_i = x B_i$), we have:

$$\theta_{pooled} = \log\left(\frac{p_{pooled}}{1 - p_{pooled}}\right) = \log\left(\frac{\prod_{i=1}^k p_i^{T_i} }{\prod_{i=1}^k ( 1- p_i)^{T_i} }\right) = \log\left( \prod_{i=1}^k \left( \frac{p_i}{1-p_i}\right)^{T_i} \right) = \sum_{i=1}^k T_i \theta_i = \sum_{i=1}^k T_i x B_i $$

From here the standard rules of GLMs can be applied, with $\mu = b'(\theta_{pooled}).$
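
For the binomial family the canonical inverse link $b'(\theta)$ is the logistic function, so a prediction sketch looks like the following (the design row `x`, coefficient vectors `B1` and `B2`, and weights `T_i` are all hypothetical):

```{r}
# Pooled linear predictor theta_pooled = sum_i T_i * (x . B_i), then
# mu = b'(theta_pooled), which for the binomial is the logistic function.
x  <- c(1, 2.5)                        # design row: intercept + one covariate
B1 <- c(-1.0, 0.4)                     # proposed coefficients, component 1
B2 <- c(0.5, -0.2)                     # proposed coefficients, component 2
T_i <- c(0.3, 0.7)

theta_pooled <- T_i[1] * sum(x * B1) + T_i[2] * sum(x * B2)
mu <- plogis(theta_pooled)             # b'(theta) = exp(theta)/(1 + exp(theta))
c(theta = theta_pooled, mu = mu)
```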

## Poisson derivation

For the Poisson distribution the mixed probability mass function is:

$$ P(y | \lambda_1 ... \lambda_k, T_1 ... T_k) = f_{norm.}(\lambda_1 ... \lambda_k, T_1 ... T_k) \prod_{i=1}^k \left(\frac{\lambda_i^y}{y!}\right)^{T_i} $$

where $f_{norm.}$ is again the normalization function (the $e^{-\lambda_i}$ factors of the component pmfs have been absorbed into it). Following the same approach as above, we can re-arrange the powers, using $\sum_{i=1}^k T_i = 1$ to collapse $(y!)^{\sum_i T_i}$ to $y!$:

$$ P(y | \lambda_1 ... \lambda_k, T_1 ... T_k) = f_{norm.}(\lambda_1 ... \lambda_k, T_1 ... T_k) \frac{\left( \prod_{i=1}^k \lambda_i^{T_i} \right)^y }{y!} $$

which follows the same form as the typical Poisson distribution. Summing over $y$ gives $\sum_{y=0}^\infty \frac{\left(\prod_{i=1}^k \lambda_i^{T_i}\right)^y}{y!} = e^{\prod_{i=1}^k \lambda_i^{T_i}}$, so $f_{norm.} = e^{-\lambda_{pooled}}$ with:

$$ \lambda_{pooled} = \prod_{i=1}^k \lambda_i^{T_i} $$

$$ P(y | \lambda_1 ... \lambda_k, T_1 ... T_k) = \frac{e^{-\lambda_{pooled}} \lambda_{pooled}^y}{y!} $$

Now, we can again apply the canonical link function, $\theta_i = \log(\lambda_i)$ and $\lambda_i = e^{\theta_i}$, with $\theta_i = x B_i$:

$$\theta_{pooled} = \log(\lambda_{pooled}) = \log \left(\prod_{i=1}^k \lambda_i^{T_i} \right) = \sum_{i=1}^k T_i \log(\lambda_i) =\sum_{i=1}^k T_i x B_i $$
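
A short numerical check (with hypothetical $\lambda_i$ and $T_i$): $\lambda_{pooled}$ is the $T$-weighted geometric mean of the $\lambda_i$, the pooled pmf still sums to one, and the log link turns the pooling into the same weighted sum of linear predictors:

```{r}
# Poisson check (hypothetical lambda_i and T_i): lambda_pooled is the
# T-weighted geometric mean, and the pooled pmf is an ordinary Poisson pmf.
lambda <- c(2, 9); T_i <- c(0.4, 0.6)
lambda_pooled <- prod(lambda^T_i)

# Canonical link: log(lambda_pooled) is the weighted sum of the log(lambda_i)
all.equal(log(lambda_pooled), sum(T_i * log(lambda)))

# The pooled pmf still normalizes to one (truncated sum is effectively exact)
sum(dpois(0:100, lambda_pooled))
```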

## Discussion

By using the canonical link, both the Poisson and Binomial distributions result in the same link between $\theta_{pooled}$ and the EM GLM parameters $B_i$, namely $\theta_{pooled} = \sum_{i=1}^k T_i x B_i$.

With this proof in hand, the standard GLM results derived from $b(\theta)$ carry over, simplifying prediction and the calculation of residuals.


