poisson.glm.mix: Estimation of high dimensional Poisson GLMs via EM algorithm.

poisson.glm.mixR Documentation

Estimation of high dimensional Poisson GLMs via EM algorithm.

Description

This package can be used to cluster high dimensional count data under the presence of covariates. A mixture of Poisson Generalized Linear models (GLM's) is proposed. Conditionally to the covariates, Poisson multivariate distribution describing each cluster is a product of independent Poisson distributions. Different parameterizations for the slopes are proposed. Case of partioning the response variables into a set of replicates is considered. Poisson GLM mixture is estimated via Expectation Maximization (EM) algorithm with Newton-Raphson steps. An efficient initialization of EM algorithm is proposed to improve parameter estimation. It is a splitting scheme which is combined with a Small EM strategy. The user is referred to the function pois.glm.mix for an automatic evaluation of the proposed methodology.

Details

Package: poisson.glm.mix
Type: Package
Version: 1.4
Date: 2023-08-19

Assume that the observed data can be written as y = (y_{1},\ldots,y_{n}) where y_i=\{y_{ij\ell};j = 1, \ldots,J,\ell = 1, \ldots,L_{j}\}, y_i\in Z_+^{d}, i = 1,\ldots,n, with d = \sum_{j=1}^{J}L_{j} and L_j \geq 1, j=1,\ldots,J. Index i denotes the observation, while the vector L=(L_1,\ldots,L_J) defines a partition of the d variables into J blocks: the first block consists of the first L_1 variables, the second block consists of the next L_2 variables and so on. We will refer to j and \ell using the terms “condition” and “replicate”, respectively. In addition to y, consider that a vector of V covariates is observed, denoted by x_{i} := \{x_{iv};v=1,\ldots,V\}, for all i = 1, \ldots,n. Assume now that conditional to x_{i}, a model indicator m taking values in the discrete set \{1,2,3\} and a positive integer K, the response y_{i}, is a realization of the corresponding random vector

Y_{i}|x_{i}, m\sim \sum_{k = 1}^{K}\pi_{k}\prod_{j=1}^{J}\prod_{\ell=1}^{L_{j}}\mathcal P(\mu_{ij\ell k;m})

where \mathcal P denotes the Poisson distribution. The following parameterizations for the Poisson means \mu_{ij\ell k;m} are considered: If m=1 (the “\beta_{jk}” parameterization), then

\mu_{ij\ell k;m}:=\alpha_{jk}+\gamma_{j\ell}+\sum_{v=1}^{V}\beta_{jkv}x_i.

If m=2 (the “\beta_{j}” parameterization), then

\mu_{ij\ell k;m}:=\alpha_{jk}+\gamma_{j\ell}+\sum_{v=1}^{V}\beta_{jv}x_i.

If m=3 (the “\beta_{k}” parameterization), then

\mu_{ij\ell k;m}:=\alpha_{jk}+\gamma_{j\ell}+\sum_{v=1}^{V}\beta_{kv}x_i.

For identifiability purposes assume that \sum_{\ell=1}^{L_j}\gamma_{j\ell}=0, j=1,\ldots,J.

Author(s)

Papastamoulis Panagiotis Maintainer: Papastamoulis Panagiotis <papapast@yahoo.gr>

References

Papastamoulis, P., Martin-Magniette, M. L., & Maugis-Rabusseau, C. (2016). On the estimation of mixtures of Poisson regression models with large number of components. Computational Statistics & Data Analysis, 93, 97-106.

Examples

## load a small dataset of 500 observations
data("simulated_data_15_components_bjk")
## in this example there is V = 1 covariates (x)
##   and d = 6 response variables (y). The design is
##   L = (3,2,1).
V <- 1
x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V))
y <- sim.data[,-1]

## We will run the algorithm using parameterization
##   m = 1 and the number of components in the set
##   {2,3,4}.

rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1, 
                  max.iter=1000, Kmin=2, Kmax= 4, 
                  m1=3, m2=3, t1=3, t2=3, msplit=4, tsplit=3,mnr = 5)

# note: useR should specify larger values for Kmax, m1, m2, t1,
#	 t2, msplit and tsplit for a complete analysis.

# retrieve the selected models according to BIC or ICL
rr$sel.mod.icl
rr$sel.mod.bic
# retrieve the estimates according to ICL
# alpha
rr$est.sel.mod.icl$alpha
# beta
rr$est.sel.mod.icl$beta
# gamma
rr$est.sel.mod.icl$gamma
# pi
rr$est.sel.mod.icl$pi
# frequency table with estimated clusters
table(rr$est.sel.mod.icl$clust)
# histogram of the maximum conditional probabilities
hist(apply(rr$est.sel.mod.icl$tau,1,max),30)

##(the full data of 5000 observations can be loaded using 
##     data("simulated_data_15_components_bjk_full")


poisson.glm.mix documentation built on Aug. 19, 2023, 9:06 a.m.