Analyzing multivariate count data with the Poisson log-normal model In PLNmodels: Poisson Lognormal Models



GLM-like interface

We provide a set of S3-methods for PLNfit that mimic the standard (G)LM-like interface of R::stats, which we present now.

One can access the fitted value of the counts (Abundance -- $\hat{\mathbf{Y}}$) and check that the algorithm basically learnt correctly from the data^[We use a log-log scale in our plot in order not to give an excessive importance to the higher counts in the fit]:

data.frame(
fitted   = as.vector(fitted(myPLN)),
observed = as.vector(trichoptera$Abundance) ) %>% ggplot(aes(x = observed, y = fitted)) + geom_point(size = .5, alpha =.25 ) + scale_x_log10() + scale_y_log10() + theme_bw() + annotation_logticks()  We can also reach the matrix of regression parameters$\mathbf{\Theta}$and the residual variance/covariance matrix$\boldsymbol{\Sigma}$of the latent variable with the traditional functions found in R for (G)LM manipulation: for the regression coefficents, we can use the coef (or coefficients) method. Approximated standard errors of the coefficients are also accessible via standard_error: data.frame( rbind(t(coef(myPLN)), t(standard_error(myPLN))), row.names = c("effect", "stderr") ) %>% select(1:5) %>% knitr::kable()  The residual covariance matrix better displays as an image matrix: corrplot(sigma(myPLN), is.corr = FALSE)  Observation weights It is also possible to use observation weights like in standard (G)LMs: myPLN_weighted <- PLN( Abundance ~ 1, data = trichoptera, weights = runif(nrow(trichoptera)), control = list(trace = 0) ) data.frame( unweighted = as.vector(fitted(myPLN)), weighted = as.vector(fitted(myPLN_weighted)) ) %>% ggplot(aes(x = unweighted, y = weighted)) + geom_point(size = .5, alpha =.25 ) + scale_x_log10() + scale_y_log10() + theme_bw() + annotation_logticks()  Accounting for covariates and offsets For ecological count data, it is generally a good advice to include the sampling effort via an offset term whenever available, otherwise samples are not necessarily comparable: myPLN_offsets <- PLN(Abundance ~ 1 + offset(log(Offset)), data = trichoptera, control = list(trace = 0))  Note that we use the function offset with a log-transform of the total counts since it acts in the latent layer of the model. Obviously the model with offsets is better since the log-likelihood is higher with the same number of parameters^[In PLNmodels the R-squared is a pseudo-R-squared that can only be trusted between model where the same offsets term was used]: rbind( myPLN$criteria,
myPLN_offsets$criteria ) %>% knitr::kable()  Let us try to correct for the wind effect in our model: myPLN_wind <- PLN(Abundance ~ 1 + Wind + offset(log(Offset)), data = trichoptera)  When we compare the models, the gain is clear in terms of log-likelihood. However, the BIC choses not to include this variable: rbind( myPLN_offsets$criteria,
myPLN_wind$criteria ) %>% knitr::kable()  Covariance models (full, diagonal, spherical) It is possible to change a bit the parametrization used for modelling the residual covariance matrix$\boldsymbol\Sigma$, and thus reduce the total number of parameters used in the model. By default, the residual covariance is fully parameterized (hence$p \times (p+1)/2$parameters). However, we can chose to only model the variances of the species and not the covariances, by means of a diagonal matrix$\boldsymbol\Sigma_D$with only$p$parameters. In an extreme situation, we may also chose a single variance parameter for the whole matrix$\boldsymbol\Sigma = \sigma \mathbf{I}_p$. This can be tuned in PLN with the control argument, a list controlling various aspects of the underlying optimization process: myPLN_diagonal <- PLN( Abundance ~ 1 + offset(log(Offset)), data = trichoptera, control = list(covariance = "diagonal", trace = 0) ) myPLN_spherical <- PLN( Abundance ~ 1 + offset(log(Offset)), data = trichoptera, control = list(covariance = "spherical", trace = 0) )  Note that, by default, the model chosen is \texttt{covariance = "spherical"}, so that the two following calls are equivalents: myPLN_default <- PLN(Abundance ~ 1, data = trichoptera, ) myPLN_full <- PLN(Abundance ~ 1, data = trichoptera, control = list(covariance = "full"))  Different covariance models can then be compared with the usual criteria: it seems that the gain brought by passing from a diagonal matrix to a fully parameterized covariance is not worth having so many additional parameters: rbind( myPLN_offsets$criteria,
myPLN_diagonal$criteria, myPLN_spherical$criteria
) %>%
as.data.frame(row.names = c("full", "diagonal", "spherical")) %>%
knitr::kable()


A final model that we can try is the diagonal one with the wind as a covariate, which gives a slight improvement.

myPLN_final <-
PLN(
Abundance ~ 1 + Wind + offset(log(Offset)),
data    = trichoptera, control = list(covariance = "diagonal", trace = 0)
)
rbind(
myPLN_wind$criteria, myPLN_diagonal$criteria,
myPLN_final\$criteria
) %>% knitr::kable()


Try the PLNmodels package in your browser

Any scripts or data that you put into this service are public.

PLNmodels documentation built on Jan. 28, 2020, 1:07 a.m.