R/pglmm.R
In phyr: Model Based Phylogenetic Analysis

Documented in pglmm

# doc ----
#' Phylogenetic Generalized Linear Mixed Model for Community Data
#'
#' This function performs Generalized Linear Mixed Models for binary, count, 
#' and continuous data, estimating regression coefficients with
#' approximate standard errors. It is specifically designed for community data
#' in which species occur within multiple sites (locations). 
#' A Bayesian version of PGLMM uses the package \code{INLA}, 
#' which is not available on CRAN yet. If you wish to use this option, 
#' you must first install \code{INLA} from \url{https://www.r-inla.org/} by running
#' \code{install.packages('INLA', repos='https://www.math.ntnu.no/inla/R/stable')} in R.
#'  
#'
#' For Gaussian data, \code{pglmm} analyzes the phylogenetic linear mixed model
#' 
#' \deqn{Y = \beta_0 + \beta_1x + b_0 + b_1x}{Y = beta_0 + beta_1x + b_0 + b_1x}
#' \deqn{b_0 ~ Gaussian(0, \sigma_0^2I_{sp})}{b_0 ~ Gaussian(0, sigma_0^2I_(sp))}
#' \deqn{b_1 ~ Gaussian(0, \sigma_0^2V_{sp})}{b_0 ~ Gaussian(0, sigma_0^2V_(sp))}
#' \deqn{\eta ~ Gaussian(0,\sigma^2)}{e ~ Gaussian(0,sigma^2)}
#' 
#' where \eqn{\beta_0}{beta_0} and \eqn{\beta_1}{beta_1} are fixed
#' effects, and \eqn{V_{sp}}{V_(sp)} is a variance-covariance matrix
#' derived from a phylogeny (typically under the assumption of
#' Brownian motion evolution). Here, the variation in the mean
#' (intercept) for each species is given by the random effect
#' \eqn{b_0}{b_0} that is assumed to be independent among
#' species. Variation in species' responses to predictor variable
#' \eqn{x}{x} is given by a random effect \eqn{b_0}{b_0} that is
#' assumed to depend on the phylogenetic relatedness among species
#' given by \eqn{V_{sp}}{V_(sp)}; if species are closely related,
#' their specific responses to \eqn{x}{x} will be similar. This
#' particular model would be specified as
#' 
#' \code{z <- pglmm(Y ~ X + (1|sp__), data = data, family = "gaussian", cov_ranef = list(sp = phy))}
#' 
#' Or you can prepare the random terms manually (not recommended for simple models but may be necessary for complex models):
#' 
#' \code{re.1 <- list(1, sp = dat$sp, covar = diag(nspp))}
#' 
#' \code{re.2 <- list(dat$X, sp = dat$sp, covar = Vsp)}
#' 
#' \code{z <- pglmm(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))}
#' 
#' The covariance matrix covar is standardized to have its determinant
#' equal to 1. This in effect standardizes the interpretation of the
#' scalar \eqn{\sigma^2}{sigma^2}. Although mathematically this is
#' not required, it is a very good idea to standardize the predictor
#' (independent) variables to have mean 0 and variance 1. This will
#' make the function more robust and improve the interpretation of the
#' regression coefficients. For categorical (factor) predictor
#' variables, you will need to construct 0-1 dummy variables, and
#' these should not be standardized (for obvious reasons).
#'
#' For binary generalized linear mixed models (\code{family =
#' 'binomial'}), the function estimates parameters for the model of
#' the form, for example,
#'
#' \deqn{y = \beta_0 + \beta_1x + b_0 + b_1x}{y = beta_0 + beta_1x + b_0 + b_1x}
#' \deqn{Y = logit^{-1}(y)}{Y = logit^(-1)(y)}
#' \deqn{b_0 ~ Gaussian(0, \sigma_0^2I_{sp})}{b_0 ~ Gaussian(0, sigma_0^2I_(sp))}
#' \deqn{b_1 ~ Gaussian(0, \sigma_0^2V_{sp})}{b_0 ~ Gaussian(0, sigma_0^2V_(sp))}
#'
#' where \eqn{\beta_0}{beta_0} and \eqn{\beta_1}{beta_1} are fixed
#' effects, and \eqn{V_{sp}}{V_(sp)} is a variance-covariance matrix
#' derived from a phylogeny (typically under the assumption of
#' Brownian motion evolution).
#' 
#' \code{z <- pglmm(Y ~ X + (1|sp__), data = data, family = "binomial", cov_ranef = list(sp = phy))}
#' 
#' As with the linear mixed model, it is a very good idea to
#' standardize the predictor (independent) variables to have mean 0
#' and variance 1. This will make the function more robust and improve
#' the interpretation of the regression coefficients.
#'  
#' @param formula A two-sided linear formula object describing the
#'   mixed effects of the model. 
#'   
#'   To specify that a random term should have phylogenetic covariance matrix along 
#'   with non-phylogenetic one, add \code{__} (two underscores) at the end of the group variable; 
#'   e.g., \code{+ (1 | sp__)} will construct two random terms, 
#'   one with phylogenetic covariance matrix and another with non-phylogenetic (identity) matrix. 
#'   In contrast, \code{__} in the nested terms (below) will only create a phylogenetic covariance matrix. 
#'   Nested random terms have the general form \code{(1|sp__@site__)} which represents 
#'   phylogenetically related species nested within correlated sites.
#'   This form can be used for bipartite questions. For example, species could be 
#'   phylogenetically related pollinators and sites could be phylogenetically related plants, leading to
#'   the random effect `(1|insects__@plants__)`. If more than one phylogeny is used, remember to add 
#'   all to the argument `cov_ranef = list(insects = insect_phylo, plants = plant_phylo)`. Phylogenetic correlations can
#'   be dropped by removing the \code{__} underscores. Thus, the form \code{(1|sp@site__)} excludes the phylogenetic
#'   correlations among species, while the form \code{(1|sp__@site)} excludes the correlations among sites.
#'   
#'   Note that correlated random terms are not allowed. For example,
#'   \code{(x|g)} will be the same as \code{(0 + x|g)} in the \code{lme4::lmer} syntax. However, 
#'   \code{(x1 + x2|g)} won't work, so instead use  \code{(x1|g) + (x2|g)}.
#' @param data A \code{\link{data.frame}} containing the variables named in formula. 
#' @param family Either "gaussian" for a Linear Mixed Model, or 
#'   "binomial" or "poisson" for Generalized Linear Mixed Models.
#'   "family" should be specified as a character string (i.e., quoted). For binomial and 
#'   Poisson data, we use the canonical logit and log link functions, respectively. 
#'   Binomial data can be either presence/absence, or a two-column array of 'successes' and 'failures'. 
#'   For both binomial  and Poisson data, we add an observation-level 
#'   random term by default via \code{add.obs.re = TRUE}. If \code{bayes = TRUE} there are
#'   two additional families available: "zeroinflated.binomial", and "zeroinflated.poisson",
#'   which add a zero inflation parameter; this parameter gives the probability that the response is
#'   a zero. The rest of the parameters of the model then reflect the "non-zero" part part
#'   of the model. Note that "zeroinflated.binomial" only makes sense for success/failure
#'   response data.
#' @param cov_ranef A named list of covariance matrices of random terms. The names should be the
#'   group variables that are used as random terms with specified covariance matrices 
#'   (without the two underscores, e.g. \code{list(sp = tree1, site = tree2)}). The actual object 
#'   can be either a phylogeny with class "phylo" or a prepared covariance matrix. If it is a phylogeny,
#'   `pglmm` will prune it and then convert it to a covariance matrix assuming Brownian motion evolution.
#'   `pglmm` will also standardize all covariance matrices to have determinant of one. Group variables
#'   will be converted to factors and all covariance matrices will be rearranged so that rows and
#'   columns are in the same order as the levels of their corresponding group variables.
#' @param random.effects Optional pre-build list of random effects. If \code{NULL} (the default), 
#'   the function \code{\link{prep_dat_pglmm}} will prepare the random effects for you from the information
#'   in \code{formula}, \code{data}, and \code{cov_ranef}. \code{random.effect} allows a list of
#'   pre-generated random effects terms to increase flexibility; for example, this makes it 
#'   possible to construct models with both phylogenetic correlation and spatio-temporal autocorrelation.
#'   In preparing \code{random.effect}, make sure that the orders of rows and columns of 
#'   covariance matrices in the list are the same as their corresponding group variables
#'   in the data. Also, this should be _a list of lists_, e.g. 
#'   `random.effects = list(re1 = list(matrix_a), re2 = list(1, sp = sp, covar = Vsp))`.
#' @param REML Whether REML or ML is used for model fitting the random effects. Ignored if
#'  \code{bayes = TRUE}.
#' @param optimizer nelder-mead-nlopt (default), bobyqa, Nelder-Mead, or subplex. 
#'   Nelder-Mead is from the stats package and the other optimizers are from the nloptr package.
#'   Ignored if \code{bayes = TRUE}.
#' @param repulsion When there are nested random terms specified, \code{repulsion = FALSE} tests
#'   for phylogenetic underdispersion while \code{repulsion = FALSE} tests for overdispersion.
#'   This argument is a logical vector of length either 1 or >1.
#'   If its length is 1, then all covariance matrices in nested terms will be either 
#'   inverted (overdispersion) or not. If its length is >1, then you can select
#'   which covariance matrix in the nested terms to be inverted. Make sure to get 
#'   the length right: for all the terms with \code{@}, count the number of "__" 
#'   to determine the length of repulsion. For example, \code{sp__@site} and \code{sp@site__}
#'   will each require one element of \code{repulsion}, while \code{sp__@site__} will take two 
#'   elements (repulsion for sp and repulsion for site). Therefore, if your nested terms are 
#'   \code{(1|sp__@site) + (1|sp@site__) + (1|sp__@site__)}, then you should set the 
#'   repulsion to be something like \code{c(TRUE, FALSE, TRUE, TRUE)} (length of 4). 
#' @param add.obs.re Whether to add an observation-level random term for binomial or Poisson
#'   distributions. Normally it would be a good idea to add this to account for overdispersion,
#'   so \code{add.obs.re = TRUE} by default.
#' @param verbose If \code{TRUE}, the model deviance and running
#'   estimates of \code{s2} and \code{B} are plotted each iteration
#'   during optimization.
#' @param cpp Whether to use C++ function for optim. Default is TRUE. Ignored if \code{bayes = TRUE}.
#' @param bayes Whether to fit a Bayesian version of the PGLMM using \code{r-inla}.
#' @param s2.init An array of initial estimates of s2 for each random
#'   effect that scales the variance. If s2.init is not provided for
#'   \code{family="gaussian"}, these are estimated using \code{\link{lm}} assuming 
#'   no phylogenetic signal. A better approach might be to run \code{link[lme4:lmer]{lmer}} 
#'   and use the output random effects for \code{s2.init}. If \code{s2.init} is not
#'   provided for \code{family = "binomial"}, these are set to 0.25.
#' @param B.init Initial estimates of \eqn{B}{B}, a matrix containing
#'   regression coefficients in the model for the fixed effects. This
#'   matrix must have \code{dim(B.init) = c(p + 1, 1)}, where \code{p} is the
#'   number of predictor (independent) variables; the first element of
#'   \code{B} corresponds to the intercept, and the remaining elements
#'   correspond in order to the predictor (independent) variables in the
#'   formula. If \code{B.init} is not provided, these are estimated
#'   using \code{\link{lm}} or \code{\link{glm}} assuming no phylogenetic signal.
#'   A better approach might be to run \code{\link[lme4:lmer]{lmer}} and use the 
#'   output fixed effects for \code{B.init}. When \code{bayes = TRUE}, initial values are estimated
#'   using the maximum likelihood fit unless \code{ML.init = FALSE}, in
#'   which case the default \code{INLA} initial values will be used.
#' @param reltol A control parameter dictating the relative tolerance
#'   for convergence in the optimization; see \code{\link{optim}}.
#' @param maxit A control parameter dictating the maximum number of
#'   iterations in the optimization; see \code{\link{optim}}.
#' @param tol.pql A control parameter dictating the tolerance for
#'   convergence in the PQL estimates of the mean components of the
#'   GLMM. Ignored if \code{family = "gaussian"} or \code{bayes = TRUE}.
#' @param maxit.pql A control parameter dictating the maximum number
#'   of iterations in the PQL estimates of the mean components of the
#'   GLMM. Ignored if \code{family = "gaussian"} or \code{bayes = TRUE}.
#' @param marginal.summ Summary statistic to use for the estimate of coefficients when
#'   doing a Bayesian PGLMM (when \code{bayes = TRUE}). Options are: "mean",
#'   "median", or "mode", referring to different characterizations of the central
#'   tendency of the Bayesian posterior marginal distributions. Ignored if \code{bayes = FALSE}.
#' @param calc.DIC Should the Deviance Information Criterion be calculated and returned
#'   when doing a Bayesian PGLMM? Ignored if \code{bayes = FALSE}.
#' @param calc.WAIC Should the WAIC be calculated and returned
#'   when doing a Bayesian PGLMM? Ignored if \code{bayes = FALSE}.
#' @param prior Which type of default prior should be used by \code{pglmm}?
#'   Only used if \code{bayes = TRUE}. There are currently four options:
#'   "inla.default", which uses the default \code{INLA} priors; "pc.prior.auto", which uses a
#'   complexity penalizing prior (as described in 
#'   \href{https://arxiv.org/abs/1403.4630v3}{Simpson et al. (2017)}) designed to automatically 
#'   choose good parameters (only available for gaussian and binomial responses); "pc.prior", which 
#'   allows the user to set custom parameters on the "pc.prior" prior, using the \code{prior_alpha} 
#'   and \code{prior_mu} parameters (Run \code{INLA::inla.doc("pc.prec")} for details on these 
#'   parameters); and "uninformative", which sets a very uninformative prior 
#'   (nearly uniform) by using a very flat exponential distribution. The last option is generally
#'   not recommended but may in some cases give estimates closer to the maximum likelihood estimates.
#'   "pc.prior.auto" is only implemented for \code{family = "gaussian"} and \code{family = "binomial"} 
#'   currently.
#' @param prior_alpha Only used if \code{bayes = TRUE} and \code{prior = "pc.prior"}, in
#'   which case it sets the alpha parameter of \code{INLA}'s complexity penalizing prior for the 
#'   random effects. The prior is an exponential distribution where prob(sd > mu) = alpha, 
#'   where sd is the standard deviation of the random effect.
#' @param prior_mu Only used if \code{bayes = TRUE} and \code{prior = "pc.prior"}, in
#'   which case it sets the mu parameter of \code{INLA}'s complexity penalizing prior for the 
#'   random effects. The prior is an exponential distribution where prob(sd > mu) = alpha, 
#'   where sd is the standard deviation of the random effect.
#' @param ML.init Only relevant if \code{bayes = TRUE}. Should maximum
#'   likelihood estimates be calculated and used as initial values for
#'   the Bayesian model fit? Sometimes this can be helpful, but it may not help; thus,
#'   we set the default to \code{FALSE}. Also, it
#'   does not work with the zero-inflated families.
#' @param tree A phylogeny for column sp, with "phylo" class, or a covariance matrix for sp. 
#'   Make sure to have all species in the matrix; if the matrix is not standardized, 
#'   (i.e., det(tree) != 1), `pglmm` will try to standardize it for you. 
#'   No longer used: keep here for compatibility.
#' @param tree_site A second phylogeny for "site". This is required only if the 
#'   site column contains species instead of sites. This can be used for bipartitie 
#'   questions; tree_site can also be a covariance matrix. Make sure to have all sites 
#'   in the matrix; if the matrix is not standardized (i.e., det(tree_site) != 1), 
#'   pglmm` will try to standardize it for you. No longer used: keep here for compatibility.
#' @param sp No longer used: keep here for compatibility.
#' @param site No longer used: keep here for compatibility.
#' @param bayes_options Additional options to pass to INLA for if \code{bayes = TRUE}. A named list where the names
#' correspond to parameters in the \code{inla} function. One special option is \code{diagonal}: if an element in
#' the options list is names \code{diagonal} this tells \code{INLA} to add its value to the diagonal of the random effects
#' precision matrices. This can help with numerical stability if the model is ill-conditioned (if you get a lot of warnings,
#' try setting this to \code{list(diagonal = 1e-4)}).
#' @param bayes_nested_matrix_as_list For `bayes = TRUE`, prepare the nested terms as a list of length of 4 as the old way?
#' @return An object (list) of class \code{communityPGLMM} with the following elements:
#' \item{formula}{the formula for fixed effects}
#' \item{formula_original}{the formula for both fixed effects and random effects}
#' \item{data}{the dataset}
#' \item{family}{\code{gaussian}, \code{binomial}, or \code{poisson} depending on the model fit}
#' \item{random.effects}{the list of random effects}
#' \item{B}{estimates of the regression coefficients}
#' \item{B.se}{approximate standard errors of the fixed effects regression coefficients. 
#'   This is set to NULL if \code{bayes = TRUE}.}
#' \item{B.ci}{approximate Bayesian credible interval of the fixed effects regression coefficients.
#'   This is set to NULL if \code{bayes = FALSE}}
#' \item{B.cov}{approximate covariance matrix for the fixed effects regression coefficients}
#' \item{B.zscore}{approximate Z scores for the fixed effects regression coefficients. This is set to NULL if \code{bayes = TRUE}}
#' \item{B.pvalue}{approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if \code{bayes = TRUE}}
#' \item{ss}{standard deviations of the random effects for the covariance matrix \eqn{\sigma^2V}{sigma^2 V} for each random effect in order. For the linear mixed model, the residual variance is listed last.}
#' \item{s2r}{random effects variances for non-nested random effects}
#' \item{s2n}{random effects variances for nested random effects}
#' \item{s2resid}{for linear mixed models, the residual variance}
#' \item{s2r.ci}{Bayesian credible interval for random effects variances for non-nested random effects.
#' This is set to NULL if \code{bayes = FALSE}}
#' \item{s2n.ci}{Bayesian credible interval for random effects variances for nested random effects.
#' This is set to NULL if \code{bayes = FALSE}}
#' \item{s2resid.ci}{Bayesian credible interval for linear mixed models, the residual variance.
#' This is set to NULL if \code{bayes = FALSE}}
#' \item{logLik}{for linear mixed models, the log-likelihood for either the restricted likelihood (\code{REML=TRUE}) or the overall likelihood (\code{REML=FALSE}). This is set to NULL for generalized linear mixed models. If \code{bayes = TRUE}, this is the marginal log-likelihood}
#' \item{AIC}{for linear mixed models, the AIC for either the restricted likelihood (\code{REML = TRUE}) or the overall likelihood (\code{REML = FALSE}). This is set to NULL for generalised linear mixed models}
#' \item{BIC}{for linear mixed models, the BIC for either the restricted likelihood (\code{REML = TRUE}) or the overall likelihood (\code{REML = FALSE}). This is set to NULL for generalised linear mixed models}
#' \item{DIC}{for Bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if \code{bayes = FALSE}.}
#' \item{REML}{whether or not REML is used (\code{TRUE} or \code{FALSE}).}
#' \item{bayes}{whether or not a Bayesian model was fit.}
#' \item{marginal.summ}{The specified summary statistic used to summarize the Bayesian marginal distributions.
#' Only present if \code{bayes = TRUE}}
#' \item{s2.init}{the user-provided initial estimates of \code{s2}}
#' \item{B.init}{the user-provided initial estimates of \code{B}}
#' \item{Y}{the response (dependent) variable returned in matrix form}
#' \item{X}{the predictor (independent) variables returned in matrix form (including 1s in the first column)}
#' \item{H}{the residuals. For linear mixed models, this does not account for random terms, 
#    i.e. it is similar to \code{Y - predict(merMod, re.form = NA)} for models fitted with lme4. 
#'   To get residuals after accounting for both fixed and random terms, use \code{residuals()}.
#'   For the generalized linear mixed model, these are the predicted residuals in the 
#'   logit -1 space.}
#' \item{iV}{the inverse of the covariance matrix for the entire system (of dimension (`nsp` * `nsite`) 
#'   by (`nsp` * `nsite`)). This is NULL if \code{bayes = TRUE}.}
#' \item{mu}{predicted mean values for the generalized linear mixed model (i.e., similar to \code{fitted(merMod)}). 
#'   Set to NULL for linear mixed models, for which we can use [fitted()].}
#' \item{nested}{matrices used to construct the nested design matrix. This is set to NULL if \code{bayes = TRUE}}
#' \item{Zt}{the design matrix for random effects. This is set to NULL if \code{bayes = TRUE}}
#' \item{St}{diagonal matrix that maps the random effects variances onto the design matrix}
#' \item{convcode}{the convergence code provided by \code{\link{optim}}. This is set to NULL if \code{bayes = TRUE}}
#' \item{niter}{number of iterations performed by \code{\link{optim}}. This is set to NULL if \code{bayes = TRUE}}
#' \item{inla.model}{Model object fit by underlying \code{inla} function. Only returned
#' if \code{bayes = TRUE}}
#' @author Anthony R. Ives, Daijiang Li, Russell Dinnage
#' @references Ives, A. R. and M. R. Helmus. 2011. Generalized linear
#' mixed models for phylogenetic analyses of community
#' structure. Ecological Monographs 81:511-525.
#' @references Ives A. R. 2018. Mixed and phylogenetic models: a conceptual introduction to correlated data.
#' https://leanpub.com/correlateddata.
#' @references Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic
#' trait-based analyses of ecological networks. Ecology 94:2321-2333.
#' @references Simpson, Daniel, et al. 2017. Penalising model component complexity: 
#' A principled, practical approach to constructing priors. 
#' Statistical science 32(1): 1-28.
#' @references Li, D., Ives, A. R., & Waller, D. M. 2017. 
#' Can functional traits account for phylogenetic signal in community composition? 
#' New Phytologist, 214(2), 607-618.
#' @rdname pglmm
#' @export
#' @examples
#' ## Structure of examples:
#' # First, a (brief) description of model types, and how they are specified
#' # - these are *not* to be run 'as-is'; they show how models should be organised
#' # Second, a run-through of how to simulate, and then analyse, data
#' # - these *are* to be run 'as-is'; they show how to format and work with data
#'
#' \donttest{
#' #############################################
#' ### Brief summary of models and their use ###
#' #############################################
#' ## Model structures from Ives & Helmus (2011)
#' if(FALSE){
#'   # dat = data set for regression (note: must have a column "sp" and a column "site")
#'   # phy = phylogeny of class "phylo"
#'   # repulsion = to test phylogenetic repulsion or not
#'   
#'   # Model 1 (Eq. 1)
#'   z <- pglmm(freq ~ sp + (1|site) + (1|sp__@site), data = dat, family = "binomial",
#'              cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1)
#'   
#'   # Model 2 (Eq. 2)
#'   z <- pglmm(freq ~ sp + X + (1|site) + (X|sp__), data = dat, family = "binomial",
#'              cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1)
#'   
#'   # Model 3 (Eq. 3)
#'   z <- pglmm(freq ~ sp*X + (1|site) + (1|sp__@site), data = dat, family = "binomial",
#'              cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init = .1)
#'   
#'   ## Model structure from Rafferty & Ives (2013) (Eq. 3)
#'   # dat = data set
#'   # phyPol = phylogeny for pollinators (pol)
#'   # phyPlt = phylogeny for plants (plt)
#'   
#'   z <- pglmm(freq ~ pol * X + (1|pol__) + (1|plt__) + (1|pol__@plt) +
#'                (1|pol@plt__) + (1|pol__@plt__),
#'              data = dat, family = "binomial",
#'              cov_ranef = list(pol = phyPol, plt = phyPlt),
#'              REML = TRUE, verbose = TRUE, s2.init = .1)
#' }
#' 
#' #####################################################
#' ### Detailed analysis showing covariance matrices ###
#' #####################################################
#' 
#' # This is the example from section 4.3 in Ives, A. R. (2018) Mixed 
#' # and phylogenetic models: a conceptual introduction to correlated data.
#' 
#' library(ape)
#' library(mvtnorm)
#' 
#' # Investigating covariance matrices for different types of model structure
#' nspp <- 6
#' nsite <- 4
#' 
#' # Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3)
#' phy <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3)
#' 
#' # Simulate species means
#' sd.sp <- 1
#' mean.sp <- rTraitCont(phy, model = "BM", sigma=sd.sp^2)
#' 
#' # Replicate values of mean.sp over sites
#' Y.sp <- rep(mean.sp, times=nsite)
#' 
#' # Simulate site means
#' sd.site <- 1
#' mean.site <- rnorm(nsite, sd=sd.site)
#' 
#' # Replicate values of mean.site over sp
#' Y.site <- rep(mean.site, each=nspp)
#' 
#' # Compute a covariance matrix for phylogenetic attraction
#' sd.attract <- 1
#' Vphy <- vcv(phy)
#' 
#' # Standardize the phylogenetic covariance matrix to have determinant = 1. 
#' # (For an explanation of this standardization, see subsection 4.3.1 in Ives (2018))
#' Vphy <- Vphy/(det(Vphy)^(1/nspp))
#' 
#' # Construct the overall covariance matrix for phylogenetic attraction. 
#' # (For an explanation of Kronecker products, see subsection 4.3.1 in the book)
#' V <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy)
#' Y.attract <- array(t(rmvnorm(n = 1, sigma = sd.attract^2*V)))
#' 
#' # Simulate residual errors
#' sd.e <- 1
#' Y.e <- rnorm(nspp*nsite, sd = sd.e)
#' 
#' # Construct the dataset
#' d <- data.frame(sp = rep(phy$tip.label, times = nsite), 
#'                 site = rep(1:nsite, each = nspp))
#' 
#' # Simulate abundance data
#' d$Y <- Y.sp + Y.site + Y.attract + Y.e
#' 
#' # Analyze the model
#' pglmm(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, cov_ranef = list(sp = phy))
#' 
#' # Display random effects: the function `pglmm_plot_ranef()` does what 
#' # the name implies. You can set `show.image = TRUE` and `show.sim.image = TRUE` 
#' # to see the matrices and simulations.
#' re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site), data = d, 
#'                     cov_ranef = list(sp = phy), show.image = FALSE, 
#'                     show.sim.image = FALSE)
#' 
#' #################################################
#' ### Example of a bipartite phylogenetic model ###
#' #################################################
#' 
#' # Investigating covariance matrices for different types of model structure
#' nspp <- 20
#' nsite <- 15
#' 
#' # Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3)
#' phy.sp <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3)
#' phy.site <- compute.brlen(rtree(n = nsite), method = "Grafen", power = 1.3)
#' 
#' # Simulate species means
#' mean.sp <- rTraitCont(phy.sp, model = "BM", sigma = 1)
#' 
#' # Replicate values of mean.sp over sites
#' Y.sp <- rep(mean.sp, times = nsite)
#' 
#' # Simulate site means
#' mean.site <- rTraitCont(phy.site, model = "BM", sigma = 1)
#' 
#' # Replicate values of mean.site over sp
#' Y.site <- rep(mean.site, each = nspp)
#' 
#' # Generate covariance matrix for phylogenetic attraction among species
#' sd.sp.attract <- 1
#' Vphy.sp <- vcv(phy.sp)
#' Vphy.sp <- Vphy.sp/(det(Vphy.sp)^(1/nspp))
#' V.sp <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy.sp)
#' Y.sp.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.attract^2*V.sp)))

#' # Generate covariance matrix for phylogenetic attraction among sites
#' sd.site.attract <- 1
#' Vphy.site <- vcv(phy.site)
#' Vphy.site <- Vphy.site/(det(Vphy.site)^(1/nsite))
#' V.site <- kronecker(Vphy.site, diag(nrow = nspp, ncol = nspp))
#' Y.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.site.attract^2*V.site)))
#' 
#' # Generate covariance matrix for phylogenetic attraction of species:site interaction
#' sd.sp.site.attract <- 1
#' V.sp.site <- kronecker(Vphy.site, Vphy.sp)
#' Y.sp.site.attract <- array(t(rmvnorm(n = 1, sigma = sd.sp.site.attract^2*V.sp.site)))
#' 
#' # Simulate residual error
#' sd.e <- 0.5
#' Y.e <- rnorm(nspp*nsite, sd = sd.e)
#' 
#' # Construct the dataset
#' d <- data.frame(sp = rep(phy.sp$tip.label, times = nsite), 
#'                 site = rep(phy.site$tip.label, each = nspp))
#' 
#' # Simulate abundance data
#' d$Y <- Y.sp + Y.site + Y.sp.attract + Y.site.attract + Y.sp.site.attract + Y.e
#' 
#' # Plot random effects covariance matrices and then add phylogenies
#' # Note that, if show.image and show.sim are not specified, pglmm_plot_ranef() shows
#' # the covariance matrices if nspp * nsite < 200 and shows simulations 
#' # if nspp * nsite > 100
#' re <- pglmm_plot_ranef(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + 
#'                     (1|sp@site__) + (1|sp__@site__),
#'                     data=d, cov_ranef = list(sp = phy.sp, site = phy.site))
#' 
#' # This flips the phylogeny to match to covariance matrices
#' rot.phy.site <- phy.site
#' for(i in (nsite+1):(nsite+Nnode(phy.site))) 
#'    rot.phy.site <- rotate(rot.phy.site, node = i)
#' 
#' plot(phy.sp, main = "Species", direction = "upward")
#' plot(rot.phy.site, main = "Site")
#' 
#' # Analyze the simulated data and compute a P-value for the (1|sp__@site__) 
#' # random effect using a LRT. It is often better to fit the reduced model before
#' # the full model, because it s numerically easier to fit the reduced model, 
#' # and then the parameter estimates from the reduced model can be given to the
#' # full model. In this case, I have used the estimates of the random effects 
#' # from the reduce model, mod.r$ss, as the initial estimates for the same 
#' # parameters in the full model in the statement s2.init=c(mod.r$ss, 0.01)^2. 
#' # The final 0.01 is for the last random effect in the full model, (1|sp__@site__). 
#' # Note also that the output of the random effects from communityPGLMM(), mod.r$ss, 
#' # are the standard deviations, so they have to be squared for use as initial 
#' # values of variances in mod.f.
#' 
#' mod.r <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__), 
#'                         data = d, cov_ranef = list(sp = phy.sp, site = phy.site))
#' mod.f <- pglmm(Y ~ 1 + (1|sp__) + (1|site__) + (1|sp__@site) + (1|sp@site__) + 
#'                (1|sp__@site__), data = d, 
#'                cov_ranef = list(sp = phy.sp, site = phy.site), 
#'                s2.init = c(mod.r$ss, 0.01)^2)
#' mod.f
#' pvalue <- pchisq(2*(mod.f$logLik - mod.r$logLik), df = 1, lower.tail = FALSE)
#' pvalue
#' } 

pglmm <- function(formula, data = NULL, family = "gaussian", cov_ranef = NULL,
                           random.effects = NULL, REML = TRUE, 
                           optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"),
                           repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE, 
                           cpp = TRUE, bayes = FALSE, 
                           s2.init = NULL, B.init = NULL, reltol = 10^-6, 
                           maxit = 500, tol.pql = 10^-6, maxit.pql = 200,  
                           marginal.summ = "mean", calc.DIC = TRUE, calc.WAIC = TRUE, prior = "inla.default", 
                           prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE,
                           tree = NULL, tree_site = NULL, sp = NULL, site = NULL, bayes_options = NULL,
                  bayes_nested_matrix_as_list = FALSE
                           ) {

  optimizer = match.arg(optimizer)
  
  if ((family %nin% c("gaussian", "binomial", "poisson")) & (bayes == FALSE)){
    stop("\nSorry, but only binomial, poisson and gaussian options are available for
         pglmm at this time")
  }
  
  if(bayes) {
    if (!requireNamespace("INLA", quietly = TRUE)) {
      stop("To run pglmm with bayes = TRUE, you need to install the packages 'INLA'. \ 
           Please run in your R terminal:\
           install.packages('INLA', repos='https://inla.r-inla-download.org/R/stable')")
    }
    if ((family %nin% c("gaussian", "binomial", "poisson", "zeroinflated.binomial", "zeroinflated.poisson"))){
      stop("\nSorry, but only binomial (binary or success/failure), poisson (count), and gaussian options 
           are available for Bayesian pglmm at this time")
    }
  }
  
  if(family %in% c("binomial", "poisson") & !is.null(tree)){
    if(("phylo" %in% class(tree)) & !ape::is.ultrametric(tree)){
      warning("The tree is not ultrametric, which will likely give misleading results for family=binomial and poisson models.")
    }
  }
  
  data = as.data.frame(data) # in case of tibbles
  fm_original = formula
  prep_re = if(is.null(random.effects)) TRUE else FALSE
  if(prep_re) {
    # to make old code work ...
    if(is.null(cov_ranef) & any(grepl("__", all.vars(formula)))){
      if(!is.null(tree) | !is.null(tree_site))
        warning("arguments `tree` and `tree_site` are deprecated; please use `cov_ranef` instead.", 
                call. = FALSE)
      if(!is.null(tree) & is.null(tree_site)) cov_ranef = list(sp = tree) # column name must be sp
      if(is.null(tree) & !is.null(tree_site)) cov_ranef = list(site = tree_site) # column name must be site
      if(!is.null(tree) & !is.null(tree_site)) cov_ranef = list(sp = tree, site = tree_site)
    }
    dat_prepared = prep_dat_pglmm(formula, data, cov_ranef, repulsion, prep_re, family, add.obs.re, bayes, bayes_nested_matrix_as_list)
    formula = dat_prepared$formula
    random.effects = dat_prepared$random.effects
    cov_ranef_updated = dat_prepared$cov_ranef_updated
  } else {
    formula = lme4::nobars(formula)
    for(i in 1:length(random.effects)){
      if(length(random.effects[[i]]) >= 3){
        if(inherits(random.effects[[i]][[3]], c("matrix", "Matrix")) &
           !is.null(rownames(random.effects[[i]][[3]]))){
          if(!all(rownames(random.effects[[i]][[3]]) == colnames(random.effects[[i]][[3]])))
            stop("the row and column names of cov matrix in random.effects[", i, "] not in the same order")
          if(!all(rownames(random.effects[[i]][[3]]) == levels(random.effects[[i]][[2]]))){
            warning("the row/column names of cov matrix in random.effects[", i, 
                    "] not in the same order of its grouping variable, reordering now", 
                    immediate. = TRUE, call. = FALSE)
            if(length(setdiff(levels(random.effects[[i]][[2]]), rownames(random.effects[[i]][[3]]))))
              stop("some levels in the grouping variable of random.effects[", i, 
                   "] not in the cov matrix")
            random.effects[[i]][[3]] = 
              random.effects[[i]][[3]][levels(random.effects[[i]][[2]]), levels(random.effects[[i]][[2]])]
          }
        }
      }
    }
  }
  
  # initial values for bayesian analysis: binomial and gaussian
  if(bayes & ML.init & (family %in% c("binomial", "gaussian", "poisson"))) {
    if (family == "gaussian") {
      ML.init.z <- try(communityPGLMM.gaussian(formula = formula, data = data, 
                                           sp = sp, site = site, 
                                           random.effects = random.effects, REML = REML, 
                                           s2.init = s2.init, B.init = B.init, 
                                           reltol = reltol, maxit = maxit, 
                                           verbose = verbose, cpp = cpp, optimizer = optimizer))
      if(!inherits(ML.init.z, "try-error")){
        s2.init <- c(ML.init.z$s2r, ML.init.z$s2n, ML.init.z$s2resid)
        B.init <- ML.init.z$B[ , 1, drop = TRUE]
      } else {
        warning("Initial model fitting with maximum likelihood approach failed.", immediate. = TRUE)
      }
    }
    
    if (family %in% c("binomial", "poisson")) {# this may take too long if dataset is large...
      if (is.null(s2.init)) s2.init <- 0.25
 
      ML.init.z <- try(communityPGLMM.glmm(formula = formula, data = data, 
                                         sp = sp, site = site, family = family,
                                         random.effects = random.effects, REML = REML, 
                                         s2.init = s2.init, B.init = B.init, reltol = reltol, 
                                         maxit = maxit, tol.pql = tol.pql, maxit.pql = maxit.pql, 
                                         verbose = verbose, cpp = cpp, optimizer = optimizer))
      if(!inherits(ML.init.z, "try-error")){
        s2.init <- c(ML.init.z$s2r, ML.init.z$s2n)
        B.init <- ML.init.z$B[ , 1, drop = TRUE]
      } else {
        warning("Initial model fitting with maximum likelihood approach failed.", immediate. = TRUE)
      }
    }
  } 
  
  if(bayes & ML.init & (family %nin% c("binomial", "gaussian", "poisson"))) {
    warning('ML.init option is only available for binomial, poisson and gaussian families. You will have to 
            specify initial values manually if you think the default are problematic.')
  }
  
  if(bayes) {
    z <- communityPGLMM.bayes(formula = formula, data = data, family = family,
                              sp = sp, site = site, 
                              random.effects = random.effects, 
                              s2.init = s2.init, B.init = B.init, 
                              verbose = verbose, 
                              marginal.summ = marginal.summ, calc.DIC = calc.DIC, calc.WAIC = calc.WAIC, 
                              prior = prior, 
                              prior_alpha = prior_alpha, 
                              prior_mu = prior_mu,
                              bayes_options = bayes_options)
  } else {# max likelihood 
    if (family == "gaussian") {
      z <- communityPGLMM.gaussian(formula = formula, data = data, 
                                   sp = sp, site = site, 
                                   random.effects = random.effects, REML = REML, 
                                   s2.init = s2.init, B.init = B.init, 
                                   reltol = reltol, maxit = maxit, 
                                   verbose = verbose, cpp = cpp, optimizer = optimizer)
    }
    
    if (family %in% c("binomial", "poisson")) {
      if (is.null(s2.init)) s2.init <- 0.25
      z <- communityPGLMM.glmm(formula = formula, data = data, family = family,
                               sp = sp, site = site, 
                               random.effects = random.effects, REML = REML, 
                               s2.init = s2.init, B.init = B.init, reltol = reltol, 
                               maxit = maxit, tol.pql = tol.pql, maxit.pql = maxit.pql, 
                               verbose = verbose, cpp = cpp, optimizer = optimizer)
    }
  }
  
  z$formula_original = fm_original
  z$cov_ranef = if(is.null(cov_ranef)) NA else cov_ranef_updated
  
  # # add names for ss
  # if(!is.null(names(random.effects))){
  #   re.names = names(random.effects)[c(
  #     which(sapply(random.effects, length) %nin% c(1, 2, 4)), # non-nested terms
  #     which(sapply(random.effects, length) %in% c(1, 2, 4)) # nested terms
  #   )]
  #   if(family == "gaussian") re.names <- c(re.names, "residual")
  #   if((!bayes) & family != "gaussian") names(z$ss) = re.names
  # }
  
  return(z)
}

communityPGLMM.gaussian <- function(formula, data = list(), family = "gaussian", 
                                    sp = NULL, site = NULL, random.effects = list(), 
                                    REML = TRUE, s2.init = NULL, B.init = NULL, 
                                    reltol = 10^-8, maxit = 500, verbose = FALSE, 
                                    cpp = TRUE, optimizer = "bobyqa") {
  
  dm = get_design_matrix(formula, data, random.effects, na.action = NULL)
  X = dm$X; Y = dm$Y; St = dm$St; Zt = dm$Zt; nested = dm$nested
  p <- ncol(X)
  n <- nrow(X)
  q <- length(random.effects)
  
  # Compute initial estimates assuming no phylogeny if not provided
  if (!is.null(B.init) & length(B.init) != p) {
    warning("B.init not correct length, so computed B.init using glm()")
  }
  if ((is.null(B.init) | (!is.null(B.init) & length(B.init) != p)) & !is.null(s2.init)) {
    B.init <- t(matrix(lm(formula = formula, data = data)$coefficients, ncol = p))
  }
  if (!is.null(B.init) & is.null(s2.init)) {
    s2.init <- var(lm(formula = formula, data = data)$residuals)/q
  }
  if ((is.null(B.init) | (!is.null(B.init) & length(B.init) != p)) & is.null(s2.init)) {
    B.init <- t(matrix(lm(formula = formula, data = data)$coefficients, ncol = p))
    s2.init <- var(lm(formula = formula, data = data)$residuals)/q
  }
  B <- B.init
  s <- as.vector(array(s2.init^0.5, dim = c(1, q)))
  
  if(cpp){
    if(is.null(St)) St = as(matrix(0, 0, 0), "dgTMatrix")
    if(is.null(Zt)) Zt = as(matrix(0, 0, 0), "dgTMatrix")
    out_res = pglmm_gaussian_internal_cpp(par = s, X, Y, Zt, St, nested, REML, 
                                          verbose, optimizer, maxit, 
                                          reltol, q, n, p, pi)
    logLik = out_res$logLik
    out = out_res$out
    row.names(out$B) = colnames(X)
    out$s2r = as.vector(out$s2r)
    convcode = out_res$convcode
    niter = out_res$niter[,1]
  } else {# R version
    if(optimizer == "Nelder-Mead"){
      if (q > 1) {
        opt <- optim(fn = pglmm_gaussian_LL_calc, par = s, X = X, Y = Y, Zt = Zt, St = St, 
                     nested = nested, REML = REML, verbose = verbose, 
                     method = "Nelder-Mead", control = list(maxit = maxit, reltol = reltol))
      } else {
        opt <- optim(fn = pglmm_gaussian_LL_calc, par = s, X = X, Y = Y, Zt = Zt, St = St, 
                     nested = nested, REML = REML, verbose = verbose,
                     method = "L-BFGS-B", control = list(maxit = maxit))
      }
    } else {
      # opts for nloptr
      if (optimizer == "bobyqa") nlopt_algor = "NLOPT_LN_BOBYQA"
      if (optimizer == "nelder-mead-nlopt") nlopt_algor = "NLOPT_LN_NELDERMEAD"
      if (optimizer == "subplex") nlopt_algor = "NLOPT_LN_SBPLX"
      opts <- list("algorithm" = nlopt_algor, "ftol_rel" = reltol, "ftol_abs" = reltol,
                   "xtol_rel" = 0.0001, "maxeval" = maxit)
      S0 <- nloptr::nloptr(x0 = s, eval_f = pglmm_gaussian_LL_calc, opts = opts, 
                           X = X, Y = Y, Zt = Zt, St = St, nested = nested, 
                           REML = REML, verbose = verbose, optim_ll = TRUE)
      opt = list(par = S0$solution, value = S0$objective, counts = S0$iterations,
                 convergence = S0$status, message = S0$message)
    }
    
    convcode = opt$convergence
    niter = opt$counts
    par_opt <- abs(Re(opt$par))
    LL <- opt$value
    out = pglmm_gaussian_LL_calc(par_opt, X, Y, Zt, St, nested, REML, verbose, optim_ll = FALSE)
    out$B.cov = as.matrix(out$B.cov)
    
    if (REML == TRUE) {
      logLik <- as.numeric(-0.5 * (n - p) * log(2 * pi) + 0.5 * determinant(t(X) %*% X)$modulus[1] - LL)
    } else {
      logLik <- as.numeric(-0.5 * n * log(2 * pi) - LL)
    }
  }
  
  # add names to variance estimates to make sure they are in order
  if(!is.null(names(random.effects))){
    re.len <- sapply(random.effects, length)
    if(length(out$sr)) names(out$sr) <- names(out$s2r) <- names(random.effects)[re.len == 3]
    if(length(out$sn)) names(out$sn) <- names(out$s2n) <- names(random.effects)[re.len %in% c(1, 4)]
    names(out$s2resid) <- "residual"
  }
 
  ss <- c(out$sr, out$sn, out$s2resid^0.5)
  B.zscore <- out$B/out$B.se
  B.pvalue <- 2 * pnorm(abs(B.zscore), lower.tail = FALSE)
  k <- p + q + 1
  AIC <- -2 * logLik + 2 * k
  BIC <- -2 * logLik + k * (log(n) - log(pi))
  
  results <- list(formula = formula, data = data, family = family, random.effects = random.effects, 
                  B = out$B, B.se = out$B.se, B.cov = out$B.cov, B.zscore = B.zscore, 
                  B.pvalue = B.pvalue, ss = ss, s2n = out$s2n, s2r = out$s2r,
                  s2resid = out$s2resid, logLik = logLik, AIC = AIC, BIC = BIC, 
                  REML = REML, bayes = FALSE, s2.init = s2.init, B.init = B.init, Y = Y, X = X, H = out$H, 
                  iV = as.matrix(out$iV), mu = NULL, nested = nested, Zt = Zt, St = St, 
                  convcode = convcode, niter = niter)
  class(results) <- c("communityPGLMM", "pglmm")
  results
}

communityPGLMM.glmm <- function(formula, data = list(), family = "binomial", 
                                sp = NULL, site = NULL, random.effects = list(), 
                                REML = TRUE, s2.init = 0.05, B.init = NULL, 
                                reltol = 10^-5, maxit = 40, tol.pql = 10^-6, 
                                maxit.pql = 200, verbose = FALSE, cpp = TRUE,
                                optimizer = "bobyqa") {
  
  dm = get_design_matrix(formula, data, random.effects, na.action = NULL)
  X = dm$X; Y = dm$Y; size = dm$size; St = dm$St; Zt = dm$Zt; nested = dm$nested
  p <- ncol(X)
  n <- nrow(X)
  q <- length(random.effects)

    # Compute initial estimates assuming no phylogeny if not provided
  if (!is.null(B.init) & length(B.init) != p) {
    warning("B.init not correct length, so computed B.init using glm()")
  }
  if ((is.null(B.init) | (!is.null(B.init) & length(B.init) != p))) {
    B.init <- t(matrix(glm(formula = formula, data = data, family = family, na.action = na.omit)$coefficients, ncol = p))
  } else {
    B.init <- matrix(B.init, ncol = 1)
  }
  ss <- as.vector(array(s2.init^0.5, dim = c(1, q)))
  
  if(cpp){
    if(is.null(St)) St = as(matrix(0, 0, 0), "dgTMatrix")
    if(is.null(Zt)) Zt = as(matrix(0, 0, 0), "dgTMatrix")
    internal_res = pglmm_internal_cpp(X = X, Y = Y, Zt = Zt, St = St, 
                                      nested = nested, REML = REML, verbose = verbose, 
                                      n = n, p = p, q = q, maxit = maxit, 
                                      reltol = reltol, tol_pql = tol.pql, 
                                      maxit_pql = maxit.pql, optimizer = optimizer, 
                                      B_init = B.init, ss = ss,
                                      family = family, totalSize = size)
    B = internal_res$B
    row.names(B) = colnames(X)
    ss = internal_res$ss[,1]
    iV = as(internal_res$iV, "dgCMatrix")
    mu = internal_res$mu
    row.names(mu) = 1:nrow(mu)
    H = internal_res$H
    convcode = internal_res$convcode
    niter = internal_res$niter[, 1]
    LL = internal_res$LL
  } else {
    B <- B.init
    b <- matrix(0, nrow = n)
    beta <- rbind(B, b)  
    if(family == "binomial") mu <- exp(X %*% B)/(1 + exp(X %*% B))
    if(family == "poisson") mu <- exp(X %*% B)
    XX <- cbind(X, diag(1, nrow = n, ncol = n))
    
    est.ss <- ss
    est.B <- B
    oldest.ss <- 10^6
    oldest.B <- matrix(10^6, nrow = length(est.B))
    
    iteration <- 0
    # exitflag <- 0; rcondflag <- 0
    while (((t(est.ss - oldest.ss) %*% (est.ss - oldest.ss) > tol.pql^2) | 
            (t(est.B - oldest.B) %*% (est.B - oldest.B) > tol.pql^2)) & 
           (iteration <= maxit.pql)) {
      iteration <- iteration + 1
      oldest.ss <- est.ss
      oldest.B <- est.B
      
      est.B.m <- B
      oldest.B.m <- matrix(10^6, nrow = length(est.B))
      iteration.m <- 0
      
      # mean component
      while ((t(est.B.m - oldest.B.m) %*% (est.B.m - oldest.B.m) > tol.pql^2) & 
             (iteration.m <= maxit.pql)) {
        iteration.m <- iteration.m + 1
        oldest.B.m <- est.B.m
        
        iV <- pglmm.iV.logdetV(par = ss, Zt = Zt, St = St, mu = mu, nested = nested, logdet = FALSE, family = family, size = size)$iV
        if(family == "binomial") Z <- X %*% B + b + (Y/size - mu)/(mu * (1 - mu))
        if(family == "poisson") Z <- X %*% B + b + (Y - mu)/mu
        
        denom <- t(X) %*% iV %*% X
        num <- t(X) %*% iV %*% Z
        B <- solve(denom, matrix(num))
        B <- as.matrix(B)
        
        V = pglmm.V(par = ss, Zt = Zt, St = St, mu = mu, nested = nested, family = family, size = size)
        
        if(family == "binomial") iW <- diag(as.vector(1/(size * mu * (1 - mu))))
        if(family == "poisson") iW <- diag(as.vector(1/mu))
        C <- V - iW
        
        b <- C %*% iV %*% (Z - X %*% B)
        beta <- rbind(B, matrix(b))
        if(family == "binomial") mu <- exp(XX %*% beta)/(1 + exp(XX %*% beta))
        if(family == "poisson") mu <- exp(XX %*% beta)
        
        est.B.m <- B
        if (verbose == TRUE) show(c(iteration, B))
        # cat("mean part:", iteration.m, t(B), "\n")
        # cat("         denom", as.matrix(denom), "\n")
        # cat("         num", as.matrix(num), "\n")
        if (any(is.nan(B))) {
          stop("Estimation of B failed. Check for lack of variation in Y. You could try with a smaller s2.init, but this might not help.")
        }
      }
      # variance component
      if(family == "binomial") Z <- X %*% B + b + (Y/size - mu)/(mu * (1 - mu))
      if(family == "poisson") Z <- X %*% B + b + (Y - mu)/mu
      H <- matrix(Z - X %*% B)

      if(optimizer == "Nelder-Mead"){
        if (q > 1) {
          opt <- optim(fn = pglmm.LL, par = ss, H = H, X = X, Zt = Zt, St = St,
                       mu = mu, nested = nested, family = family, size = size, REML = REML, verbose = verbose, 
                       method = "Nelder-Mead", control = list(maxit = maxit, reltol = reltol))
        } else {
          opt <- optim(fn = pglmm.LL, par = ss, H = H, X = X, Zt = Zt, St = St,
                       mu = mu, nested = nested, family = family, size = size, REML = REML, verbose = verbose, 
                       method = "L-BFGS-B", control = list(maxit = maxit))
        }
      } else {
        if (optimizer == "bobyqa") nlopt_algor = "NLOPT_LN_BOBYQA"
        if (optimizer == "nelder-mead-nlopt") nlopt_algor = "NLOPT_LN_NELDERMEAD"
        if (optimizer == "subplex") nlopt_algor = "NLOPT_LN_SBPLX"
        opts <- list("algorithm" = nlopt_algor, "ftol_rel" = reltol, "ftol_abs" = reltol,
                     "xtol_rel" = 0.0001, "maxeval" = maxit)
        S0 <- nloptr::nloptr(x0 = ss, eval_f = pglmm.LL, opts = opts,
                             H = H, X = X, Zt = Zt, St = St, mu = mu, 
                             nested = nested, family = family, size = size, REML = REML, verbose = verbose)
        opt = list(par = S0$solution, value = S0$objective, counts = S0$iterations,
                   convergence = S0$status, message = S0$message)
      }
      ss <- abs(opt$par)
      LL <- opt$value
      convcode = opt$convergence
      niter = opt$counts
      if(verbose) cat("var part:", iteration, LL, ss, "\n")
      est.ss <- ss
      est.B <- B
    }
    row.names(B) = colnames(X)
  }
  
  # Extract parameters
  q.nonNested = dm$q.nonNested
  if (q.nonNested > 0) {
    sr <- ss[1:q.nonNested]
  } else {
    sr <- NULL
  }
  q.Nested = dm$q.Nested
  if (q.Nested > 0) {
    sn <- ss[(q.nonNested + 1):(q.nonNested + q.Nested)]
  } else {
    sn <- NULL
  }
  
  # add names to variance estimates to make sure they are in order
  if(!is.null(names(random.effects))){
    re.len <- sapply(random.effects, length)
    if(length(sr)) names(sr) <- names(random.effects)[re.len == 3]
    if(length(sn)) names(sn) <- names(random.effects)[re.len %in% c(1, 4)]
    names(ss) <- c(names(sr), names(sn))
  }
  
  s2r <- sr^2
  s2n <- sn^2
  
  if (family == 'binomial') {
    mu_hat <- exp(X %*% B) / (1 + exp(X %*% B))
    if (any(size > 1)) {
      logLik.glm <-
        sum(Y * log(mu_hat) + (size - Y) * log(1 - mu_hat) + 
              log(factorial(size) / (factorial(Y) * factorial(size - Y))))
    } else{
      logLik.glm <- sum(Y * log(mu_hat) + (1 - Y) * log(1 - mu_hat))
    }
  } else{
    mu_hat <- exp(X %*% B)
    logLik.glm <- sum(-mu_hat + Y * log(mu_hat) - log(factorial(Y)))
  }

  if(!is.null(St) && all(dim(St) == 0)) St <- NULL
  logLik <- logLik.glm + 
    as.numeric(-LL + pglmm.LL(0 * ss, H = H, X = X, Zt = Zt, St = St, mu = mu, 
                              nested = nested, REML = REML, family = family,
                              size = size, verbose = verbose))
  k <- p + q + 1
  AIC <- -2 * logLik + 2 * k
  BIC <- -2 * logLik + k * (log(n) - log(pi))

  B.cov <- solve(t(X) %*% iV %*% X)
  B.se <- as.matrix(diag(B.cov))^0.5
  B.zscore <- B/B.se
  B.pvalue <- 2 * pnorm(abs(B/B.se), lower.tail = FALSE)
  
  results <- list(formula = formula, data = data, family = family, random.effects = random.effects, 
                  B = B, B.se = B.se, B.cov = B.cov, B.zscore = B.zscore, B.pvalue = B.pvalue, 
                  ss = ss, s2n = s2n, s2r = s2r, s2resid = NULL, logLik = logLik, AIC = AIC, 
                  BIC = BIC, REML = REML, bayes = FALSE, s2.init = s2.init, B.init = B.init, Y = Y, size = size, X = X, 
                  H = as.matrix(H), iV = iV, mu = mu, nested = nested, Zt = Zt, St = St, 
                  convcode = convcode, niter = niter)
  class(results) <- c("communityPGLMM", "pglmm")
  return(results)
}

communityPGLMM.bayes <- function(formula, data = list(), family = "gaussian", 
                                 sp = NULL, site = NULL, random.effects = list(), 
                                 s2.init = NULL, B.init = NULL, 
                                 verbose = FALSE, 
                                 marginal.summ = "mean", calc.DIC = FALSE, calc.WAIC = FALSE, 
                                 prior = "inla.default",
                                 prior_alpha = 0.1, prior_mu = 1, bayes_options = NULL) {
  mf <- model.frame(formula = formula, data = data, na.action = NULL)
  X <- model.matrix(attr(mf, "terms"), data = mf)
  Y <- model.response(mf)
  p <- ncol(X)
  n <- nrow(X)
  q <- length(random.effects)
  
  if(is.matrix(Y) && ncol(Y) == 2){ # success, fails for binomial data
    Ntrials <- rowSums(Y) # total trials
    Y <- Y[, 1] # success
    # update formula
    left_side = all.vars(update(formula, .~0))[1]
    formula_bayes = as.formula(gsub(pattern = "^(cbind[(].*[)])",
                                    replacement = left_side, x = deparse(formula)))
  } else {
    formula_bayes = formula
    Ntrials = NULL
  }
  
  base_family <- gsub("zeroinflated.", "", family, fixed = TRUE)
  
  if(family == "zeroinflated.binomial") {
    family <- "zeroinflatedbinomial1"
  }
  if(family == "zeroinflated.poisson") {
    family <- "zeroinflatedpoisson1"
  }
  
  if(family == "gaussian") q <- q + 1
  
  # Compute initial estimates assuming no phylogeny if not provided
  if(family == "gaussian") {
    if (!is.null(B.init) & length(B.init) != p) {
      warning("B.init not correct length, so computed B.init using glm()")
    }
    if ((is.null(B.init) | (!is.null(B.init) & length(B.init) != p)) & !is.null(s2.init)) {
      B.init <- lm(formula = formula, data = data)$coefficients
    }
    if (!is.null(B.init) & is.null(s2.init)) {
      s2.init <- rep(var(lm(formula = formula, data = data)$residuals)/q, q)
    }
    if ((is.null(B.init) | (!is.null(B.init) & length(B.init) != p)) & is.null(s2.init)) {
      B.init <- lm(formula = formula, data = data)$coefficients
      s2.init <- rep(var(lm(formula = formula, data = data)$residuals)/q, q)
    }
  } else {
    if (!is.null(B.init) & length(B.init) != p) {
      warning("B.init not correct length, so computed B.init using glm()")
    }
    glm_bayes = glm(formula = formula, data = data, family = base_family, na.action = na.omit)
    if ((is.null(B.init) | (!is.null(B.init) & length(B.init) != p))) {
      B.init <- t(matrix(glm_bayes$coefficients, ncol = p))
    } else {
      B.init <- matrix(B.init, ncol = 1)
    }
    if (is.null(s2.init)) {
      s2.init <- rep(var(glm_bayes$residuals)/q, q)
    }
  }
  #B <- B.init
  #s <- as.vector(array(s2.init^0.5, dim = c(1, q)))
  
  s2.init <- log(1/s2.init)
  
  if(family == "gaussian") {
    resid.init <- s2.init[q]
    s2.init <- s2.init[-q]
  }
  
  if(prior == "pc.prior.auto") {
    if(family == "gaussian") {
      lmod <- lm(formula, data)
      sdres <- sd(residuals(lmod))
      pcprior <- list(prec = list(prior = "pc.prec", param = c(3 * sdres, 0.01)))
    } else {
      if(family == "binomial") {
        # lmod <- glm(formula, data = data, family = family)
        # sdres <- sd(lmod$y - lmod$fitted.values)
        pcprior <- list(prec = list(prior = "pc.prec", param = c(1, 0.1)))
      } else {
        warning("pc.prior.auto not yet implemented for this family. switching to default INLA prior...")
        prior <- "inla.default"
      }
    }
  } 
  
  if(prior == "pc.prior") {
    pcprior <- list(prec = list(prior = "pc.prec", param = c(prior_mu, prior_alpha)))
  }
  
  if(prior == "uninformative") {
    pcprior <- list(prec = list(prior = "pc.prec", param = c(100, 0.99))) 
    ## very flat prior, generally not recommended!
  }
  
  # contruct INLA formula
  inla_formula <- Reduce(paste, deparse(formula_bayes))
  inla_effects <- vector("list", length = length(random.effects))
  if(is.null(names(random.effects))){
    names(inla_effects) <- as.character(1:length(inla_effects))
  } else { # assign names so that the inla.model has names in random terms
    names(inla_effects) <- names(random.effects)
  }
  inla_Cmat <- inla_weights <- inla_reps <- inla_effects
  
  for(i in seq_along(random.effects)) {
    if(length(random.effects[[i]]) == 1) { 
      # nested term: 1|sp@site, 1|sp__@site, 1|sp@site__, 1|sp__@site__
      inla_effects[[i]] <- 1:nrow(data)
      inla_Cmat[[i]] <- solve(random.effects[[i]][[1]])
    } else if(length(random.effects[[i]]) == 2) { 
      # nested term: x|sp@site, x|sp__@site, x|sp@site__, x|sp__@site__
      inla_effects[[i]] <- 1:nrow(data)
      inla_weights[[i]] <- random.effects[[i]][[1]]
      inla_Cmat[[i]] <- solve(random.effects[[i]][[2]])
    } else if(length(random.effects[[i]]) == 3) { 
      # non-nested term: e.g. 1|sp__, x|sp__
      inla_effects[[i]] <- as.numeric(random.effects[[i]][[2]])
      inla_Cmat[[i]] <- solve(random.effects[[i]][[3]])
      inla_weights[[i]] <- random.effects[[i]][[1]]
    } else { # nested term: 1|sp@site, 1|sp__@site, 1|sp@site__, 1|sp__@site__
      inla_effects[[i]] <- as.numeric(random.effects[[i]][[2]])
      inla_Cmat[[i]] <- solve(random.effects[[i]][[3]])
      inla_weights[[i]] <- random.effects[[i]][[1]]
      inla_reps[[i]] <- as.numeric(random.effects[[i]][[4]])
    }
  }
  
  if(!is.null(bayes_options$diagonal)) {
    diagonal <- bayes_options$diagonal
    bayes_options <- bayes_options[-which(names(bayes_options) == "diagonal")]
    if(length(bayes_options) == 0) {
      bayes_options <- NULL
    }
  } else {
    diagonal <- NULL
  }
  
  f_form = vector(mode = "character", length = length(random.effects))
  for(i in seq_along(random.effects)) {
    if(length(random.effects[[i]]) == 3) { # non-nested term
      if(length(random.effects[[i]][[1]]) == 1) { # 1 | sp, 1 | sp__, 1 | site, 1 | site__
        f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
      } else {  # x | sp, x | sp__, x | site, x | site__
        f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], inla_weights[[", i, "]], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
      }
    } else { # nested term 1 | sp__@site, etc.
      if(length(random.effects[[i]]) == 4) { 
        if(length(random.effects[[i]][[1]]) == 1) {
          f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], replicate = inla_reps[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
        } else {
          f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], replicate = inla_reps[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
        }
      } else { # length of 1 or 2: specified as a matrix (1|sp__@site) or list of 2 (x|sp__@site)
        if(length(random.effects[[i]]) == 1) { # (1|sp__@site) etc.
          f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
        } else {
          if(length(random.effects[[i]]) == 2) { # (x|sp__@site) etc.
            f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], inla_weights[[", i, "]], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
          } else { # other lengths? just in case ...
            f_form[i] <- paste0("f(inla_effects[['", names(inla_effects)[i], "']], model = 'generic0', constr = TRUE, Cmatrix = inla_Cmat[[", i, "]], initial = s2.init[", i, "], diagonal = diagonal)")
          }
        }
      }
    }
  }
  
  if(prior != "inla.default"){
    f_form = unname(sapply(f_form, function(x){
      gsub(pattern = "[)]$", replacement = ", hyper = pcprior)", x)
    }))
  }
  
  f_form = paste(f_form, collapse = " + ")
  inla_formula <- paste(inla_formula, f_form, sep = " + ")
  
  if(calc.DIC) {
    if(calc.WAIC) {
      control.compute <- list(dic = TRUE, waic = TRUE)
    } else {
      control.compute <- list(dic = TRUE)
    }
  } else {
    if(calc.WAIC) {
      control.compute <- list(waic = TRUE)
    } else {
      control.compute <- list()
    }
  }
  
  argus <- c(list(formula = as.formula(inla_formula),
                  data = data,
                  verbose = verbose,
                  family = family), 
             bayes_options)
  if(is.null(argus$control.fixed)) {
    argus$control.fixed = list(prec.intercept = 0.0001, correlation.matrix = TRUE)
  } else {
    if(is.null(argus$control.fixed$prec.intercept)) {
      argus$control.fixed$prec.intercept <- 0.0001
    }
    if(is.null(argus$control.fixed$correlation.matrix)) {
      argus$control.fixed$correlation.matrix <- TRUE
    }
  }
  
  if(is.null(argus$control.compute$dic)) {
    argus$control.compute$dic <- calc.DIC
  }
  if(is.null(argus$control.compute$waic)) {
    argus$control.compute$waic <- calc.WAIC
  }
  if(is.null(argus$control.compute$config)) {
    argus$control.compute$config <- TRUE
  }
  
  if(is.null(argus$control.predictor)) {
    argus$control.predictor = list(compute = TRUE, link = 1)
  } else {
    if(is.null(argus$control.predictor$compute)) {
      argus$control.predictor$compute <- TRUE
    }
    if(is.null(argus$control.predictor$link)) {
      argus$control.predictor$link <- 1
    }
  }
  
  if(family == "gaussian") {
    if(is.null(argus$control.family)) {
      argus$control.family = list(hyper = list(prec = list(initial = resid.init)))
    } else {
      if(is.null(argus$control.family$hyper)) {
        argus$control.family$hyper <- list(prec = list(initial = resid.init))
      }
    }    
    
    # out <- INLA::inla(as.formula(inla_formula), data = data,
    #                   verbose = verbose,
    #                   control.family = list(hyper = list(prec = list(initial = resid.init))),
    #                   control.fixed = list(prec.intercept = 0.0001, correlation.matrix = TRUE),
    #                   control.compute = control.compute,
    #                   control.predictor = list(compute = TRUE))
    
  } else {
    argus$Ntrials <- Ntrials
  }
  
  out <- do.call(INLA::inla, argus)
  
  # out <- INLA::inla(as.formula(inla_formula), data = data,
  #                       verbose = verbose,
  #                       family = family,
  #                       control.fixed = list(prec.intercept = 0.0001, correlation.matrix = TRUE),
  #                       control.compute = control.compute,
  #                       control.predictor=list(compute = TRUE),
  #                       Ntrials = Ntrials)
  #summary(out)
  #print(out$summary.fitted.values)
  
  if(calc.DIC) {
    DIC <- out$dic$dic
  } else {
    DIC <- NULL
  }
  
  if(calc.WAIC){
    WAIC <- out$waic$waic
  } else {
    WAIC <- NULL
  }
  
  if(marginal.summ == "median") marginal.summ <- "0.5quant"
  
  nested <- sapply(random.effects, length) %in% c(1, 2, 4)
  
  variances <- 1/out$summary.hyperpar[ , marginal.summ]
  # names(variances) <- gsub("Precision for ", "", rownames(out$summary.hyperpar))
  variances.ci <- 1/out$summary.hyperpar[ , c("0.975quant", "0.025quant")]
  
  if(family == "gaussian") {
    resid_var <- variances[1]
    names(resid_var) <- "residual"
    variances <- variances[-1]
    resid_var.ci <- variances.ci[1, ]
    variances.ci <- variances.ci[-1, ]
  } else {
    resid_var <- NULL
    resid_var.ci <- NULL
  }
  
  if(grepl("zeroinflated", family)) {
    zeroinlated_param <- 1/variances[1]
    variances <- variances[-1]
    zeroinflated_param.ci <- rev(1/variances.ci[1, ])
    variances.ci <- variances.ci[-1, ]
  } else {
    zeroinlated_param <- NULL
    zeroinflated_param.ci <- NULL
  }
  
  if(!is.null(names(random.effects))){
    names(variances) <- names(random.effects)
    row.names(variances.ci) <- paste("Precision for", names(random.effects))
  }
 
  std.vars <- variances^0.5
  ss <- c(std.vars[!nested], std.vars[nested], resid_var)
  
  if(marginal.summ == "median") marginal.summ <- "0.5quant"
  
  B <- out$summary.fixed[ , marginal.summ]
  H <- Y - out$summary.fitted.values[ , marginal.summ, drop = TRUE]
  mu <- out$summary.fitted.values[ , marginal.summ, drop = FALSE]
  #H <- NULL
  
  results <- list(formula = formula, data = data, family = family, random.effects = random.effects, 
                  B = B, B.se = NULL,
                  B.ci = out$summary.fixed[ , c("0.025quant", "0.975quant")],
                  B.cov = out$misc$lincomb.derived.correlation.matrix, B.zscore = NULL, 
                  B.pvalue = NULL, ss = ss, s2n = variances[nested], s2r = variances[!nested],
                  s2resid = resid_var, zi = zeroinlated_param, s2n.ci = variances.ci[nested, ], 
                  s2r.ci = variances.ci[!nested, ], s2resid.ci = resid_var.ci,
                  zi.ci = zeroinflated_param.ci,
                  logLik = out$mlik[1, 1], AIC = NULL, BIC = NULL, DIC = DIC, WAIC = WAIC,
                  REML = NULL, bayes = TRUE, marginal.summ = marginal.summ, 
                  s2.init = s2.init, B.init = B.init, Y = Y, X = X, H = H, 
                  iV = NULL, mu = mu, nested = nested, Zt = NULL, St = NULL, 
                  convcode = NULL, niter = NULL, inla.model = out)
  class(results) <- c("communityPGLMM", "pglmm")
  results
}

#' @export
#' @rdname pglmm
communityPGLMM <- pglmm # to be compatible with old code