Description Usage Arguments Details Value Note Author(s) References See Also Examples
Fits generalized linear latent variable model for multivariate data. The model can be fitted using Laplace approximation method or variational approximation method.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38  gllvm(
y = NULL,
X = NULL,
TR = NULL,
data = NULL,
formula = NULL,
lv.formula = NULL,
num.lv = NULL,
num.lv.c = 0,
num.RR = 0,
family,
row.eff = FALSE,
offset = NULL,
quadratic = FALSE,
sd.errors = TRUE,
method = "VA",
randomX = NULL,
dependent.row = FALSE,
beta0com = FALSE,
zeta.struc = "species",
plot = FALSE,
link = "probit",
dist = 0,
corWithin = FALSE,
Power = 1.1,
seed = NULL,
scale.X = TRUE,
return.terms = TRUE,
gradient.check = FALSE,
control = list(reltol = 1e10, TMB = TRUE, optimizer = "optim", max.iter = 2000,
maxit = 4000, trace = FALSE, optim.method = NULL),
control.va = list(Lambda.struc = "unstructured", Ab.struct = "unstructured",
diag.iter = 1, Ab.diag.iter = 0, Lambda.start = c(0.3, 0.3, 0.3)),
control.start = list(starting.val = "res", n.init = 1, jitter.var = 0, start.fit =
NULL, start.lvs = NULL, randomX.start = "zero", quad.start = 0.01, start.struc =
"LV"),
...
)

y 
(n x m) matrix of responses. 
X 
matrix or data.frame of environmental covariates. 
TR 
matrix or data.frame of trait covariates. 
data 
data in long format, that is, matrix of responses, environmental and trait covariates and row index named as "id". When used, model needs to be defined using formula. This is alternative data input for y, X and TR. 
formula 
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted (for fixedeffects predictors). 
lv.formula 
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted (for latent variables). 
num.lv 
number of latent variables, d, in gllvm model. Nonnegative integer, less than number of response variables (m). Defaults to 0. 
num.lv.c 
number of latent variables, d, in gllvm model to constrain, with residual term. Nonnegative integer, less than number of response (m) and equal to, or less than, the number of predictor variables (k). Defaults to 0. Requires specification of "lv.formula" in combination with "X" or "datayx". Can be used in combination with num.lv and fixedeffects, but not with traits. 
num.RR 
number of latent variables, d, in gllvm model to constrain, without residual term (reduced rank regression). Cannot yet be combined with traits. 
family 
distribution function for responses. Options are 
row.eff 

offset 
vector or matrix of offset terms. 
quadratic 
either 
sd.errors 
logical. If 
method 
model can be fitted using Laplace approximation method ( 
randomX 
formula for species specific random effects of environmental variables in fourth corner model. Defaults to 
dependent.row 
logical, whether or not random row effects are correlated (dependent) with the latent variables. Defaults to 
beta0com 
logical, if 
zeta.struc 
Structure for cutoffs in the ordinal model. Either "common", for the same cutoffs for all species, or "species" for speciesspecific cutoffs. For the latter, classes are arbitrary per species, each category per species needs to have at least one observations. Defaults to "species". 
plot 
logical, if 
link 
link function for binomial family if 
dist 
coordinates or time points used for row parameters correlation structure 
corWithin 
logical. If 
Power 
fixed power parameter in Tweedie model. Scalar from interval (1,2). Defaults to 1.1. 
seed 
a single seed value, defaults to 
scale.X 
if 
return.terms 
logical, if 
gradient.check 
logical, if 
control 
A list with the following arguments controlling the optimization:

control.va 
A list with the following arguments controlling the variational approximation method:

control.start 
A list with the following arguments controlling the starting values:

... 
Not used. 
Fits generalized linear latent variable models as in Hui et al. (2015 and 2017) and Niku et al. (2017). Method can be used with two types of latent variable models depending on covariates. If only site related environmental covariates are used, the expectation of response Y_{ij} is determined by
g(μ_{ij}) = η_{ij} = α_i + β_{0j} + x_i'β_j + u_i'θ_j,
where g(.) is a known link function, u_i are dvariate latent variables (d<<m), α_i is an optional row effect at site i, and it can be fixed or random effect (also other structures are possible, see below), β_{0j} is an intercept term for species j, β_j and θ_j are column specific coefficients related to covariates and the latent variables, respectively.
Alternatively, a more complex version of the model can be fitted with quadratic = TRUE
, where species are modeled as a quadratic function of the latent variables:
g(μ_{ij}) = η_{ij} = α_i + β_{0j} + x_i'β_j + u_i'θ_j  u_i' D_j u_i
. Here, D_j is a diagonal matrix of positive only quadratic coefficients, so that the model generates concave shapes only. This implementation follows the ecological theoretical model where species are generally recognized to exhibit nonlinear response curves. For a model with quadratic responses, quadratic coefficients are assumed to be the same for all species:
D_j = D
. This model requires less information
per species and can be expected to be more applicable to most datasets. The quadratic coefficients D can be used to calculate the length of
ecological gradients.
For quadratic responses, it can be useful to provide the latent variables estimated with a GLLVM with linear responses, or estimated with (Detrended) Correspondence Analysis.
The latent variables can then be passed to the start.lvs
argument inside the control.start
list, which in many cases gives good results.
For GLLVMs with both linear and quadratic response model, the latent variable can be constrained to a series of covariates x_lv:
g(μ_{ij}) = α_i + β_{0j} + x_i'β_j + (z_i+X_lvβ_lv)' γ_j  (z_i+X_lvβ_lv)' D_j (z_i+X_lvβ_lv) + u_i'θ_j  u_i' D_j u_i ,
where z_i+X_lvβ_lv are constrained latent variables, which account for variation that can be explained by some covariates X_lv after accounting for the effects of covariates included in the fixedeffects part of the model X, and u_i are unconstrained latent variables that account for any remaining residual variation.
An alternative model is the fourth corner model (Brown et al., 2014, Warton et al., 2015) which will be fitted if also trait covariates are included. The expectation of response Y_{ij} is
g(μ_{ij}) = α_i + β_{0j} + x_i'(β_x + b_j) + TR_j'β_t + vec(B)*kronecker(TR_j,X_i) + u_i'θ_j  u_i'D_ju_i
where g(.), u_i, β_{0j} and θ_j are defined as above. Vectors β_x and β_t are the main effects or coefficients related to environmental and trait covariates, respectively, matrix B includes interaction terms. Vectors b_j are optional speciesspecific random slopes for environmental covariates. The interaction/fourth corner terms are optional as well as are the main effects of trait covariates.
In addition to the sitespecific random effects, α_i, it is also possible to set arbitrary structure/design for the row effects.
That is, assume that observations / rows i=1,...,n in the data matrix are from groups t=1,...,T, so that each row i belongs to one of the groups, denote G(i) \in \{1,...,T\}. Each group t has a number of observations n_t, so that ∑_{t=1}^{T} n_t =n.
Now we can set random intercept for each group t, (see argument 'row.eff
'):
g(μ_{ij}) = η_{ij} = α_{G(i)} + β_{0j} + x_i'β_j + u_i'θ_j,
There is also a possibility to set correlation structure for the random intercepts between groups, so that (α_{1},...,α_{T})^\top \sim N(0, Σ_r). That might be the case, for example, when the groups are spatially or temporally dependent.
Another option is to set row specific random intercepts α_i, but to set the correlation structure for the observations within groups, (see argument 'corWithin
'). That is, we can set corr(α_{i},α_{i'}) = C(i,i') \neq 0 according to some correlation function C, when G(i)=G(i').
This model is restricted to the case, where each group has equal number of observations (rows), that is n_t=n_{t'} for all t,t' \in \{1,...,T\}.
The correlation structures available in the package are
corAR1
autoregressive process of order 1.
corExp
exponentially decaying, see argument 'dist
'.
corCS
compound symmetry.
The method is sensitive for the choices of initial values of the latent variables. Therefore it is
recommendable to use multiple runs and pick up the one giving the highest loglikelihood value (see argument 'n.init
').
However, sometimes this is computationally too demanding, and default option
starting.val = "res"
is recommended. For more details on different starting value methods, see Niku et al., (2018).
Models are implemented using TMB (Kristensen et al., 2015) applied to variational approximation (Hui et al., 2017), extended variational approximation (Korhonen et al., 2021) and Laplace approximation (Niku et al., 2017).
With ordinal family response classes must start from 0 or 1.
Mean and variance for distributions are defined as follows.
For count data family = poisson()
: Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}, or
family = "negative.binomial"
: Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}+μ_{ij}^2φ_j, or
family = "ZIP"
: Expectation E[Y_{ij}] = (1p)μ_{ij}, variance V(μ_{ij}) = μ_{ij}(1p)(1+μ_{ij}p).
For binary data family = binomial()
: Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}(1μ_{ij}).
For percent cover data 0 < Y_{ij} < 1 family = "beta"
: Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}(1μ_{ij})//(1+φ_j).
For positive continuous data family = "gamma"
:Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}^2/φ_j, where φ_j is species specific shape parameter.
For nonnegative continuous data family = "exponential"
:Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}^2.
For nonnegative continuous or biomass datafamily = "tweedie"
Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = φ_j*μ_{ij}^ν, where ν is a power parameter of Tweedie distribution. See details Dunn and Smyth (2005).
For ordinal data family = "ordinal"
: Cumulative probit model, see Hui et.al. (2016).
For normal distributed data family = gaussian()
: Expectation E[Y_{ij}] = μ_{ij}, variance V(y_{ij}) = φ_j^2.
An object of class "gllvm" includes the following components:
call 
function call 
logL 
log likelihood 
lvs 
latent variables 
params 
list of parameters

Power 
power parameter ν for Tweedie family 
sd 
list of standard errors of parameters 
prediction.errors 
list of prediction covariances for latent variables and variances for random row effects when method 
A, Ar 
covariance matrices for variational densities of latent variables and variances for random row effects 
If function gives warning: 'In f(x, order = 0) : value out of range in 'lgamma”, optimizer have visited an area where gradients become too big. It is automatically fixed by trying another step in the optimization process, and can be ignored if errors do not occur.
Jenni Niku <jenni.m.e.niku@jyu.fi>, Wesley Brooks, Riki Herliansyah, Francis K.C. Hui, Sara Taskinen, David I. Warton, Bert van der Veen
Brown, A. M., Warton, D. I., Andrew, N. R., Binns, M., Cassis, G., and Gibb, H. (2014). The fourthcorner solution  using predictive models to understand how species traits interact with the environment. Methods in Ecology and Evolution, 5:344352.
Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of tweedie exponential dispersion model densities. Statistics and Computing, 15:267280.
Hui, F. K. C., Taskinen, S., Pledger, S., Foster, S. D., and Warton, D. I. (2015). Modelbased approaches to unconstrained ordination. Methods in Ecology and Evolution, 6:399411.
Hui, F. K. C., Warton, D., Ormerod, J., Haapaniemi, V., and Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics. Journal of Computational and Graphical Statistics, 26:3543.
Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 121.
Korhonen, P., Hui, F. K. C., Niku, J., and Taskinen, S. (2021). Fast, universal estimation of latent variable models using extended variational approximations. Submitted.
Niku, J., Warton, D. I., Hui, F. K. C., and Taskinen, S. (2017). Generalized linear latent variable models for multivariate count and biomass data in ecology. Journal of Agricultural, Biological, and Environmental Statistics, 22:498522.
Niku, J., Brooks, W., Herliansyah, R., Hui, F. K. C., Taskinen, S., and Warton, D. I. (2018). Efficient estimation of generalized linear latent variable models. PLoS One, 14(5):120.
Warton, D. I., Guillaume Blanchet, F., O'Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C. and Hui, F. K. C. (2015). So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution, 30:766779.
coefplot.gllvm
, confint.gllvm
, ordiplot.gllvm
, plot.gllvm
, summary.gllvm
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121  # Extract subset of the microbial data to be used as an example
data(microbialdata)
X < microbialdata$Xenv
y < microbialdata$Y[, order(colMeans(microbialdata$Y > 0),
decreasing = TRUE)[21:40]]
fit < gllvm(y, X, formula = ~ pH + Phosp, family = poisson())
fit$logL
ordiplot(fit)
coefplot(fit)
## Load a dataset from the mvabund package
library(mvabund)
data(antTraits)
y < as.matrix(antTraits$abund)
X < as.matrix(antTraits$env)
TR < antTraits$traits
# Fit model with environmental covariates Bare.ground and Shrub.cover
fit < gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
family = poisson())
ordiplot(fit)
coefplot(fit)
## Example 1: Fit model with two unconstrained latent variables
# Using variational approximation:
fitv0 < gllvm(y, family = "negative.binomial", method = "VA")
ordiplot(fitv0)
plot(fitv0, mfrow = c(2,2))
summary(fitv0)
confint(fitv0)
## Example 1a: Fit model with two constrained latent variables and with
# quadratic response model
# We scale and centre the predictors to improve convergence
fity1 < gllvm(y, X = scale(X), family = "negative.binomial",
num.lv.c=2, method="VA")
ordiplot(fity1, biplot = TRUE)
# Using Laplace approximation: (this line may take about 30 sec to run)
fitl0 < gllvm(y, family = "negative.binomial", method = "LA")
ordiplot(fitl0)
# Poisson family:
fit.p < gllvm(y, family = poisson(), method = "LA")
ordiplot(fit.p)
# Use poisson model as a starting parameters for ZIPmodel, this line
# may take few minutes to run
fit.z < gllvm(y, family = "ZIP", method = "LA",
control.start = list(start.fit = fit.p))
ordiplot(fit.z)
## Example 2: gllvm with environmental variables
# Fit model with two latent variables and all environmental covariates,
fitvX < gllvm(formula = y ~ X, family = "negative.binomial")
ordiplot(fitvX, biplot = TRUE)
coefplot(fitvX)
# Fit model with environmental covariates Bare.ground and Shrub.cover
fitvX2 < gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
family = "negative.binomial")
ordiplot(fitvX2)
coefplot(fitvX2)
# Use 5 initial runs and pick the best one
fitvX_5 < gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
family = "negative.binomial", control.start=list(n.init = 5, jitter.var = 0.1))
ordiplot(fitvX_5)
coefplot(fitvX_5)
## Example 3: Data in long format
# Reshape data to long format:
datalong < reshape(data.frame(cbind(y,X)), direction = "long",
varying = colnames(y), v.names = "y")
head(datalong)
fitvLong < gllvm(data = datalong, formula = y ~ Bare.ground + Shrub.cover,
family = "negative.binomial")
## Example 4: Fourth corner model
# Fit fourth corner model with two latent variables
fitF1 < gllvm(y = y, X = X, TR = TR, family = "negative.binomial")
coefplot(fitF1)
# Fourth corner can be plotted also with next lines
#fourth = fitF1$fourth.corner
#library(lattice)
#a = max( abs(fourth) )
#colort = colorRampPalette(c("blue","white","red"))
#plot.4th = levelplot(t(as.matrix(fourth)), xlab = "Environmental Variables",
# ylab = "Species traits", col.regions = colort(100),
# at = seq( a, a, length = 100), scales = list( x = list(rot = 45)))
#print(plot.4th)
# Specify model using formula
fitF2 < gllvm(y = y, X = X, TR = TR,
formula = ~ Bare.ground + Canopy.cover * (Pilosity + Webers.length),
family = "negative.binomial")
ordiplot(fitF2)
coefplot(fitF2)
## Include species specific random slopes to the fourth corner model
fitF3 < gllvm(y = y, X = X, TR = TR,
formula = ~ Bare.ground + Canopy.cover * (Pilosity + Webers.length),
family = "negative.binomial", randomX = ~ Bare.ground + Canopy.cover,
control.start = list(n.init = 3))
ordiplot(fitF3)
coefplot(fitF3)
## Example 5: Fit Tweedie model
# Load coral data
data(tikus)
ycoral < tikus$abund
# Let's consider only years 1981 and 1983
ycoral < ycoral[((tikus$x$time == 81) + (tikus$x$time == 83)) > 0, ]
# Exclude species which have observed at less than 4 sites
ycoral < ycoral[17, (colSums(ycoral > 0) > 4)]
# Fit Tweedie model for coral data (this line may take few minutes to run)
fit.twe < gllvm(y = ycoral, family = "tweedie", method = "LA")
ordiplot(fit.twe)
## Example 6: Random row effects
fitRand < gllvm(y, family = "negative.binomial", row.eff = "random")
ordiplot(fitRand, biplot = TRUE)

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.