knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  warning = FALSE,
  fig.width=5, fig.height=5,
  fig.align = "center",
  dev = "png",
  fig.pos = 'H'
  )

Introduction to gllvm

R package gllvm

# From CRAN
install.packages("gllvm")
# OR
# From GitHub using devtools package's function install_github
devtools::install_github("JenniNiku/gllvm")

Problems?

The gllvm package depends on the R packages TMB and mvabund. If the installation fails, try installing these first.

Distributions

| Response | Distribution | Method | Link |
| ----------- |:------------:|:------- |:------- |
|Counts | Poisson | VA/LA |log |
| | NB | VA/LA |log |
| | ZIP | LA |log |
|Binary | Bernoulli | VA/LA |probit |
| | | EVA/LA |logit |
|Ordinal | Ordinal | VA |probit |
|Normal | Gaussian | VA/LA |identity|
|Positive continuous| Gamma | VA/LA |log|
|Non-negative continuous| Exponential | VA/LA |log|
|Biomass | Tweedie | LA/EVA |log |
|Percent cover| beta | LA/EVA |probit/logit |

Data input

The main function of the gllvm package is gllvm(), which fits GLLVMs to multivariate data. Its most important arguments are listed below:

gllvm(y = NULL, X = NULL, TR = NULL, family, num.lv = 2,
      formula = NULL, method = "VA", row.eff = FALSE, n.init = 1,
      starting.val = "res", ...)
library(gllvm)

Example: Spiders

Data fitting

Fitting a basic GLLVM $g(E(y_{ij})) = \beta_{0j} + \boldsymbol{u}_i'\boldsymbol{\theta}_j$ with gllvm():

library(mvabund)
data("spider")
library(gllvm)
fitnb <- gllvm(y = spider$abund, family = "negative.binomial", num.lv = 2)
fitnb

Residual analysis

par(mfrow = c(1,2))
plot(fitnb, which = 1:2)

Model selection

fitp <- gllvm(y = spider$abund, family = poisson(), num.lv = 2)
fitnb <- gllvm(y = spider$abund, family = "negative.binomial", num.lv = 2)
AIC(fitp)
AIC(fitnb)

Exercises

Spend the next 10 minutes on these exercises and complete as many as time allows.

E1. Load the spider data from the mvabund package and take a look at the dataset.

library(gllvm)
# Package mvabund is loaded with gllvm, so the data can be loaded directly with data().
data("spider")
# more info: 
# ?spider


1. Print the data and covariates and draw a boxplot of the data.

# response matrix:
spider$abund
# Environmental variables
spider$x
# Plot data using boxplot:
boxplot(spider$abund)

E2. Fit a GLLVM to the spider data with a suitable distribution. The data consist of counts of spider species.

# Take a look at the function documentation for help: 
?gllvm


2. The response variables in the spider data are counts, so the Poisson, negative binomial (NB) and zero-inflated Poisson (ZIP) distributions are all possible. However, ZIP is implemented only with the Laplace method, and models fitted with different methods cannot be compared using information criteria. Let's try just the Poisson and NB. Note that your results may not exactly match those below, because the initial values for each model fit are slightly different.

# Fit a GLLVM to data
fitp <- gllvm(y=spider$abund, family = poisson(), num.lv = 2)
fitp
fitnb <- gllvm(y=spider$abund, family = "negative.binomial", num.lv = 2)
fitnb

Based on AIC, the NB distribution fits better. How about the residual analysis? Note that the package uses randomized quantile residuals, so the residuals look a little different each time you plot them.

# Residual plots for the Poisson and NB models
plot(fitp)
plot(fitnb)
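To illustrate why the residual plots change between calls, here is a minimal base R sketch of randomized quantile (Dunn-Smyth) residuals for a Poisson response. It is not the package's internal code: the data are simulated, the mean `mu` is treated as known, and the names `a`, `b`, `u` and `res` are chosen here for illustration.

```r
# Sketch: randomized quantile residuals for simulated Poisson counts.
# Assumption: the fitted mean mu is known; in practice it comes from the model.
set.seed(1)
mu <- 3
y <- rpois(200, lambda = mu)

a <- ppois(y - 1, lambda = mu)  # P(Y < y), lower end of the CDF jump at y
b <- ppois(y, lambda = mu)      # P(Y <= y), upper end of the CDF jump at y

# Draw a uniform value inside each jump, then map it to the normal scale
u <- runif(length(y), min = a, max = b)
res <- qnorm(u)

# If the model is adequate, res should look like a standard normal sample
qqnorm(res)
qqline(res)
```

Because `u` is drawn at random within each interval, rerunning the last steps without resetting the seed gives slightly different residuals for the same data and model.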

You could make these comparisons with the Laplace method as well, using the code below. It leads to the same conclusion, that the NB distribution fits best:

fitLAp <- gllvm(y=spider$abund, family = poisson(), method = "LA", num.lv = 2)
fitLAnb <- gllvm(y=spider$abund, family = "negative.binomial", method = "LA", num.lv = 2)
fitLAzip <- gllvm(y=spider$abund, family = "ZIP", method = "LA", num.lv = 2)
AIC(fitLAp)
AIC(fitLAnb)
AIC(fitLAzip)

E3. Explore the fitted model. Where are the estimates for parameters? What about predicted latent variables? Standard errors?

3. Let's explore the fitted model:

# Parameters:
coef(fitnb)
# Where are the predicted latent variable values? Either fitnb$lvs or
getLV(fitnb)
# Standard errors for parameters:
fitnb$sd

E4. Fit the model with different numbers of latent variables.

4. The default number of latent variables is 2. Let's try 1 and 3 latent variables as well:

# In exercise 2, we fitted GLLVM with two latent variables 
fitnb
# How about 1 or 3 LVs
fitnb1 <- gllvm(y=spider$abund, family = "negative.binomial", num.lv = 1)
fitnb1
getLV(fitnb1)
fitnb3 <- gllvm(y=spider$abund, family = "negative.binomial", num.lv = 3)
fitnb3
getLV(fitnb3)
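One way to compare the fits with 1, 2 and 3 latent variables is to collect their AIC values side by side. The following is a sketch under the same settings as above; the names `fits` and `aic` are introduced here for illustration, and the exact values will vary between runs because of the starting values.

```r
library(gllvm)
data("spider", package = "mvabund")

# Fit NB models with 1, 2 and 3 latent variables (all with the default method)
fits <- lapply(1:3, function(k) {
  gllvm(y = spider$abund, family = "negative.binomial", num.lv = k)
})

# Collect the AIC of each fit; a smaller value indicates a better trade-off
aic <- sapply(fits, AIC)
names(aic) <- paste0("num.lv = ", 1:3)
aic
```

All three models are fitted with the same method, so their information criteria are comparable.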

E5. Include environmental variables in the GLLVM and explore the model fit.

5. Environmental variables can be included with an argument X:

fitnbx <- gllvm(y = spider$abund, X = spider$x, family = "negative.binomial", seed = 123, num.lv = 2)
fitnbx
coef(fitnbx)
# confidence intervals for parameters:
confint(fitnbx)
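With covariates, the linear predictor of the basic model gains a species-specific regression term. As a sketch in the notation used earlier, where $\boldsymbol{x}_i$ (the covariates of site $i$) and $\boldsymbol{\beta}_j$ (the coefficients of species $j$) are symbols introduced here:

```latex
$$g(E(y_{ij})) = \beta_{0j} + \boldsymbol{x}_i'\boldsymbol{\beta}_j + \boldsymbol{u}_i'\boldsymbol{\theta}_j$$
```

Setting $\boldsymbol{\beta}_j = \boldsymbol{0}$ recovers the basic model without covariates.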

Problems? See hints:

I have problems in model fitting. My model converges to infinity or to a local maximum: GLLVMs are complex models in which starting values play a big role. Try a different starting value method (see argument starting.val), or use multiple runs and pick the one giving the highest log-likelihood value using argument n.init. More variation can be added to the starting points with jitter.var.

My results do not look the same as in the answers: The results may not match the answers exactly, because the initial values for each model fit are slightly different.
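The starting-value options mentioned in the first hint can be tried as in the sketch below. The values `n.init = 3` and `jitter.var = 0.2` are arbitrary choices for illustration, not recommendations, and `fit_multi` is a name introduced here.

```r
library(gllvm)
data("spider", package = "mvabund")

# Run the fit from 3 jittered starting points; per the hint above,
# the run giving the highest log-likelihood is the one picked
fit_multi <- gllvm(y = spider$abund, family = "negative.binomial",
                   num.lv = 2, starting.val = "res",
                   n.init = 3, jitter.var = 0.2)

# Log-likelihood of the selected fit
logLik(fit_multi)
```

More runs take longer but lower the risk of ending in a poor local maximum.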

Ordination

GLLVM as a model-based ordination method

Ordination plot

par(mfrow=c(1,1))
ordiplot(fitnb, predict.region = TRUE, ylim=c(-2.5,2.5), xlim=c(-2,3))

Biplot

ordiplot(fitnb, biplot = TRUE)
abline(h = 0, v = 0, lty=2)

Environmental gradients

# Arbitrary color palette, a vector of length 20. One could also use, for example, colorRampPalette() from package grDevices
rbPal <- c("#00FA9A", "#00EC9F", "#00DFA4", "#00D2A9", "#00C5AF", "#00B8B4", "#00ABB9", "#009DBF", "#0090C4", "#0083C9", "#0076CF", "#0069D4", "#005CD9", "#004EDF", "#0041E4", "#0034E9", "#0027EF", "#001AF4", "#000DF9", "#0000FF")
X <- spider$x
par(mfrow = c(3,2), mar=c(4,4,2,2))
for (i in seq_len(ncol(X))) {
  # Map each covariate value to one of the 20 palette colors
  Col <- rbPal[as.numeric(cut(X[, i], breaks = 20))]
  ordiplot(fitnb, symbols = TRUE, s.colors = Col, main = colnames(X)[i],
           biplot = TRUE)
}
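As the comment in the code above suggests, a palette of this kind can also be generated rather than typed out. The sketch below uses colorRampPalette() from grDevices with the same two end colors. The intermediate colors may differ slightly from the hard-coded vector because of rounding, and the names `pal`, `x` and `col_idx` and the simulated covariate are used here for illustration only.

```r
# Generate 20 colors running from "#00FA9A" to "#0000FF"
pal <- grDevices::colorRampPalette(c("#00FA9A", "#0000FF"))(20)

# Map a numeric covariate to the palette by cutting its range into 20 bins
set.seed(1)
x <- runif(28)  # stand-in for one column of spider$x
col_idx <- as.numeric(cut(x, breaks = 20))
Col <- pal[col_idx]
head(Col)
```

The resulting `Col` vector can be passed to the s.colors argument of ordiplot() in the same way as in the loop above.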


JenniNiku/gllvm documentation built on May 3, 2024, 2:15 a.m.