create.life: Simulate a Multivariate Response Matrix
In emitanaka/boral: Bayesian Ordination and Regression AnaLysis

Description Usage Arguments Details Value Author(s) See Also Examples

Simulate a multivariate response matrix, given parameters such as but not necessarily all of: family, number of latent variables and related coefficients, an matrix of explanatory variables and related coefficients, row effects, cutoffs for cumulative probit regression of ordinal responses.

 
create.life(true.lv = NULL, lv.coefs, 
    lv.control = list(num.lv = 0, type = "independent", 
    lv.covparams = NULL, distmat = NULL),
     X = NULL, X.coefs = NULL, 
     traits = NULL, traits.coefs = NULL, family, row.eff = "none", 
     row.params = NULL, row.ids = NULL, offset = NULL, 
     trial.size = 1, cutoffs = NULL, powerparam = NULL, 
     manual.dim = NULL, save.params = FALSE)

## S3 method for class 'boral'
simulate(object, nsim = 1, seed = NULL, new.lvs = FALSE, distmat = NULL, 
     est = "median", ...)

`object`	An object of class "boral".
`nsim`	Number of multivariate response matrices to simulate. Defaults to 1.
`seed`	Seed for dataset simulation. Defaults to `NULL`, in which case no seed is set.
`new.lvs`	If `FALSE`, then true latent variables are obtained from the `object`. If `TRUE`, then new true latent variables are generated.
`distmat`	A distance matrix required to calculate correlations across sites when a non-independent correlation structure on the latent variables is imposed, when `new.lvs = TRUE` and `object$lv.type != "independent"`.
`est`	A choice of either the posterior median (`est == "median"`) or posterior mean (`est == "mean"`), which are then treated as estimates and the fitted values are calculated from. Default is posterior median.
`true.lv`	A matrix of true latent variables. With multivariate abundance data in ecology for instance, each row corresponds to the true site ordination coordinates. If supplied, then simulation is based of these true latent variables. If `NULL`, then the function looks to the argument `lv.control` to see what to do. Defaults to `NULL`.
`lv.coefs`	A matrix containing column-specific intercepts, latent variable coefficients relating to `true.lv`, and dispersion parameters.
`lv.control`	This argument is utilized if `true.lv = NULL`, in which case the function uses this argument to determine how to simulate new, true latent variables. A list (currently) with the following arguments: num.lv: which specifies the number of true latent variables to generate. Defaults to 0. type: which specifies the type the correlation structure of the latent variables (across sites). Defaults to independence correlation structure. lv.covparams: which is a vector containing one or two elements required if parameterizing a non-independence spatial correlation structure of the latent variables. distmat: which a distance matrix required to calculate correlations across sites when a non-independence correlation structure on the latent variables is imposed. Please see `about.lvs` for more information.
`X`	An model matrix of covariates, which can be included as part of the data generation. Defaults to `NULL`, in which case no model matrix is used. No intercept column should be included in `X`.
`X.coefs`	The coefficients relating to `X`. Defaults to `NULL`. This argument needs to be supplied if `X` is supplied and no `traits` are supplied.
`traits`	A model matrix of species traits, which can be included as part of the model. Defaults to `NULL`, in which case no matrix was used. No intercept column should be included in `traits`, as it is included automatically.
`traits.coefs`	A matrix of coefficients that are used to generate "new" column-specific intercepts and `X.coefs`. The number of rows should equal to (`ncol(X)+1`) and the number of columns should equal to (`ncol(traits)`+2). How this argument works is as follows: when both `traits` and `traits.coefs` are supplied, then new column-specific intercepts (i.e. the first column of `lv.coefs` is overwritten) are generated by simulating from a normal distribution with mean equal to `crossprod(c(1,traits), traits.coefs[1,1:(ncol(traits.coefs)-1)])` and standard deviation `traits.coefs[1,ncol(traits.coefs)]`. In other words, the last column of `trait.coefs` provides the standard deviation of the normal distribution, with the other columns being the regression coefficients in the mean of the normal distribution. Analogously, new `X.coefs` are generated in the same manner using the remaining rows of `trait.coefs`. Please see `about.traits` for more information. It is important that highlight then with in this data generation mechanism, the new column-specific intercepts and `X.coefs` are now random effects, being drawn from a normal distribution. Defaults to `NULL`, in conjuction with `traits = NULL`.
`family`	Either a single element, or a vector of length equal to the number of columns in `y`. The former assumes all columns of `y` come from this distribution. The latter option allows for different distributions for each column of `y`. Elements can be one of "binomial" (with probit link), "poisson" (with log link), "negative.binomial" (with log link), "normal" (with identity link), "lnormal" for lognormal (with log link), "tweedie" (with log link), "exponential" (with log link), "gamma" (with log link), "beta" (with logit link), "ordinal" (cumulative probit regression). Please see `about.distributions` for information on distributions available in boral overall.
`row.eff`	Single element indicating whether row effects are included as fixed effects ("fixed"), random effects ("random") or not included ("none") in the fitted model. If fixed effects, then for parameter identifiability the first row effect is set to zero, which analogous to acting as a reference level when dummy variables are used. If random effects, they are drawn from a normal distribution with mean zero and standard deviation given by `row.params`. Defaults to "none".
`row.params`	Parameters corresponding to the row effect from the fitted model. If `row.eff = "fixed"`, then these are the fixed effects and should have length equal to the number of columns in `y`. If `row.eff = "random"`, then this is the standard deviation for the random effects normal distribution. If `row.eff = "none"`, then this argument is ignored.
`row.ids`	A matrix with the number of rows equal to the number of rows in `y`, and the number of columns equal to the number of row effects to be included in the model. Element (i,j) indicates to the cluster ID of row i in `y` for random effect eqnj; please see `boral` for details. Defaults to `NULL`, so that if `row.params = NULL` then the argument is ignored, otherwise if `row.params` is supplied then `row.ids = matrix(1:nrow(y), ncol = 1)` i.e., a single, row effect unique to each row. An internal check is done to see `row.params` and `row.ids` are consistent in terms of arguments supplied.
`offset`	A matrix with the same dimensions as the response matrix `y`, specifying an a-priori known component to be included in the linear predictor during fitting. Defaults to `NULL`.
`trial.size`	Either equal to a single element, or a vector of length equal to the number of columns in y. If a single element, then all columns assumed to be binomially distributed will have trial size set to this. If a vector, different trial sizes are allowed in each column of y. The argument is ignored for all columns not assumed to be binomially distributed. Defaults to 1, i.e. Bernoulli distribution.
`cutoffs`	A vector of common common cutoffs for proportional odds regression when any of `family` is ordinal. They should be increasing order. Defaults to `NULL`.
`powerparam`	A common power parameter for tweedie regression when any of `family` is tweedie. Defaults to `NULL`.
`manual.dim`	A vector of length 2, containing the number of rows (n) and columns (p) for the multivariate response matrix. This is a "backup" argument only required when `create.life` can not determine how many rows or columns the multivariate response matrix should be.
`save.params`	If `save.params = TRUE`, then all parameters provided as input and/or generated (e.g. when `traits` and `traits.coefs` are supplied then `X.coefs` is generated internally; please see `traits.coefs` argument above) are returned, in addition to the simulated multivariate response matrix. Defaults to `FALSE`.
`...`	Not used.

create.life gives the user full capacity to control the true parameters of the model from which the multivariate responses matrices are generated from. If true.lv is supplied, then the data generation mechanism is based on this. If true.lv = NULL, then the function looks to lv.control to determine whether and how the true latent variables are to be simulated.

simulate makes use of the generic function of the same name in R: it takes a fitted model, treats either the posterior medians and mean estimates from the model as the true parameters, and generates response matrices based off that.

If create.life is used, then: 1) if save.params = FALSE, a n by p multivariate response matrix is returned only, 2) if save.params = TRUE, then a list containing the element resp which is a n times p multivariate response matrix, as well as other elements for the parameters used in the true model, e.g. true.lv, lv.coefs = lv.coefs, traits.coef, is returned.

If simulate is used, then a three dimensional array of dimension n by p by nsim is returned. The same latent variables can be used each time (new.lvs = FALSE), or new true latent variables can be generated each time (new.lvs = TRUE).

Maintainer: NA

boral for the default function for model fitting.

## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
     n.thin = 1)

## Example 1a - Simulate a response matrix of normally distributed data
library(mvtnorm)

## 30 rows (sites) with two latent variables 
true.lv <- rbind(rmvnorm(n=15,mean=c(1,2)),rmvnorm(n=15,mean=c(-3,-1))) 
## 30 columns (species)
lv.coefs <- cbind(matrix(runif(30*3),30,3),1)

X <- matrix(rnorm(30*4),30,4) 
## 4 explanatory variables
X.coefs <- matrix(rnorm(30*4),30,4)

simy <- create.life(true.lv = true.lv, lv.coefs = lv.coefs, 
    X = X, X.coefs = X.coefs, family = "normal")

## Not run: 
fit.boral <- boral(simy, X = X, family = "normal", lv.control = list(num.lv = 2),
    mcmc.control = example_mcmc_control)

summary(fit.boral)

## End(Not run)

## Example 1b - Include a nested random row effect
## 30 subregions nested within six regions
example_row_ids <- cbind(1:30, rep(1:6,each=5))
## Subregion has a small std deviation; region has a larger one
true.row.sigma <- list(ID1 = 0.5, ID2 = 2)

simy <- create.life(true.lv = true.lv, lv.coefs = lv.coefs, 
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true.row.sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

	
## Example 1c - Same as example 1b except new, true latent variables are generated
simy <- create.life(true.lv = NULL, lv.coefs = lv.coefs, 
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true.row.sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

    
## Example 1d - Same as example 1a except new, true latent variables are generated
## with a non-independent correlation structure using a fake distance matrix
makedistmat <- as.matrix(dist(1:30))
simy <- create.life(true.lv = NULL, lv.coefs = lv.coefs, 
    lv.control = list(num.lv = 2, type = "exponential", lv.covparams = 5, distmat = makedistmat),
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true.row.sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

    
## Example 2 - Simulate a response matrix of ordinal data
## 30 rows (sites) with two latent variables 
true.lv <- rbind(rmvnorm(15,mean=c(-2,-2)),rmvnorm(15,mean=c(2,2)))
## 10 columns (species)
true.lv.coefs <- rmvnorm(10,mean = rep(0,3)); 
## Cutoffs for proportional odds regression (must be in increasing order)
true.ordinal.cutoffs <- seq(-2,10,length=10-1)

simy <- create.life(true.lv = true.lv, lv.coefs = true.lv.coefs, 
     family = "ordinal", cutoffs = true.ordinal.cutoffs, save.params = TRUE) 

## Not run: 
fit.boral <- boral(y = simy$resp, family = "ordinal", lv.control = list(num.lv = 2),
      mcmc.control = example_mcmc_control)

## End(Not run)

## Not run: 
## Example 3 - Simulate a response matrix of count data based off
## a fitted model involving traits (ants data from mvabund)
library(mvabund)
data(antTraits)

y <- antTraits$abun
X <- as.matrix(antTraits$env)
## Include only traits 1, 2, and 5, plus an intercept
traits <- as.matrix(antTraits$traits[,c(1,2,5)])
## Please see help file for boral regarding the use of which.traits
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits)) 
     example_which_traits[[i]] <- 1:ncol(traits)

fit_traits <- boral(y, X = X, traits = traits, which.traits = example_which_traits, 
    family = "negative.binomial", lv.control = list(num.lv = 2),
     mcmc.control = example_mcmc_control)

## The hard way
simy <- create.life(true.lv = fit_traits$lv.mean, 
     lv.coefs = fit_traits$lv.coefs.median, X = X, 
     X.coefs = fit_traits$X.coefs.median, traits = traits, 
     traits.coefs = fit_traits$traits.coefs.median, family = "negative.binomial")

## The easy way, using the same latent variables as the fitted model
simy <- simulate(object = fit_traits)

## The easy way, generating new latent variables
simy <- simulate(object = fit_traits, new.lvs = TRUE)

## End(Not run)


## Example 4 - simulate Bernoulli data, based on a model with two latent variables, 
## no site variables, with two traits and one environmental covariates 
## This example is a proof of concept that traits can used 
## to explain environmental responses 
library(mvtnorm)

n <- 100; s <- 50
X <- as.matrix(scale(1:n))
colnames(X) <- c("elevation")

traits <- cbind(rbinom(s,1,0.5), rnorm(s)) 
## one categorical and one continuous variable
colnames(traits) <- c("thorns-dummy","SLA")

simfit <- list(lv.coefs = cbind(rnorm(s), rmvnorm(s, mean = rep(0,2))), 
     lv.control = list(num.lv = 2, type = "independent"),
    traits.coefs = matrix(c(0.1,1,-0.5,1,0.5,0,-1,1), 2, byrow = TRUE))
rownames(simfit$traits.coefs) <- c("beta0","elevation")
colnames(simfit$traits.coefs) <- c("kappa0","thorns-dummy","SLA","sigma")

simy <- create.life(lv.control = simfit$lv.control, lv.coefs = simfit$lv.coefs, 
    X = X, traits = traits, traits.coefs = simfit$traits.coefs, 
    family = "binomial") 


## Not run: 
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
    n.thin = 1)

example_which_traits <- vector("list",ncol(X)+1); 
for(i in 1:length(example_which_traits)) 
    example_which_traits[[i]] <- 1:ncol(traits)
fit_traits <- boral(y = simy, X = X, traits = traits, 
    which.traits = example_which_traits, family = "binomial", 
    lv.control = list(num.lv = 2), save.model = TRUE, 
    mcmc.control = example_mcmc_control)

## End(Not run)