create.life: \Sexpr[results=rd, stage=render]{lifecycle::badge("stable")}...

View source: R/simdatafunctions.R

create.lifeR Documentation

\Sexpr[results=rd, stage=render]{lifecycle::badge("stable")} Simulate a Multivariate response matrix

Description

Simulate a multivariate response matrix, given parameters such as but not necessarily all of: family, number of latent variables and related coefficients, an matrix of explanatory variables and related coefficients, row effects, response-specific random intercepts, cutoffs for cumulative probit regression of ordinal responses, and so on.

Usage

 
create.life(true.lv = NULL, lv.coefs, 
     lv.control = list(num.lv = 0, type = "independent", 
     lv.covparams = NULL, distmat = NULL),
     X = NULL, X.coefs = NULL, 
     traits = NULL, traits.coefs = NULL, family, 
     row.eff = "none", row.params = NULL, row.ids = NULL, 
     true.ranef = NULL, ranef.params = NULL, ranef.ids = NULL, 
     offset = NULL, trial.size = 1, cutoffs = NULL, powerparam = NULL, 
     manual.dim = NULL, save.params = FALSE)

     
## S3 method for class 'boral'
simulate(object, nsim = 1, seed = NULL, new.lvs = FALSE, new.ranefs = FALSE, 
     distmat = NULL, est = "median", ...)   
 

Arguments

object

An object of class "boral".

nsim

Number of multivariate response matrices to simulate. Defaults to 1.

seed

Seed for dataset simulation. Defaults to NULL, in which case no seed is set.

new.lvs

If FALSE, then true latent variables are obtained from object. If TRUE, then new true latent variables are generated.

new.ranefs

If FALSE, then true response-specific random intercepts are obtained from object. If TRUE, then new random intercepts are generated.

distmat

A distance matrix required to calculate correlations across sites when a non-independent correlation structure on the latent variables is imposed, when new.lvs = TRUE and object$lv.type != "independent".

est

A choice of either the posterior median (est = "median") or posterior mean (est = "mean"), which are then treated as estimates and the fitted values are calculated from. Default is posterior median.

true.lv

A matrix of true latent variables. With multivariate abundance data in ecology for instance, each row corresponds to the true site ordination coordinates. If supplied, then simulation is based of these true latent variables. If NULL, then the function looks to the argument lv.control to see what to do. Defaults to NULL.

lv.coefs

A matrix containing response-specific intercepts, latent variable coefficients relating to true.lv, and dispersion parameters.

lv.control

This argument is utilized if true.lv = NULL, in which case the function uses this argument to determine how to simulate new, true latent variables. A list (currently) with the following arguments:

  • num.lv: which specifies the number of true latent variables to generate. Defaults to 0.

  • type: which specifies the type the correlation structure of the latent variables (across sites). Defaults to independence correlation structure.

  • lv.covparams: which is a vector containing one or two elements required if parameterizing a non-independence spatial correlation structure of the latent variables.

  • distmat: which a distance matrix required to calculate correlations across sites when a non-independence correlation structure on the latent variables is imposed.

Please see about.lvs for more information.

X

A model matrix of covariates (otherwise known as the covariate matrix), which can be included as part of the data generation. Defaults to NULL, in which case no model matrix is used. No intercept column should be included.

X.coefs

The coefficients relating to the covariate matrix. Defaults to NULL. This argument needs to be supplied if the covariate matrix is supplied and no trait matrix is supplied.

traits

A model matrix of species traits (otherwise known as the covariate matrix), which can be included as part of the model. Defaults to NULL, in which case no matrix was used. No intercept column should be included in the trait matrix, as it is included automatically.

traits.coefs

A matrix of coefficients that are used to generate "new" response-specific intercepts and X.coefs. The number of rows should equal to (ncol(X)+1) and the number of columns should equal to (ncol(traits)+2).

How this argument works is as follows: when both traits and traits.coefs are supplied, then new response-specific intercepts (i.e. the first column of lv.coefs is overwritten) are generated by simulating from a normal distribution with mean equal to
crossprod(c(1,traits), traits.coefs[1,1:(ncol(traits.coefs)-1)]) and standard deviation
traits.coefs[1,ncol(traits.coefs)]. In other words, the last column of trait.coefs provides the standard deviation of the normal distribution, with the other columns being the regression coefficients in the mean of the normal distribution. Analogously, new X.coefs are generated in the same manner using the remaining rows of trait.coefs. Please see about.traits for more information.

It is important that highlight then with in this data generation mechanism, the new response-specific intercepts and X.coefs are now random effects, being drawn from a normal distribution.

Defaults to NULL, in conjuction with traits = NULL.

family

Either a single element, or a vector of length equal to the number of columns in the response matrix. The former assumes all columns of the response matrix come from this distribution. The latter option allows for different distributions for each column of the response matrix. Elements can be one of "binomial" (with probit link), "poisson" (with log link), "negative.binomial" (with log link), "normal" (with identity link), "lnormal" for lognormal (with log link), "tweedie" (with log link), "exponential" (with log link), "gamma" (with log link), "beta" (with logit link), "ordinal" (cumulative probit regression), "ztpoisson" (zero truncated Poisson with log link), "ztnegative.binomial" (zero truncated negative binomial with log link).

Please see about.distributions for information on distributions available in boral overall.

row.eff

Single element indicating whether row effects are included as fixed effects ("fixed"), random effects ("random") or not included ("none") in the fitted model. If fixed effects, then for parameter identifiability the first row effect is set to zero, which analogous to acting as a reference level when dummy variables are used. If random effects, they are drawn from a normal distribution with mean zero and standard deviation given by row.params. Defaults to "none".

row.params

Parameters corresponding to the row effects. If
row.eff = "fixed", then these are the fixed effects and should have length equal to the number of columns in the response matrix. If row.eff = "random", then this is the standard deviation for the random effects normal distribution.
If row.eff = "none", then this argument is ignored.

row.ids

A matrix with the number of rows equal to the number of rows in the response matrix, and the number of columns equal to the number of row effects to be included in the model. Element (i,j) indicates the cluster ID of row i in the response matrix for random effect j; please see boral for details. Defaults to NULL, so that if row.params = NULL then the argument is ignored, otherwise if row.params is supplied then
row.ids = matrix(1:nrow(y), ncol = 1) i.e., a single, row effect unique to each row.

true.ranef

A list of true response-specific random intercepts. If supplied, it should be a list of length length(ranef.ids), where the k-th element is the matrix a matrix where the number of rows is equal to the number of responses, and the number of columns is equal to length(unique(ranef.ids[,k])). If NULL, then the function looks to the arguments ranef.ids and/or ranef.params to see what to do. Defaults to NULL.

ranef.params

Parameters corresponding to standard deviation for the the response-specific random intercepts distribution. If supplied, it should be a matrix, where the number of rows is equal to the number of responses, and the number of columns equal to length(ranef.ids). If NULL, then the function looks to the arguments ranef.ids and/or true.ranef to see what to do. Defaults to NULL.

ranef.ids

A matrix with the number of rows equal to the number of rows in the response matrix, and the number of columns equal to the number of random intercepts to be included in the model. Element (i,j) indicates the cluster ID of row i in the response matrix for random intercept j; please see about.ranefs for details. Defaults to NULL, in which case it is assumed no random intercepts are to be included in the model. If supplied, then either one of true.ranef or ranef.params must also be supplied. If true.ranef is supplied, then these are used as the true random intercepts; if ranef.params is supplied, then response-specific random intercepts are generated from a normal distribution with mean zero and (response-specific) standard deviation based on the elements of ranef.params.

offset

A matrix with the same dimensions as the response matrix, specifying an a-priori known component to be included in the linear predictor during fitting. Defaults to NULL.

trial.size

Either equal to a single element, or a vector of length equal to the number of columns in y. If a single element, then all columns assumed to be binomially distributed will have trial size set to this. If a vector, different trial sizes are allowed in each column of y. The argument is ignored for all columns not assumed to be binomially distributed. Defaults to 1, i.e. Bernoulli distribution.

cutoffs

A vector of common common cutoffs for proportional odds regression when any of family is ordinal. They should be increasing order. Defaults to NULL.

powerparam

A common power parameter for tweedie regression when any of family is tweedie. Defaults to NULL.

manual.dim

A vector of length 2, containing the number of rows (n) and columns (p) for the multivariate response matrix. This is a "backup" argument only required when create.life can not determine how many rows or columns the multivariate response matrix should be.

save.params

If save.params = TRUE, then all parameters provided as input and/or generated are returned, in addition to the simulated multivariate response matrix. Defaults to FALSE.

...

Not used.

Details

create.life gives the user flexibility to control the true parameters of the model from which the multivariate responses matrices are generated from. For example, if true.lv is supplied, then the data generation mechanism is based on this set of true latent variables. If true.lv = NULL, then the function looks to lv.control to determine whether and how the true latent variables are to be simulated.

simulate makes use of the generic function of the same name in R: it takes a fitted model, treats either the posterior medians and mean estimates from the model as the true parameters, and generates response matrices based off that. There is control as to whether the latent variables and/or response-specific random intercepts obtained from the fitted model are used, or new ones are generated.

Value

If create.life is used, then: 1) if save.params = FALSE, a n by p multivariate response matrix is returned only, 2) if save.params = TRUE, then a list containing the element resp which is a n times p multivariate response matrix, as well as other elements for the parameters used in the true model are returned.

If simulate is used, then a three dimensional array of dimension n by p by nsim is returned. The same latent variables can be used each time (new.lvs = FALSE), or new true latent variables can be generated each time (new.lvs = TRUE).

Author(s)

Francis K.C. Hui [aut, cre], Wade Blanchard [aut]

Maintainer: Francis K.C. Hui <fhui28@gmail.com>

See Also

boral for the default function for model fitting.

Examples

## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
     n.thin = 1)

testpath <- file.path(tempdir(), "jagsboralmodel.txt")


## Example 1a - Simulate a response matrix of normally distributed data
library(mvtnorm)

## 30 rows (sites) with two latent variables 
true_lv <- rbind(rmvnorm(n=15,mean=c(1,2)),rmvnorm(n=15,mean=c(-3,-1))) 
## 30 columns (species)
true_lv_coefs <- cbind(matrix(runif(30*3),30,3),1)

X <- matrix(rnorm(30*4),30,4) 
## 4 explanatory variables
X.coefs <- matrix(rnorm(30*4),30,4)

simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs, 
    X = X, X.coefs = X.coefs, family = "normal")

## Not run: 
fit_simdata <- boral(simy, X = X, family = "normal", lv.control = list(num.lv = 2),
    mcmc.control = example_mcmc_control, model.name = testpath)

summary(fit_simdata)

## End(Not run)

## Example 1b - Include a nested random row effect
## 30 subregions nested within six regions
example_row_ids <- cbind(1:30, rep(1:6,each=5))
## Subregion has a small std deviation; region has a larger one
true_ranef_sigma <- list(ID1 = 0.5, ID2 = 2)

simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs, 
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true_ranef_sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)
	
	
## Example 1c - Same as example 1b except new, true latent variables are generated
simy <- create.life(true.lv = NULL, lv.coefs = true_lv_coefs, 
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true_ranef_sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

    
## Example 1d - Same as example 1a except new, true latent variables are generated
##   with a non-independent correlation structure using a fake distance matrix
makedistmat <- as.matrix(dist(1:30))
simy <- create.life(true.lv = NULL, lv.coefs = true_lv_coefs, 
    lv.control = list(num.lv = 2, type = "exponential", lv.covparams = 5, 
          distmat = makedistmat),
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true_ranef_sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)
    
    
## Example 1e - Similar to 1b, except instead of a nested random row effect,
## it includes a species-specific random interept at the region level
example_ranef_ids <- data.frame(region = rep(1:6,each=5))
## Subregion has a small std deviation; region has a larger one
true_ranef_sigma <- matrix(runif(nrow(true_lv_coefs)), 
     nrow = nrow(true_lv_coefs), ncol = 1)

simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs, 
    X = X, X.coefs = X.coefs, ranef.ids = example_ranef_ids,
    ranef.params = true_ranef_sigma, family = "normal",
    save.params = TRUE)

        
## Example 2 - Simulate a response matrix of ordinal data
## 30 rows (sites) with two latent variables 
true_lv <- rbind(rmvnorm(15,mean=c(-2,-2)),rmvnorm(15,mean=c(2,2)))
## 10 columns (species)
true_lv_coefs <- rmvnorm(10,mean = rep(0,3)); 
## Cutoffs for proportional odds regression (must be in increasing order)
true.ordinal.cutoffs <- seq(-2,10,length=10-1)

simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs, 
     family = "ordinal", cutoffs = true.ordinal.cutoffs, save.params = TRUE) 

## Not run: 
fit_simdata <- boral(y = simy$resp, family = "ordinal", lv.control = list(num.lv = 2),
      mcmc.control = example_mcmc_control, model.name = testpath)

## End(Not run)

## Not run: 
## Example 3 - Simulate a response matrix of count data based off
## a fitted model involving traits (ants data from mvabund)
library(mvabund)
data(antTraits)

y <- antTraits$abun
X <- as.matrix(antTraits$env)
## Include only traits 1, 2, and 5, plus an intercept
traits <- as.matrix(antTraits$traits[,c(1,2,5)])
## Please see help file for boral regarding the use of which.traits
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits)) 
     example_which_traits[[i]] <- 1:ncol(traits)

fit_traits <- boral(y, X = X, traits = traits, which.traits = example_which_traits, 
    family = "negative.binomial", lv.control = list(num.lv = 2),
     mcmc.control = example_mcmc_control, model.name = testpath)

## The hard way
simy <- create.life(true.lv = fit_traits$lv.mean, 
     lv.coefs = fit_traits$lv.coefs.median, X = X, 
     X.coefs = fit_traits$X.coefs.median, traits = traits, 
     traits.coefs = fit_traits$traits.coefs.median, family = "negative.binomial")

## The easy way, using the same latent variables as the fitted model
simy <- simulate(object = fit_traits)

## The easy way, generating new latent variables
simy <- simulate(object = fit_traits, new.lvs = TRUE)


## Example 4 - simulate Bernoulli data, based on a model with two latent variables, 
## no site variables, with two traits and one environmental covariates 
## This example is a proof of concept that traits can used 
## to explain environmental responses 
library(mvtnorm)

n <- 100; s <- 50
X <- as.matrix(scale(1:n))
colnames(X) <- c("elevation")

traits <- cbind(rbinom(s,1,0.5), rnorm(s)) 
## one categorical and one continuous variable
colnames(traits) <- c("thorns-dummy","SLA")

simfit <- list(lv.coefs = cbind(rnorm(s), rmvnorm(s, mean = rep(0,2))), 
     lv.control = list(num.lv = 2, type = "independent"),
    traits.coefs = matrix(c(0.1,1,-0.5,1,0.5,0,-1,1), 2, byrow = TRUE))
rownames(simfit$traits.coefs) <- c("beta0","elevation")
colnames(simfit$traits.coefs) <- c("kappa0","thorns-dummy","SLA","sigma")

simy <- create.life(lv.control = simfit$lv.control, lv.coefs = simfit$lv.coefs, 
    X = X, traits = traits, traits.coefs = simfit$traits.coefs, 
    family = "binomial") 


example_which_traits <- vector("list",ncol(X)+1); 
for(i in 1:length(example_which_traits)) 
    example_which_traits[[i]] <- 1:ncol(traits)
fit_traits <- boral(y = simy, X = X, traits = traits, 
    which.traits = example_which_traits, family = "binomial", 
    lv.control = list(num.lv = 2), save.model = TRUE, 
    mcmc.control = example_mcmc_control, model.name = testpath)


## Example 5 -- extend Example 2e in the main boral help file to simulate data from a 
##   fitted model
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
X <- scale(spider$x)

spiderfit_nb <- boral(y, X = X, family = "negative.binomial", 
    ranef.ids = data.frame(region = rep(1:7,each=4)), 
    mcmc.control = example_mcmc_control, model.name = testpath) 

simulate(object = spiderfit_nb)

simulate(object = spiderfit_nb, new.ranefs = TRUE)   

## End(Not run)

boral documentation built on May 29, 2024, 12:30 p.m.