create.life: Simulate a Multivariate Response Matrix

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/simdatafunctions.R

Description

Simulate a multivariate response matrix, given parameters such as but not necessarily all of: family, number of latent variables and related coefficients, an matrix of explanatory variables and related coefficients, row effects, cutoffs for cumulative probit regression of ordinal responses.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
 
create.life(true.lv = NULL, lv.coefs, 
    lv.control = list(num.lv = 0, type = "independent", 
    lv.covparams = NULL, distmat = NULL),
     X = NULL, X.coefs = NULL, 
     traits = NULL, traits.coefs = NULL, family, row.eff = "none", 
     row.params = NULL, row.ids = NULL, offset = NULL, 
     trial.size = 1, cutoffs = NULL, powerparam = NULL, 
     manual.dim = NULL, save.params = FALSE)

## S3 method for class 'boral'
simulate(object, nsim = 1, seed = NULL, new.lvs = FALSE, distmat = NULL, 
     est = "median", ...)   
 

Arguments

object

An object of class "boral".

nsim

Number of multivariate response matrices to simulate. Defaults to 1.

seed

Seed for dataset simulation. Defaults to NULL, in which case no seed is set.

new.lvs

If FALSE, then true latent variables are obtained from the object. If TRUE, then new true latent variables are generated.

distmat

A distance matrix required to calculate correlations across sites when a non-independent correlation structure on the latent variables is imposed, when new.lvs = TRUE and object$lv.type != "independent".

est

A choice of either the posterior median (est == "median") or posterior mean (est == "mean"), which are then treated as estimates and the fitted values are calculated from. Default is posterior median.

true.lv

A matrix of true latent variables. With multivariate abundance data in ecology for instance, each row corresponds to the true site ordination coordinates. If supplied, then simulation is based of these true latent variables. If NULL, then the function looks to the argument lv.control to see what to do. Defaults to NULL.

lv.coefs

A matrix containing column-specific intercepts, latent variable coefficients relating to true.lv, and dispersion parameters.

lv.control

This argument is utilized if true.lv = NULL, in which case the function uses this argument to determine how to simulate new, true latent variables. A list (currently) with the following arguments:

  • num.lv: which specifies the number of true latent variables to generate. Defaults to 0.

  • type: which specifies the type the correlation structure of the latent variables (across sites). Defaults to independence correlation structure.

  • lv.covparams: which is a vector containing one or two elements required if parameterizing a non-independence spatial correlation structure of the latent variables.

  • distmat: which a distance matrix required to calculate correlations across sites when a non-independence correlation structure on the latent variables is imposed.

Please see about.lvs for more information.

X

An model matrix of covariates, which can be included as part of the data generation. Defaults to NULL, in which case no model matrix is used. No intercept column should be included in X.

X.coefs

The coefficients relating to X. Defaults to NULL. This argument needs to be supplied if X is supplied and no traits are supplied.

traits

A model matrix of species traits, which can be included as part of the model. Defaults to NULL, in which case no matrix was used. No intercept column should be included in traits, as it is included automatically.

traits.coefs

A matrix of coefficients that are used to generate "new" column-specific intercepts and X.coefs. The number of rows should equal to (ncol(X)+1) and the number of columns should equal to (ncol(traits)+2).

How this argument works is as follows: when both traits and traits.coefs are supplied, then new column-specific intercepts (i.e. the first column of lv.coefs is overwritten) are generated by simulating from a normal distribution with mean equal to
crossprod(c(1,traits), traits.coefs[1,1:(ncol(traits.coefs)-1)]) and standard deviation
traits.coefs[1,ncol(traits.coefs)]. In other words, the last column of trait.coefs provides the standard deviation of the normal distribution, with the other columns being the regression coefficients in the mean of the normal distribution. Analogously, new X.coefs are generated in the same manner using the remaining rows of trait.coefs. Please see about.traits for more information.

It is important that highlight then with in this data generation mechanism, the new column-specific intercepts and X.coefs are now random effects, being drawn from a normal distribution.

Defaults to NULL, in conjuction with traits = NULL.

family

Either a single element, or a vector of length equal to the number of columns in y. The former assumes all columns of y come from this distribution. The latter option allows for different distributions for each column of y. Elements can be one of "binomial" (with probit link), "poisson" (with log link), "negative.binomial" (with log link), "normal" (with identity link), "lnormal" for lognormal (with log link), "tweedie" (with log link), "exponential" (with log link), "gamma" (with log link), "beta" (with logit link), "ordinal" (cumulative probit regression).

Please see about.distributions for information on distributions available in boral overall.

row.eff

Single element indicating whether row effects are included as fixed effects ("fixed"), random effects ("random") or not included ("none") in the fitted model. If fixed effects, then for parameter identifiability the first row effect is set to zero, which analogous to acting as a reference level when dummy variables are used. If random effects, they are drawn from a normal distribution with mean zero and standard deviation given by row.params. Defaults to "none".

row.params

Parameters corresponding to the row effect from the fitted model. If
row.eff = "fixed", then these are the fixed effects and should have length equal to the number of columns in y. If row.eff = "random", then this is the standard deviation for the random effects normal distribution.
If row.eff = "none", then this argument is ignored.

row.ids

A matrix with the number of rows equal to the number of rows in y, and the number of columns equal to the number of row effects to be included in the model. Element (i,j) indicates to the cluster ID of row i in y for random effect eqnj; please see boral for details. Defaults to NULL, so that if row.params = NULL then the argument is ignored, otherwise if row.params is supplied then
row.ids = matrix(1:nrow(y), ncol = 1) i.e., a single, row effect unique to each row. An internal check is done to see row.params and row.ids are consistent in terms of arguments supplied.

offset

A matrix with the same dimensions as the response matrix y, specifying an a-priori known component to be included in the linear predictor during fitting. Defaults to NULL.

trial.size

Either equal to a single element, or a vector of length equal to the number of columns in y. If a single element, then all columns assumed to be binomially distributed will have trial size set to this. If a vector, different trial sizes are allowed in each column of y. The argument is ignored for all columns not assumed to be binomially distributed. Defaults to 1, i.e. Bernoulli distribution.

cutoffs

A vector of common common cutoffs for proportional odds regression when any of family is ordinal. They should be increasing order. Defaults to NULL.

powerparam

A common power parameter for tweedie regression when any of family is tweedie. Defaults to NULL.

manual.dim

A vector of length 2, containing the number of rows (n) and columns (p) for the multivariate response matrix. This is a "backup" argument only required when create.life can not determine how many rows or columns the multivariate response matrix should be.

save.params

If save.params = TRUE, then all parameters provided as input and/or generated (e.g. when traits and traits.coefs are supplied then X.coefs is generated internally; please see traits.coefs argument above) are returned, in addition to the simulated multivariate response matrix. Defaults to FALSE.

...

Not used.

Details

create.life gives the user full capacity to control the true parameters of the model from which the multivariate responses matrices are generated from. If true.lv is supplied, then the data generation mechanism is based on this. If true.lv = NULL, then the function looks to lv.control to determine whether and how the true latent variables are to be simulated.

simulate makes use of the generic function of the same name in R: it takes a fitted model, treats either the posterior medians and mean estimates from the model as the true parameters, and generates response matrices based off that.

Value

If create.life is used, then: 1) if save.params = FALSE, a n by p multivariate response matrix is returned only, 2) if save.params = TRUE, then a list containing the element resp which is a n times p multivariate response matrix, as well as other elements for the parameters used in the true model, e.g. true.lv, lv.coefs = lv.coefs, traits.coef, is returned.

If simulate is used, then a three dimensional array of dimension n by p by nsim is returned. The same latent variables can be used each time (new.lvs = FALSE), or new true latent variables can be generated each time (new.lvs = TRUE).

Author(s)

NA

Maintainer: NA

See Also

boral for the default function for model fitting.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
     n.thin = 1)

## Example 1a - Simulate a response matrix of normally distributed data
library(mvtnorm)

## 30 rows (sites) with two latent variables 
true.lv <- rbind(rmvnorm(n=15,mean=c(1,2)),rmvnorm(n=15,mean=c(-3,-1))) 
## 30 columns (species)
lv.coefs <- cbind(matrix(runif(30*3),30,3),1)

X <- matrix(rnorm(30*4),30,4) 
## 4 explanatory variables
X.coefs <- matrix(rnorm(30*4),30,4)

simy <- create.life(true.lv = true.lv, lv.coefs = lv.coefs, 
    X = X, X.coefs = X.coefs, family = "normal")

## Not run: 
fit.boral <- boral(simy, X = X, family = "normal", lv.control = list(num.lv = 2),
    mcmc.control = example_mcmc_control)

summary(fit.boral)

## End(Not run)

## Example 1b - Include a nested random row effect
## 30 subregions nested within six regions
example_row_ids <- cbind(1:30, rep(1:6,each=5))
## Subregion has a small std deviation; region has a larger one
true.row.sigma <- list(ID1 = 0.5, ID2 = 2)

simy <- create.life(true.lv = true.lv, lv.coefs = lv.coefs, 
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true.row.sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

	
## Example 1c - Same as example 1b except new, true latent variables are generated
simy <- create.life(true.lv = NULL, lv.coefs = lv.coefs, 
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true.row.sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

    
## Example 1d - Same as example 1a except new, true latent variables are generated
## with a non-independent correlation structure using a fake distance matrix
makedistmat <- as.matrix(dist(1:30))
simy <- create.life(true.lv = NULL, lv.coefs = lv.coefs, 
    lv.control = list(num.lv = 2, type = "exponential", lv.covparams = 5, distmat = makedistmat),
    X = X, X.coefs = X.coefs, row.eff = "random",
    row.params = true.row.sigma, row.ids = example_row_ids, family = "normal",
    save.params = TRUE)

    
## Example 2 - Simulate a response matrix of ordinal data
## 30 rows (sites) with two latent variables 
true.lv <- rbind(rmvnorm(15,mean=c(-2,-2)),rmvnorm(15,mean=c(2,2)))
## 10 columns (species)
true.lv.coefs <- rmvnorm(10,mean = rep(0,3)); 
## Cutoffs for proportional odds regression (must be in increasing order)
true.ordinal.cutoffs <- seq(-2,10,length=10-1)

simy <- create.life(true.lv = true.lv, lv.coefs = true.lv.coefs, 
     family = "ordinal", cutoffs = true.ordinal.cutoffs, save.params = TRUE) 

## Not run: 
fit.boral <- boral(y = simy$resp, family = "ordinal", lv.control = list(num.lv = 2),
      mcmc.control = example_mcmc_control)

## End(Not run)

## Not run: 
## Example 3 - Simulate a response matrix of count data based off
## a fitted model involving traits (ants data from mvabund)
library(mvabund)
data(antTraits)

y <- antTraits$abun
X <- as.matrix(antTraits$env)
## Include only traits 1, 2, and 5, plus an intercept
traits <- as.matrix(antTraits$traits[,c(1,2,5)])
## Please see help file for boral regarding the use of which.traits
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits)) 
     example_which_traits[[i]] <- 1:ncol(traits)

fit_traits <- boral(y, X = X, traits = traits, which.traits = example_which_traits, 
    family = "negative.binomial", lv.control = list(num.lv = 2),
     mcmc.control = example_mcmc_control)

## The hard way
simy <- create.life(true.lv = fit_traits$lv.mean, 
     lv.coefs = fit_traits$lv.coefs.median, X = X, 
     X.coefs = fit_traits$X.coefs.median, traits = traits, 
     traits.coefs = fit_traits$traits.coefs.median, family = "negative.binomial")

## The easy way, using the same latent variables as the fitted model
simy <- simulate(object = fit_traits)

## The easy way, generating new latent variables
simy <- simulate(object = fit_traits, new.lvs = TRUE)

## End(Not run)


## Example 4 - simulate Bernoulli data, based on a model with two latent variables, 
## no site variables, with two traits and one environmental covariates 
## This example is a proof of concept that traits can used 
## to explain environmental responses 
library(mvtnorm)

n <- 100; s <- 50
X <- as.matrix(scale(1:n))
colnames(X) <- c("elevation")

traits <- cbind(rbinom(s,1,0.5), rnorm(s)) 
## one categorical and one continuous variable
colnames(traits) <- c("thorns-dummy","SLA")

simfit <- list(lv.coefs = cbind(rnorm(s), rmvnorm(s, mean = rep(0,2))), 
     lv.control = list(num.lv = 2, type = "independent"),
    traits.coefs = matrix(c(0.1,1,-0.5,1,0.5,0,-1,1), 2, byrow = TRUE))
rownames(simfit$traits.coefs) <- c("beta0","elevation")
colnames(simfit$traits.coefs) <- c("kappa0","thorns-dummy","SLA","sigma")

simy <- create.life(lv.control = simfit$lv.control, lv.coefs = simfit$lv.coefs, 
    X = X, traits = traits, traits.coefs = simfit$traits.coefs, 
    family = "binomial") 


## Not run: 
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
    n.thin = 1)

example_which_traits <- vector("list",ncol(X)+1); 
for(i in 1:length(example_which_traits)) 
    example_which_traits[[i]] <- 1:ncol(traits)
fit_traits <- boral(y = simy, X = X, traits = traits, 
    which.traits = example_which_traits, family = "binomial", 
    lv.control = list(num.lv = 2), save.model = TRUE, 
    mcmc.control = example_mcmc_control)

## End(Not run)

emitanaka/boral documentation built on Aug. 12, 2019, 12:35 p.m.