View source: R/simdatafunctions.R
create.life | R Documentation |
Simulate a multivariate response matrix, given parameters such as but not necessarily all of: family, number of latent variables and related coefficients, an matrix of explanatory variables and related coefficients, row effects, response-specific random intercepts, cutoffs for cumulative probit regression of ordinal responses, and so on.
create.life(true.lv = NULL, lv.coefs,
lv.control = list(num.lv = 0, type = "independent",
lv.covparams = NULL, distmat = NULL),
X = NULL, X.coefs = NULL,
traits = NULL, traits.coefs = NULL, family,
row.eff = "none", row.params = NULL, row.ids = NULL,
true.ranef = NULL, ranef.params = NULL, ranef.ids = NULL,
offset = NULL, trial.size = 1, cutoffs = NULL, powerparam = NULL,
manual.dim = NULL, save.params = FALSE)
## S3 method for class 'boral'
simulate(object, nsim = 1, seed = NULL, new.lvs = FALSE, new.ranefs = FALSE,
distmat = NULL, est = "median", ...)
object |
An object of class "boral". |
nsim |
Number of multivariate response matrices to simulate. Defaults to 1. |
seed |
Seed for dataset simulation. Defaults to |
new.lvs |
If |
new.ranefs |
If |
distmat |
A distance matrix required to calculate correlations across sites when a non-independent correlation structure on the latent variables is imposed, when |
est |
A choice of either the posterior median ( |
true.lv |
A matrix of true latent variables. With multivariate abundance data in ecology for instance, each row corresponds to the true site ordination coordinates. If supplied, then simulation is based of these true latent variables. If |
lv.coefs |
A matrix containing response-specific intercepts, latent variable coefficients relating to |
lv.control |
This argument is utilized if
Please see |
X |
A model matrix of covariates (otherwise known as the covariate matrix), which can be included as part of the data generation. Defaults to |
X.coefs |
The coefficients relating to the covariate matrix. Defaults to |
traits |
A model matrix of species traits (otherwise known as the covariate matrix), which can be included as part of the model. Defaults to |
traits.coefs |
A matrix of coefficients that are used to generate "new" response-specific intercepts and How this argument works is as follows: when both It is important that highlight then with in this data generation mechanism, the new response-specific intercepts and Defaults to |
family |
Either a single element, or a vector of length equal to the number of columns in the response matrix. The former assumes all columns of the response matrix come from this distribution. The latter option allows for different distributions for each column of the response matrix. Elements can be one of "binomial" (with probit link), "poisson" (with log link), "negative.binomial" (with log link), "normal" (with identity link), "lnormal" for lognormal (with log link), "tweedie" (with log link), "exponential" (with log link), "gamma" (with log link), "beta" (with logit link), "ordinal" (cumulative probit regression), "ztpoisson" (zero truncated Poisson with log link), "ztnegative.binomial" (zero truncated negative binomial with log link). Please see |
row.eff |
Single element indicating whether row effects are included as fixed effects ("fixed"), random effects ("random") or not included ("none") in the fitted model. If fixed effects, then for parameter identifiability the first row effect is set to zero, which analogous to acting as a reference level when dummy variables are used. If random effects, they are drawn from a normal distribution with mean zero and standard deviation given by |
row.params |
Parameters corresponding to the row effects. If |
row.ids |
A matrix with the number of rows equal to the number of rows in the response matrix, and the number of columns equal to the number of row effects to be included in the model. Element |
true.ranef |
A list of true response-specific random intercepts. If supplied, it should be a list of length |
ranef.params |
Parameters corresponding to standard deviation for the the response-specific random intercepts distribution. If supplied, it should be a matrix, where the number of rows is equal to the number of responses, and the number of columns equal to |
ranef.ids |
A matrix with the number of rows equal to the number of rows in the response matrix, and the number of columns equal to the number of random intercepts to be included in the model. Element |
offset |
A matrix with the same dimensions as the response matrix, specifying an a-priori known component to be included in the linear predictor during fitting. Defaults to |
trial.size |
Either equal to a single element, or a vector of length equal to the number of columns in y. If a single element, then all columns assumed to be binomially distributed will have trial size set to this. If a vector, different trial sizes are allowed in each column of y. The argument is ignored for all columns not assumed to be binomially distributed. Defaults to 1, i.e. Bernoulli distribution. |
cutoffs |
A vector of common common cutoffs for proportional odds regression when any of |
powerparam |
A common power parameter for tweedie regression when any of |
manual.dim |
A vector of length 2, containing the number of rows ( |
save.params |
If |
... |
Not used. |
create.life
gives the user flexibility to control the true parameters of the model from which the multivariate responses matrices are generated from. For example, if true.lv
is supplied, then the data generation mechanism is based on this set of true latent variables. If true.lv = NULL
, then the function looks to lv.control
to determine whether and how the true latent variables are to be simulated.
simulate
makes use of the generic function of the same name in R
: it takes a fitted model, treats either the posterior medians and mean estimates from the model as the true parameters, and generates response matrices based off that. There is control as to whether the latent variables and/or response-specific random intercepts obtained from the fitted model are used, or new ones are generated.
If create.life
is used, then: 1) if save.params
= FALSE, a n
by p
multivariate response matrix is returned only, 2) if save.params = TRUE
, then a list containing the element resp
which is a n
times p
multivariate response matrix, as well as other elements for the parameters used in the true model are returned.
If simulate
is used, then a three dimensional array of dimension n
by p
by nsim
is returned. The same latent variables can be used each time (new.lvs = FALSE
), or new true latent variables can be generated each time (new.lvs = TRUE
).
Francis K.C. Hui [aut, cre], Wade Blanchard [aut]
Maintainer: Francis K.C. Hui <fhui28@gmail.com>
boral
for the default function for model fitting.
## NOTE: The values below MUST NOT be used in a real application;
## they are only used here to make the examples run quick!!!
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100,
n.thin = 1)
testpath <- file.path(tempdir(), "jagsboralmodel.txt")
## Example 1a - Simulate a response matrix of normally distributed data
library(mvtnorm)
## 30 rows (sites) with two latent variables
true_lv <- rbind(rmvnorm(n=15,mean=c(1,2)),rmvnorm(n=15,mean=c(-3,-1)))
## 30 columns (species)
true_lv_coefs <- cbind(matrix(runif(30*3),30,3),1)
X <- matrix(rnorm(30*4),30,4)
## 4 explanatory variables
X.coefs <- matrix(rnorm(30*4),30,4)
simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs,
X = X, X.coefs = X.coefs, family = "normal")
## Not run:
fit_simdata <- boral(simy, X = X, family = "normal", lv.control = list(num.lv = 2),
mcmc.control = example_mcmc_control, model.name = testpath)
summary(fit_simdata)
## End(Not run)
## Example 1b - Include a nested random row effect
## 30 subregions nested within six regions
example_row_ids <- cbind(1:30, rep(1:6,each=5))
## Subregion has a small std deviation; region has a larger one
true_ranef_sigma <- list(ID1 = 0.5, ID2 = 2)
simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs,
X = X, X.coefs = X.coefs, row.eff = "random",
row.params = true_ranef_sigma, row.ids = example_row_ids, family = "normal",
save.params = TRUE)
## Example 1c - Same as example 1b except new, true latent variables are generated
simy <- create.life(true.lv = NULL, lv.coefs = true_lv_coefs,
X = X, X.coefs = X.coefs, row.eff = "random",
row.params = true_ranef_sigma, row.ids = example_row_ids, family = "normal",
save.params = TRUE)
## Example 1d - Same as example 1a except new, true latent variables are generated
## with a non-independent correlation structure using a fake distance matrix
makedistmat <- as.matrix(dist(1:30))
simy <- create.life(true.lv = NULL, lv.coefs = true_lv_coefs,
lv.control = list(num.lv = 2, type = "exponential", lv.covparams = 5,
distmat = makedistmat),
X = X, X.coefs = X.coefs, row.eff = "random",
row.params = true_ranef_sigma, row.ids = example_row_ids, family = "normal",
save.params = TRUE)
## Example 1e - Similar to 1b, except instead of a nested random row effect,
## it includes a species-specific random interept at the region level
example_ranef_ids <- data.frame(region = rep(1:6,each=5))
## Subregion has a small std deviation; region has a larger one
true_ranef_sigma <- matrix(runif(nrow(true_lv_coefs)),
nrow = nrow(true_lv_coefs), ncol = 1)
simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs,
X = X, X.coefs = X.coefs, ranef.ids = example_ranef_ids,
ranef.params = true_ranef_sigma, family = "normal",
save.params = TRUE)
## Example 2 - Simulate a response matrix of ordinal data
## 30 rows (sites) with two latent variables
true_lv <- rbind(rmvnorm(15,mean=c(-2,-2)),rmvnorm(15,mean=c(2,2)))
## 10 columns (species)
true_lv_coefs <- rmvnorm(10,mean = rep(0,3));
## Cutoffs for proportional odds regression (must be in increasing order)
true.ordinal.cutoffs <- seq(-2,10,length=10-1)
simy <- create.life(true.lv = true_lv, lv.coefs = true_lv_coefs,
family = "ordinal", cutoffs = true.ordinal.cutoffs, save.params = TRUE)
## Not run:
fit_simdata <- boral(y = simy$resp, family = "ordinal", lv.control = list(num.lv = 2),
mcmc.control = example_mcmc_control, model.name = testpath)
## End(Not run)
## Not run:
## Example 3 - Simulate a response matrix of count data based off
## a fitted model involving traits (ants data from mvabund)
library(mvabund)
data(antTraits)
y <- antTraits$abun
X <- as.matrix(antTraits$env)
## Include only traits 1, 2, and 5, plus an intercept
traits <- as.matrix(antTraits$traits[,c(1,2,5)])
## Please see help file for boral regarding the use of which.traits
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits))
example_which_traits[[i]] <- 1:ncol(traits)
fit_traits <- boral(y, X = X, traits = traits, which.traits = example_which_traits,
family = "negative.binomial", lv.control = list(num.lv = 2),
mcmc.control = example_mcmc_control, model.name = testpath)
## The hard way
simy <- create.life(true.lv = fit_traits$lv.mean,
lv.coefs = fit_traits$lv.coefs.median, X = X,
X.coefs = fit_traits$X.coefs.median, traits = traits,
traits.coefs = fit_traits$traits.coefs.median, family = "negative.binomial")
## The easy way, using the same latent variables as the fitted model
simy <- simulate(object = fit_traits)
## The easy way, generating new latent variables
simy <- simulate(object = fit_traits, new.lvs = TRUE)
## Example 4 - simulate Bernoulli data, based on a model with two latent variables,
## no site variables, with two traits and one environmental covariates
## This example is a proof of concept that traits can used
## to explain environmental responses
library(mvtnorm)
n <- 100; s <- 50
X <- as.matrix(scale(1:n))
colnames(X) <- c("elevation")
traits <- cbind(rbinom(s,1,0.5), rnorm(s))
## one categorical and one continuous variable
colnames(traits) <- c("thorns-dummy","SLA")
simfit <- list(lv.coefs = cbind(rnorm(s), rmvnorm(s, mean = rep(0,2))),
lv.control = list(num.lv = 2, type = "independent"),
traits.coefs = matrix(c(0.1,1,-0.5,1,0.5,0,-1,1), 2, byrow = TRUE))
rownames(simfit$traits.coefs) <- c("beta0","elevation")
colnames(simfit$traits.coefs) <- c("kappa0","thorns-dummy","SLA","sigma")
simy <- create.life(lv.control = simfit$lv.control, lv.coefs = simfit$lv.coefs,
X = X, traits = traits, traits.coefs = simfit$traits.coefs,
family = "binomial")
example_which_traits <- vector("list",ncol(X)+1);
for(i in 1:length(example_which_traits))
example_which_traits[[i]] <- 1:ncol(traits)
fit_traits <- boral(y = simy, X = X, traits = traits,
which.traits = example_which_traits, family = "binomial",
lv.control = list(num.lv = 2), save.model = TRUE,
mcmc.control = example_mcmc_control, model.name = testpath)
## Example 5 -- extend Example 2e in the main boral help file to simulate data from a
## fitted model
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
X <- scale(spider$x)
spiderfit_nb <- boral(y, X = X, family = "negative.binomial",
ranef.ids = data.frame(region = rep(1:7,each=4)),
mcmc.control = example_mcmc_control, model.name = testpath)
simulate(object = spiderfit_nb)
simulate(object = spiderfit_nb, new.ranefs = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.